15th European Conference on Turbomachinery Fluid dynamics & Thermodynamics
Authors
Abstract
Over the last decade, MTU Aero Engines, DLR Cologne and the Chair for Scientific Computing at TU Kaiserslautern have made a joint effort to create an adjoint solver for DLR’s internal flow solver TRACE. This paper summarizes the strategies and techniques adopted to achieve a memory- and CPU-efficient adjoint code. The derivatives required for setting up the adjoint are computed with operator-overloading algorithmic differentiation (AD). This technique replaces the basic floating-point data type of the primal code with a so-called active type that records the computational graph of TRACE, from which the derivatives are then computed. During the development, it turned out that existing AD tools, although powerful, lacked certain features that the authors consider crucial for the treatment of rather complex simulation codes. The Chair for Scientific Computing therefore developed several new tools for the differentiation of TRACE. The AD tool CoDiPack, which had previously shown promising results for other codes, was adapted to TRACE and improved the memory requirements and run time of the adjoint solver. CoDiPack also provided the means to properly introduce the vector mode of AD, which enables the adjoint solver to compute the solution for multiple target functionals or constraints with only minimal computational overhead. The AD handling of MPI communication from the primal code posed a great challenge due to the use of custom buffers. The development of MeDiPack addressed this issue by providing AD handling for about 80% of the MPI standard, including the handling of custom MPI buffers. As a next step, the handling of shared-memory parallelization through threading is also made available for the adjoint solver. For this purpose, a so-called task graph is generated in TRACE not only for the primal but also for the adjoint solver. This task graph stores information about the dependencies between different computational steps.
CoDiPack was extended to achieve thread safety, so that the computational graph of the adjoint task graph can be accessed by multiple threads. In order to further speed up the adjoint computation, tools were developed to analyze its performance bottlenecks, and the computation was then enhanced with preaccumulation and function checkpointing techniques. Furthermore, we outline our approach to treating the sparse matrix library that is used for the implicit solver of the primal code. Here, an AD wrapper was developed for the library to provide memory- and CPU-efficient adjoint versions of the linear algebra operations. Moreover, we summarize techniques applied to the adjoint solver to enhance its robustness at off-design operating points, e.g. near the stall line. During the development, tools for the maintenance of the adjoint solver were also created, which now allow automatic validation of the correctness of the adjoint. The paper illustrates the above techniques by means of computational results for turbomachinery configurations, along with run time and memory measurements.
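To make the operator-overloading idea concrete, the following is a minimal sketch of a tape-based reverse-mode AD with a vector-mode reverse sweep. It is written in Python for brevity and is not the CoDiPack or TRACE implementation; the names `Tape`, `Active`, `record`, `reverse` and `sin` are hypothetical and chosen only for illustration. The active type records each operation together with its local partial derivatives, and one reverse sweep then propagates the adjoints of two functionals at once.

```python
import math


class Tape:
    """Hypothetical tape: one statement per operation,
    stored as (output index, [(input index, partial), ...])."""

    def __init__(self):
        self.statements = []
        self.size = 0  # number of values registered so far

    def record(self, value, partials):
        index = self.size
        self.size += 1
        self.statements.append((index, partials))
        return Active(value, index, self)

    def input(self, value):
        # Independent variables have no inputs of their own.
        return self.record(float(value), [])

    def reverse(self, seeds):
        """Vector mode: seeds maps an output index to a list holding
        one adjoint component per target functional."""
        dim = len(next(iter(seeds.values())))
        adjoints = [[0.0] * dim for _ in range(self.size)]
        for index, seed in seeds.items():
            adjoints[index] = list(seed)
        # Single sweep over the recorded graph in reverse order.
        for out, partials in reversed(self.statements):
            for inp, partial in partials:
                for k in range(dim):
                    adjoints[inp][k] += partial * adjoints[out][k]
        return adjoints


class Active:
    """Hypothetical active type replacing the plain float."""

    def __init__(self, value, index, tape):
        self.value = value
        self.index = index
        self.tape = tape

    def _lift(self, other):
        # Passive constants are registered without any dependencies.
        if isinstance(other, Active):
            return other
        return self.tape.record(float(other), [])

    def __add__(self, other):
        other = self._lift(other)
        return self.tape.record(
            self.value + other.value,
            [(self.index, 1.0), (other.index, 1.0)],
        )

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return self.tape.record(
            self.value * other.value,
            [(self.index, other.value), (other.index, self.value)],
        )

    __rmul__ = __mul__


def sin(x):
    return x.tape.record(math.sin(x.value), [(x.index, math.cos(x.value))])


# Two functionals of the same inputs, differentiated in one reverse sweep.
tape = Tape()
x = tape.input(2.0)
y = tape.input(3.0)
f = x * y + sin(x)  # df/dx = y + cos(x), df/dy = x
g = x * x           # dg/dx = 2 x,        dg/dy = 0

adjoints = tape.reverse({f.index: [1.0, 0.0], g.index: [0.0, 1.0]})
grad_x = adjoints[x.index]  # [df/dx, dg/dx]
grad_y = adjoints[y.index]  # [df/dy, dg/dy]
```

In this sketch the tape is traversed only once regardless of the number of functionals, which is the reason vector mode adds little overhead per additional functional. A production tool would store the tape in compact arrays rather than Python lists.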
ETC2023-346