
Superlinear Parallel Scaling of Quadrature Evaluated Second-Order Møller-Plesset Perturbation Theory

Thomitzni, Benjamin

PDF, English, main document (6 MB)


Abstract

Most of the resources of modern computer clusters can only be exploited with a high degree of parallel efficiency, since these clusters comprise millions of processing units on heterogeneous hardware, e.g. both CPUs and GPUs. Post-Hartree-Fock (post-HF) wave function-based methods offer high accuracy but come with steep computational scaling and large memory requirements. Because they must store, transform, manipulate, and communicate high-dimensional quantities, they are a poor fit for modern computer architectures.

Second-order Møller-Plesset perturbation theory (MP2) is a commonly used method to recover the electron correlation energy that the Hartree-Fock method misses. Its computational cost scales as O(N^5), where N is the number of atomic orbitals, because the two-electron integrals must be transformed from the atomic orbital representation to the molecular orbital representation; its memory requirements scale as O(N^4).
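
As a minimal sketch of why the transformation costs O(N^5) rather than O(N^8), the following pure-Python code compares a single-step transformation with the standard sequence of four quarter transformations. Each quarter step produces N^4 intermediates, each a sum over N terms. The helper names `naive_transform` and `quarter_transform` are hypothetical, the data are random, and integral symmetry is ignored; this is not code from libqqc.

```python
import itertools
import random


def naive_transform(eri, C):
    """AO -> MO transformation in a single step: O(N^8) operations.

    eri[m][nu][la][si] holds AO integrals (mu nu|la si) in chemists' notation,
    C[m][p] is the coefficient of AO m in MO p (illustrative inputs).
    """
    n = len(C)
    idx = range(n)
    mo = [[[[0.0] * n for _ in idx] for _ in idx] for _ in idx]
    for p, q, r, s in itertools.product(idx, repeat=4):
        total = 0.0
        for m, nu, la, si in itertools.product(idx, repeat=4):
            total += C[m][p] * C[nu][q] * C[la][r] * C[si][s] * eri[m][nu][la][si]
        mo[p][q][r][s] = total
    return mo


def quarter_transform(eri, C):
    """Same result via four quarter transformations, each O(N^5)."""
    idx = range(len(C))
    # (mu nu|la si) -> (p nu|la si): contract the first AO index
    t1 = [[[[sum(C[m][p] * eri[m][nu][la][si] for m in idx)
             for si in idx] for la in idx] for nu in idx] for p in idx]
    # (p nu|la si) -> (p q|la si)
    t2 = [[[[sum(C[nu][q] * t1[p][nu][la][si] for nu in idx)
             for si in idx] for la in idx] for q in idx] for p in idx]
    # (p q|la si) -> (p q|r si)
    t3 = [[[[sum(C[la][r] * t2[p][q][la][si] for la in idx)
             for si in idx] for r in idx] for q in idx] for p in idx]
    # (p q|r si) -> (p q|r s)
    t4 = [[[[sum(C[si][s] * t3[p][q][r][si] for si in idx)
             for s in idx] for r in idx] for q in idx] for p in idx]
    return t4


if __name__ == "__main__":
    random.seed(1)
    n = 3
    eri = [[[[random.random() for _ in range(n)] for _ in range(n)]
            for _ in range(n)] for _ in range(n)]
    C = [[random.random() for _ in range(n)] for _ in range(n)]
    a, b = naive_transform(eri, C), quarter_transform(eri, C)
    diff = max(abs(a[p][q][r][s] - b[p][q][r][s])
               for p, q, r, s in itertools.product(range(n), repeat=4))
    print("max difference:", diff)
```

Production codes additionally exploit the eightfold permutational symmetry of the integrals and transform only the occupied-virtual block (ia|jb) needed for the MP2 energy, but the quarter steps still scale as O(N^5), and keeping the four-index intermediates is what produces the O(N^4) memory footprint.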

This thesis applies the ideas of a quadrature scheme to the MP2 method to arrive at a lower-scaling form that is embarrassingly parallel. The full Q-MP2 energy integral scales as O(P^2OV) (P spatial grid points, O occupied molecular orbitals, V virtual molecular orbitals), and its largest quantity requires O(PN^2) memory. An efficient implementation is presented in the form of the open-source libqqc library. The setup of the integration grids determines the accuracy of the Q-MP2 method. To investigate this, a benchmark of small molecules is shown, consisting of seven molecules, three basis sets, and 28 different grid combinations. All but the smallest grid combinations are within the order of magnitude of the target chemical accuracy, and only four points on the one-dimensional integration grid are necessary. The error of the spatial grid decreases asymptotically with the number of grid points; the third-smallest grid, with 20 radial and 38 angular points, is chosen for further tests as it is the smallest well-behaved one. The total single-node performance of the different variants of the algorithm, measured as the number of floating-point operations performed per second relative to the theoretical maximum, is found to be below 15%; the algorithm is memory bound. The parallelisation strategy shows near-perfect load balancing across the compute nodes. The single-node parallel efficiency is superlinear for large systems, because with an increasing number of cores a larger share of the data fits into the small, fast low-level caches. This trend continues at the multi-node level, which was investigated for up to 960 cores (20 nodes) on the JUSTUS 2 computer cluster.
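
The abstract does not spell out the quadrature, so the following is only an assumed illustration of the one-dimensional part. Laplace-transform MP2 variants rewrite the orbital-energy denominator as 1/Δ = ∫_0^∞ exp(-Δt) dt with Δ = ε_a + ε_b - ε_i - ε_j and replace the integral by a quadrature. Because exp(-Δt) factorizes into one exponential per orbital, the denominator no longer couples i, j, a, and b. The sketch uses a plain trapezoidal rule in s = ln t with hypothetical parameters; it needs about 250 points, whereas the thesis reports that four points suffice with its own grids, so this is not the libqqc grid.

```python
import math


def log_trapezoid_grid(s_min=-20.0, s_max=5.0, h=0.1):
    """Nodes t_k = exp(s_k) and weights for int_0^inf f(t) dt = int exp(s) f(exp(s)) ds.

    Trapezoidal rule on an equidistant s-grid (hypothetical default parameters).
    """
    n = int(round((s_max - s_min) / h)) + 1
    nodes, weights = [], []
    for k in range(n):
        t = math.exp(s_min + k * h)
        nodes.append(t)
        weights.append(h * t)  # dt = exp(s) ds
    weights[0] *= 0.5  # trapezoidal end-point weights
    weights[-1] *= 0.5
    return nodes, weights


def inverse_by_quadrature(delta, nodes, weights):
    """Approximate 1/delta = int_0^inf exp(-delta t) dt for delta > 0."""
    return sum(w * math.exp(-delta * t) for t, w in zip(nodes, weights))


def factorized_denominator(eps_i, eps_j, eps_a, eps_b, nodes, weights):
    """1/(eps_a + eps_b - eps_i - eps_j) as a sum over grid points of products
    of per-orbital factors exp(+eps_occ * t) and exp(-eps_virt * t)."""
    total = 0.0
    for t, w in zip(nodes, weights):
        occ = math.exp(eps_i * t) * math.exp(eps_j * t)
        virt = math.exp(-eps_a * t) * math.exp(-eps_b * t)
        total += w * occ * virt
    return total


if __name__ == "__main__":
    nodes, weights = log_trapezoid_grid()
    eps_occ, eps_virt = (-0.6, -0.4), (0.2, 0.9)  # illustrative orbital energies (Hartree)
    exact = 1.0 / (sum(eps_virt) - sum(eps_occ))
    approx = factorized_denominator(*eps_occ, *eps_virt, nodes, weights)
    print(len(nodes), "points, relative error:", abs(approx - exact) / exact)
```

The point of the factorization is that sums over occupied and virtual orbitals can be carried out separately at each quadrature point, which is what opens the door to lower-scaling, independent work units.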

Future optimisation strategies will focus on optimising the integration grids to reduce the number of required grid points, on integral screening, on better use of temporal locality, and on exploiting matrix sparsity. Finally, the quadrature scheme was extended to the approximate second-order coupled-cluster model (Q-CC2) and to the algebraic-diagrammatic construction scheme for the polarisation propagator (Q-ADC(2)). For the latter method, the computational scaling of solving for the particle-hole states was lowered from O(N^5) to O(P^2OV^2), and the memory requirement can additionally be lowered from O(N^4) to O(PN^2) by folding the doubles space into the singles space. Compared with Q-MP2, a future implementation is expected to achieve better single-node performance, since more computational work is done per memory transaction, and similar parallel efficiency, since little additional node-to-node communication is needed.
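
The expectation that Q-ADC(2) should reach a higher fraction of peak than the memory-bound Q-MP2 kernel can be illustrated with the roofline model, a standard performance model that the abstract does not name: attainable throughput is the smaller of the compute peak and the memory bandwidth times the arithmetic intensity (floating-point operations per byte moved). The machine numbers and intensities below are hypothetical, not JUSTUS 2 specifications or measured values; they only show the trend.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (flop/byte) above which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gbs


def fraction_of_peak(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Share of the theoretical peak a kernel with the given intensity can reach."""
    return attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte) / peak_gflops


if __name__ == "__main__":
    PEAK, BW = 3000.0, 200.0  # hypothetical node: 3 TFLOP/s peak, 200 GB/s memory bandwidth
    print("ridge point:", ridge_point(PEAK, BW), "flop/byte")
    for ai in (0.5, 2.0, 8.0, 32.0):  # hypothetical kernel intensities
        print(f"AI {ai:5.1f} flop/byte -> {100 * fraction_of_peak(PEAK, BW, ai):5.1f} % of peak")
```

Below the ridge point, doubling the work per byte roughly doubles the attainable fraction of peak, which is the effect anticipated for Q-ADC(2); the model says nothing about the parallel-efficiency part of the claim.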

Document type: Dissertation
Supervisor: Dreuw, Prof. Dr. Andreas
Place of Publication: Heidelberg
Date of thesis defense: 28 February 2023
Date Deposited: 10 Mar 2023 09:23
Date: 2023
Faculties / Institutes: Faculty of Chemistry and Earth Sciences (Fakultät für Chemie und Geowissenschaften) > Institute of Physical Chemistry
Service facilities > Interdisciplinary Center for Scientific Computing
DDC-classification: 500 Natural sciences and mathematics
530 Physics
540 Chemistry and allied sciences
Controlled Keywords: quantum chemistry, electronic structure theory, high-performance computing