Title:
Performance optimizations for quantum chemistry calculations

Author(s)
Huang, Hua
Advisor(s)
Chow, Edmond
Abstract
Quantum chemistry is a mature area of computational science, with many methods and codes in use across chemistry, biochemistry, and materials science. Optimizing computational kernels in quantum chemistry calculations is usually challenging because both the algorithms and modern computer hardware are highly complex. This thesis focuses on optimizing the performance of three important computational kernels in quantum chemistry calculations. We first optimize electron repulsion integral (ERI) calculations for Gaussian basis sets. We design a batching scheme for ERI calculations that better utilizes a processor's vector processing units to calculate multiple ERIs simultaneously. With the optimized ERI calculations, the tensor contraction in Fock matrix construction can become the performance bottleneck. We design a thread-safe algorithm, along with specific optimizations, to improve the performance of shared-memory Fock matrix construction. For distributed-memory Fock matrix construction, we design a new portable partitioned global address space (PGAS) framework called GTMatrix. GTMatrix has better communication performance than the Global Arrays library, a PGAS framework commonly used in quantum chemistry programs. Finally, we optimize density matrix purification, a method of constructing the density matrix directly from the Fock matrix. We present the new idea of "overlapping communications with communications" to accelerate matrix-matrix multiplications in density matrix purification. We implement the optimizations in the GTFock library, a high-performance Fock matrix construction library with a Hartree-Fock self-consistent field (SCF) demo program. Test results show that the optimized GTFock performs an SCF iteration up to three times faster than the unoptimized version.
Date Issued
2019-04-22
Resource Type
Text
Resource Subtype
Thesis