Title: Scalable and resilient sparse linear solvers

dc.contributor.advisor Vuduc, Richard
dc.contributor.author Sao, Piyush Kumar
dc.contributor.committeeMember Li, Xiaoye S.
dc.contributor.committeeMember Park, Haesun
dc.contributor.committeeMember Chow, Edmond
dc.contributor.committeeMember Zhou, Hao-Min
dc.contributor.committeeMember Catalyurek, Umit
dc.contributor.department Computational Science and Engineering
dc.date.accessioned 2018-08-20T15:35:41Z
dc.date.available 2018-08-20T15:35:41Z
dc.date.created 2018-08
dc.date.issued 2018-05-22
dc.date.submitted August 2018
dc.date.updated 2018-08-20T15:35:41Z
dc.description.abstract Solving a large, sparse system of linear equations is a ubiquitous problem in scientific computing. The challenges in scaling such solvers on current and future parallel computer systems are the high cost of communication and the expected decrease in the reliability of hardware components. This dissertation contributes new techniques to address both issues. Regarding communication, we make two advances that reduce both on-node and inter-node communication in distributed-memory sparse direct solvers. On-node, we propose a novel technique called HALO, targeted at heterogeneous architectures consisting of multicore CPUs and hardware accelerators such as GPUs or Xeon Phi coprocessors. The name HALO is shorthand for highly asynchronous lazy offload, which refers to the way the method combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (à la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. The overall hybrid solver achieves speedups of up to 3x on a variety of realistic test problems in single- and multi-node configurations. To reduce inter-node communication, we present a novel communication-avoiding 3D sparse LU factorization algorithm. The 3D algorithm uses a three-dimensional logical arrangement of MPI processes and combines data redundancy with so-called elimination tree parallelism to reduce communication. It reduces the asymptotic communication costs by a factor of $O(\sqrt{\log n})$ and the latency costs by a factor of $O(\log n)$ for planar sparse matrices arising from finite element discretizations of two-dimensional PDEs. For non-planar sparse matrices, it reduces the communication and latency costs by a constant factor. Beyond performance, we consider methods to improve solver resilience.
In emerging and future systems with billions of computing elements, hardware faults during execution may become the norm rather than the exception. We illustrate the principle of self-stabilization for constructing fault-tolerant iterative linear solvers. We give two proof-of-concept examples of self-stabilizing iterative linear solvers: one for steepest descent (SD) and one for conjugate gradients (CG). Our self-stabilized versions of SD and CG require only a small amount of fault detection; for example, we may check only for NaNs and infinities. We test our approach experimentally by analyzing its convergence and overhead for different types and rates of faults.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/60233
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Sparse linear solver
dc.subject Distributed computing
dc.subject Communication avoiding algorithm
dc.subject Numerical linear algebra
dc.subject Fault-tolerance
dc.title Scalable and resilient sparse linear solvers
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Vuduc, Richard
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computational Science and Engineering
relation.isAdvisorOfPublication e9a36794-e148-4304-8933-6ae0449c21d2
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 01ab2ef1-c6da-49c9-be98-fbd1d840d2b1
thesis.degree.level Doctoral
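The abstract credits part of the 3D algorithm's communication savings to elimination tree parallelism. As a hedged illustration only (not the dissertation's implementation), the Python sketch below computes the elimination tree of a symmetric sparsity pattern with Liu's classic path-compression algorithm and shows that two sibling subtrees contain disjoint sets of columns, so they can be factored independently before their common ancestor. The 7-vertex example graph, its nested-dissection ordering, and the function names `elimination_tree` and `subtree` are hypothetical choices made for this sketch.

```python
def elimination_tree(upper_cols):
    """Return parent[] of the elimination tree of a symmetric sparse matrix.

    upper_cols[k] lists the row indices i < k with A[i][k] != 0
    (the strictly upper-triangular pattern of column k).
    parent[j] == -1 marks a root.
    """
    n = len(upper_cols)
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed "virtual" ancestors
    for k in range(n):
        for i in upper_cols[k]:
            # Walk from i toward the root of its current subtree,
            # compressing the path so every visited node now points at k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k  # i was a root, so k becomes its parent
                i = nxt
    return parent


def subtree(parent, root):
    """All nodes in the subtree rooted at `root`, in sorted order."""
    children = [[] for _ in parent]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    stack, nodes = [root], []
    while stack:
        v = stack.pop()
        nodes.append(v)
        stack.extend(children[v])
    return sorted(nodes)


# Hypothetical example: the path graph a-b-c-d-e-f-g ordered by nested
# dissection as a,c,b,e,g,f,d -> 0..6, so the top separator d is eliminated last.
upper = [[], [], [0, 1], [], [], [3, 4], [1, 3]]
parent = elimination_tree(upper)
print(parent)  # [2, 2, 6, 5, 5, 6, -1]
left, right = subtree(parent, 2), subtree(parent, 5)
print(left, right)  # [0, 1, 2] [3, 4, 5]
print(set(left).isdisjoint(right))  # True
```

Because the two subtrees share no columns, their factorizations carry no data dependencies on each other; only the root separator (column 6) needs contributions from both.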
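The abstract states that the self-stabilizing solvers need only minimal fault detection, such as checks for NaNs and infinities. The Python sketch below illustrates one way this can work for steepest descent, assuming that self-stabilization comes from recomputing the residual r = b - Ax from the current iterate at every step rather than updating it recursively. It is a simplified illustration, not the dissertation's algorithm, and it does not cover CG. The fault-injection hook `inject`, the helpers `matvec` and `dot`, and the 2x2 test system are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_stabilizing_sd(A, b, x0, iters=200, fault=None):
    """Steepest descent for a symmetric positive definite A (sketch).

    The residual is recomputed as r = b - A x at every step, so a corrupted
    state only perturbs x, and later iterations drive x back toward the
    solution. The only explicit fault detection is a NaN/Inf check.
    `fault(k, x)` is a hypothetical hook that modifies x in place.
    """
    x = list(x0)
    for k in range(iters):
        if fault is not None:
            fault(k, x)
        if not all(math.isfinite(v) for v in x):
            x = [0.0] * len(b)  # restart from an arbitrary finite point
        r = [bi - ax for bi, ax in zip(b, matvec(A, x))]  # explicit residual
        rr = dot(r, r)
        if rr == 0.0:
            break  # exact solution reached
        alpha = rr / dot(r, matvec(A, r))  # exact line search along r
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x


def inject(k, x):
    # Hypothetical faults: a NaN at step 10, a large silent error at step 20.
    if k == 10:
        x[0] = float("nan")
    if k == 20:
        x[1] += 1.0e6


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = self_stabilizing_sd(A, b, [0.0, 0.0], fault=inject)
print(x)  # close to the exact solution [1/11, 7/11]
```

Compared with the usual recursive update r ← r - αAr, recomputing the residual costs one extra matrix-vector product per iteration; in exchange, finite corruptions such as the step-20 error need no explicit detection at all.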
Files
Original bundle
Name: SAO-DISSERTATION-2018.pdf
Size: 5.2 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.86 KB
Format: Plain Text