Speeding up nonnegative low-rank approximation: Parallelism and Randomization

Author(s)
Hayashi, Koby Bruce
Organizational Unit
School of Computational Science and Engineering
Abstract
Nonnegative Matrix Factorization (NMF) is a popular technique in machine learning, data mining, computational neuroscience, image segmentation, and more. As problem sizes grow, the need for faster and more scalable algorithms becomes increasingly important. This thesis designs, theoretically analyzes, and implements fast methods for computing approximate NMF and its variants. We approach this problem from two angles: scaling down and scaling out. Scaling down reduces the amount of data or resources used, while scaling out incorporates distributed-memory nodes via cluster or cloud computing. We scale down by applying techniques from Randomized Numerical Linear Algebra (RandNLA) to design efficient, randomized algorithms for NMF that downsample data in a principled way. These randomized methods achieve up to 7× speedup over nonrandomized methods while preserving solution quality in terms of relative error and downstream graph-clustering tasks. We scale out by designing communication-efficient distributed-memory algorithms for execution on state-of-the-art supercomputing systems. Our distributed-memory algorithms scale up to 16,000 cores and achieve 2.2× speedup over existing parallel codes. We evaluate our methods on synthetic and real-world data sets in terms of run time and various quality-of-solution metrics.
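To make the setting concrete, the following is a minimal NumPy sketch of the NMF problem the abstract describes: factoring a nonnegative matrix A into nonnegative factors W and H with A ≈ WH. It uses classical Lee–Seung multiplicative updates, plus a crude row-subsampling variant to illustrate the *kind* of RandNLA-style downsampling the abstract refers to. This is not the thesis's algorithm; the function names, the sampling scheme, and all parameters here are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(A, k, iters=300, rng=None):
    """Lee-Seung multiplicative updates for A ~= W @ H with W, H >= 0."""
    rng = rng or np.random.default_rng(0)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-12  # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def subsampled_h_update(A, W, H, s, rng):
    """Hypothetical illustration of downsampling: update H using only
    s uniformly sampled rows of A (shrinking that subproblem from m
    rows to s), then update W on the full data. Real RandNLA methods
    use more principled sampling/sketching than uniform rows."""
    m = A.shape[0]
    idx = rng.choice(m, size=s, replace=False)
    eps = 1e-12
    Ws, As = W[idx], A[idx]
    H *= (Ws.T @ As) / (Ws.T @ Ws @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Usage on synthetic nonnegative low-rank data (as in the evaluation setup):
rng = np.random.default_rng(0)
A = rng.random((60, 5)) @ rng.random((5, 50))  # exactly rank-5, nonnegative
W, H = nmf_multiplicative(A, k=5, rng=np.random.default_rng(1))
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Multiplicative updates never flip signs, so nonnegativity of W and H is preserved automatically; the subsampled update keeps that property while the H subproblem touches only s of the m rows.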
Date
2024-07-26
Resource Type
Text
Resource Subtype
Dissertation