Speeding up nonnegative low-rank approximation: Parallelism and Randomization
Author(s)
Hayashi, Koby Bruce
Abstract
Nonnegative Matrix Factorization (NMF) is a popular technique used in machine learning, data mining, computational neuroscience, image segmentation, and more. As problem sizes grow, the need for faster and more scalable algorithms becomes increasingly important. The topic of this thesis is to design, theoretically analyze, and implement fast methods for computing approximate NMFs and their variants. We approach this problem from two angles: scaling down and scaling out. When scaling down, we reduce the amount of data or computational resources used; when scaling out, we incorporate distributed-memory nodes via cluster or cloud computing. We scale down by applying techniques from Randomized Numerical Linear Algebra (RandNLA) to design efficient, randomized algorithms for NMF that allow us to downsample data in a principled way. These randomized methods achieve up to 7× speedup over non-randomized methods while preserving solution quality in terms of relative error and downstream graph clustering tasks. We scale out by designing communication-efficient distributed-memory algorithms for execution on state-of-the-art supercomputing systems. Our distributed-memory algorithms scale up to 16,000 cores and achieve a 2.2× speedup over existing parallel codes. We evaluate our methods on synthetic and real-world data sets in terms of runtime and various quality-of-solution metrics.
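To give a concrete sense of the scaling-down idea described in the abstract, the sketch below shows one common way RandNLA-style sketching can shrink the alternating least-squares subproblems inside an NMF solver. This is a minimal illustration only, assuming Gaussian sketching operators and simple projected alternating least squares; the function name randomized_nmf and all parameter choices are hypothetical and do not reproduce the thesis's actual algorithms.

```python
import numpy as np

def randomized_nmf(A, r, sketch_size, iters=50, seed=0):
    """Sketched alternating-updates heuristic for A ~= W @ H with W, H >= 0.

    A minimal illustration: each least-squares subproblem is compressed
    with a Gaussian sketch before solving, then clipped to stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        # Sketch the rows of A to shrink the subproblem min ||W H - A||_F for H.
        S = rng.standard_normal((sketch_size, m)) / np.sqrt(sketch_size)
        H = np.maximum(np.linalg.lstsq(S @ W, S @ A, rcond=None)[0], 0)
        # Sketch the columns of A to shrink the subproblem for W.
        T = rng.standard_normal((n, sketch_size)) / np.sqrt(sketch_size)
        W = np.maximum(np.linalg.lstsq((H @ T).T, (A @ T).T, rcond=None)[0].T, 0)
    return W, H

# Example usage: factor a small nonnegative matrix and report relative error.
A = np.abs(np.random.default_rng(1).standard_normal((500, 300)))
W, H = randomized_nmf(A, r=10, sketch_size=60)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```

The key point the sketch conveys is that each subproblem is solved against a sketch_size-row compression of the data rather than all m (or n) rows, which is where the speedups over non-randomized methods come from when sketch_size is much smaller than the data dimensions.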
Date
2024-07-26
Resource Type
Text
Resource Subtype
Dissertation