Robust Learning in High Dimension
Author(s)
Jia, He
Advisor(s)
Editor(s)
Collections
Supplementary to:
Permanent Link
Abstract
This thesis focuses on designing efficient and robust algorithms for learning high-dimensional distributions, particularly in the presence of adversarial corruptions. We addresse existing challenges in robust learning in high dimension and establish new connections between robust statistics and algorithmic solutions. In particular, we give polynomial-time algorithms for the following two open problems: robust learning of Gaussian Mixture Models (GMMs) and robust learning of affine transformations of hypercubes.
For GMMs, we give a polynomial-time algorithm for robustly learning a mixture of $k$ arbitrary Gaussians in $\R^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. Our main tools are an efficient partial clustering algorithm that relies on the sum-of-squares method (building on previous work), and a novel tensor decomposition algorithm that allows errors in both Frobenius norm and low-rank terms.
The second contribution is a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Our algorithm, based on a new method that goes beyond the limitations of the method of moments, achieves asymptotically optimal error in total variation distance.
Lastly, we introduce a faster algorithm for the isotropic transformation of convex bodies, whose complexity is directly tied to the KLS (Kannan-Lovász-Simonovits) constant. This serves as a useful tool to high-dimensional learning and volume computation.
Sponsor
Date
2024-04-30
Extent
Resource Type
Text
Resource Subtype
Dissertation