Structured Statistical Estimation via Optimization

Author(s)
Mcrae, Andrew Duncan
Editor(s)
Associated Organization(s)
Supplementary to:
Abstract
This thesis shows how we can exploit low-dimensional structure in high-dimensional statistics and machine learning problems via optimization. We show several settings where, with an appropriate choice of optimization algorithm, we can perform useful estimation with a complexity that scales not with the original problem dimension but with a much smaller intrinsic dimension. In the low-rank matrix completion and denoising problems, we can exploit low-rank structure to recover a large matrix from noisy observations of some or all of its entries. We prove state-of-the-art results for this problem in the case of Poisson noise and show that these results are minimax-optimal. Next, we study the problem of recovering a sparse vector from nonlinear measurements. We present a lifted matrix framework for the sparse phase retrieval and sparse PCA problems that includes a novel atomic norm regularizer. We prove that solving certain convex optimization problems in this framework yields estimators with near-optimal performance. Although we do not know how to compute these estimators efficiently and exactly, we derive a principled heuristic algorithm for sparse phase retrieval that matches existing state-of-the-art algorithms. Third, we show how we can exploit low-dimensional manifold structure in supervised learning. In a reproducing kernel Hilbert space framework, we show that smooth functions on a manifold can be estimated with a complexity scaling with the manifold dimension rather than a larger embedding space dimension. Finally, we study the interaction between high ambient dimension and a lower intrinsic dimension in the harmless interpolation phenomenon (where learned functions generalize well despite interpolating noisy data). We present a general framework for this phenomenon in linear and reproducing kernel Hilbert space settings, proving that it occurs in many situations that previous work has not covered.
Sponsor
Date
2022-04-28
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI