Structured Statistical Estimation via Optimization
Author(s)
Mcrae, Andrew Duncan
Abstract
This thesis shows how we can exploit low-dimensional structure in high-dimensional statistics and machine learning problems via optimization.
We show several settings where, with an appropriate choice of optimization algorithm, we can perform useful estimation with a complexity that scales not with the original problem dimension but with a much smaller intrinsic dimension.
In the low-rank matrix completion and denoising problems, we can exploit low-rank structure to recover a large matrix from noisy observations of some or all of its entries.
We prove state-of-the-art results for this problem in the case of Poisson noise and show that these results are minimax-optimal.
Next, we study the problem of recovering a sparse vector from nonlinear measurements.
We present a lifted matrix framework for the sparse phase retrieval and sparse PCA problems that includes a novel atomic norm regularizer.
We prove that solving certain convex optimization problems in this framework yields estimators with near-optimal performance.
Although we do not know how to compute these estimators efficiently and exactly, we derive a principled heuristic algorithm for sparse phase retrieval that matches existing state-of-the-art algorithms.
Third, we show how we can exploit low-dimensional manifold structure in supervised learning.
In a reproducing kernel Hilbert space framework, we show that smooth functions on a manifold can be estimated with a complexity scaling with the manifold dimension rather than a larger embedding space dimension.
Finally, we study the interaction between high ambient dimension and a lower intrinsic dimension in the harmless interpolation phenomenon (where learned functions generalize well despite interpolating noisy data).
We present a general framework for this phenomenon in linear and reproducing kernel Hilbert space settings, proving that it occurs in many situations that previous work has not covered.
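As a rough illustration of the interpolation setting described above, the following sketch fits a minimum-norm linear interpolator to noisy data when the number of features far exceeds the number of samples. It is not taken from the thesis: the Gaussian feature model, the dimensions, the noise level, and the helper name `min_norm_interpolator` are all assumptions made for the example, and it only demonstrates that the estimator fits the noisy labels exactly, not any generalization guarantee.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum-l2-norm w with X @ w = y.

    Hypothetical helper for illustration; assumes X has full row rank,
    which holds almost surely for Gaussian X with fewer rows than columns.
    """
    return np.linalg.pinv(X) @ y


rng = np.random.default_rng(0)
n, d = 50, 2000                      # assumed sizes: far more features than samples
w_true = np.zeros(d)
w_true[0] = 1.0                      # assumed simple ground-truth signal
X = rng.standard_normal((n, d))      # assumed isotropic Gaussian features
y = X @ w_true + 0.5 * rng.standard_normal(n)   # noisy labels

w_hat = min_norm_interpolator(X, y)

# The fitted model reproduces the noisy training labels (interpolation).
train_residual = np.max(np.abs(X @ w_hat - y))

# Any other interpolator differs from w_hat by a null-space vector of X
# and therefore has norm at least as large.
z = rng.standard_normal(d)
z_null = z - np.linalg.pinv(X) @ (X @ z)
w_other = w_hat + z_null
```

Here `w_other` is a second interpolating solution built by adding a null-space component; comparing its norm with that of `w_hat` shows what "minimum-norm" means in this setting.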
Date
2022-04-28
Resource Type
Text
Resource Subtype
Dissertation