Learning and inference for distributions: from optimal transport to Markov chain Monte Carlo

Author(s)
Fan, Jiaojiao
Associated Organization(s)
Daniel Guggenheim School of Aerospace Engineering
Abstract
Distributional data refers to data that provides insight into the probability distribution of a random variable or a set of random variables. In machine learning, distributions are typically represented by a large number of samples, and they have crucial applications across fields including computer vision, healthcare, and biomedical engineering. As a famous quote puts it, "data is the food for machine learning." Understanding how to transform and manipulate distributions in high-dimensional and data-driven settings is therefore essential. This dissertation focuses on large-scale distributional problems, characterized either by the sheer volume of data or by high dimensionality. We investigate two fundamental mathematical tools, optimal transport and Markov chain Monte Carlo (MCMC) sampling, both of which are integral to the transformation and manipulation of distributional data.

Optimal transport (OT) is a centuries-old mathematical framework for comparing probability distributions. One of its most significant concepts, the Wasserstein distance, has had a substantial impact on machine learning, particularly in the study of Generative Adversarial Networks. Despite important applications such as measuring discrepancies between distributions, interpolating between them, and aligning them, OT is often hindered by its computational cost. In this thesis, we improve the computational efficiency of optimal transport from two perspectives. First, for discrete OT, where probability distributions are represented by probability vectors, we improve the efficiency of multi-marginal optimal transport associated with graph structures. Second, we scale OT up to millions of samples by introducing neural OT solvers. We also demonstrate multiple downstream applications of these neural OT solvers, including Wasserstein gradient flows, Wasserstein barycenters, and generalized geodesics.

MCMC sampling remains the primary technique for sampling from a distribution and has numerous applications in Bayesian statistics, computational physics, and computational biology. However, it faces challenges in computational scalability, particularly as the dimension grows. To overcome these limitations, we introduce a novel algorithm based on the proximal sampler and rigorously prove its computational complexity for convergence to the target distribution. Additionally, we prove a Gaussian concentration inequality for semi-smooth functions, which may be of independent interest; this result recovers the order of the well-known Gaussian concentration inequality for Lipschitz functions.
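For reference, the Wasserstein distance highlighted in the abstract is, in its standard 2-Wasserstein form (a textbook definition, not specific to this dissertation's formulation),

\[
W_2(\mu, \nu) = \Big( \inf_{\gamma \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, \mathrm{d}\gamma(x, y) \Big)^{1/2},
\]

where \(\Pi(\mu, \nu)\) denotes the set of couplings of \(\mu\) and \(\nu\), i.e., joint distributions whose marginals are \(\mu\) and \(\nu\).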
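The proximal sampler mentioned in the abstract alternates two conditional sampling steps on an augmented target. As a point of reference, here is a minimal sketch of the standard scheme from the sampling literature (the dissertation's algorithm builds on this but may differ in its details), for a target \(\pi \propto e^{-f}\) on \(\mathbb{R}^d\) and step size \(\eta > 0\):

\[
y_k \sim \mathcal{N}(x_k, \eta I_d), \qquad x_{k+1} \sim \pi^{X \mid Y = y_k}, \quad \text{where } \pi^{X \mid Y = y}(x) \propto \exp\!\Big( -f(x) - \frac{\|x - y\|^2}{2\eta} \Big).
\]

The second step is commonly called the restricted Gaussian oracle; its tractability governs the overall complexity of the method.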
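For context on the final claim, the classical Gaussian concentration inequality for Lipschitz functions, whose order the semi-smooth result recovers, states that for \(X \sim \mathcal{N}(0, I_d)\) and any \(L\)-Lipschitz function \(f : \mathbb{R}^d \to \mathbb{R}\),

\[
\mathbb{P}\big( |f(X) - \mathbb{E} f(X)| \ge t \big) \le 2 \exp\!\Big( -\frac{t^2}{2L^2} \Big) \quad \text{for all } t \ge 0.
\]

Notably, the bound is dimension-free, depending only on the Lipschitz constant \(L\).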
Date
2024-07-12
Resource Type
Text
Resource Subtype
Dissertation