Statistical Theory for Neural Network-Based Learning

Author(s)
Ko, Hyunouk Andy
Abstract
In Chapter 1, we introduce the general problem of binary classification and the significance of studying statistical properties of neural network-based classifiers. We also give a high-level overview of the main results of this thesis along with a brief review of the relevant literature. In Chapter 2, we provide the technical preparations needed to state and prove the results in the rest of the thesis. Specifically, we define the classification problem and the space of neural networks, give a short introduction to the concepts from rate-distortion theory used in Chapter 4, and define the Barron approximation space used in Chapter 5. We conclude the chapter with a discussion of the relationship between regression and classification and related results from the literature. In Chapter 3, we show that random classifiers based on neural networks of finite width and depth are consistent for a very general class of distributions. Consistency is a highly desirable property of a sequence of classifiers, guaranteeing that the classification risk converges to the smallest possible risk. This result improves upon the classical result of Farago and Lugosi (1993) by extending the consistency property from shallow, underparametrized neural networks with sigmoid activations to wide and deep ReLU neural networks without complexity constraints. In Chapter 4, we establish several convergence rate guarantees for the excess classification risk under a semiparametric model of distributions indexed by Borel probability measures on $[0,1]^d$ and regression functions belonging to an $L^2$ class of functions with finite Kolmogorov-Donoho optimal exponents. Furthermore, we give explicit characterizations of the distributional regimes in which neural network classifiers are minimax optimal. In Chapter 5, we show that for a semiparametric model of distributions defined by regular marginal distributions and regression functions that locally belong to the Barron approximation space, neural network classifiers achieve an excess risk rate of $n^{-(1+\alpha)/(3(2+\alpha))}$. We also show that this rate is minimax optimal up to a logarithmic factor.
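
For reference, the key quantities mentioned above can be written down explicitly; the notation below is illustrative rather than quoted from the thesis. For a classifier $f$, data $(X, Y)$ with $Y \in \{0, 1\}$, and a data-dependent classifier $\hat{f}_n$ built from $n$ samples,

\[
R(f) = \mathbb{P}\bigl(f(X) \neq Y\bigr), \qquad
R^{*} = \inf_{f \,\text{measurable}} R(f), \qquad
\mathcal{E}(\hat{f}_n) = R(\hat{f}_n) - R^{*},
\]

so that consistency means $\mathbb{E}\bigl[R(\hat{f}_n)\bigr] - R^{*} \to 0$ as $n \to \infty$, and the Chapter 5 guarantee reads $\mathbb{E}\bigl[R(\hat{f}_n)\bigr] - R^{*} \lesssim n^{-(1+\alpha)/(3(2+\alpha))}$ up to logarithmic factors.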
Date
2024-12-02
Resource Type
Text
Resource Subtype
Dissertation