Statistical Theory for Neural Network-Based Learning
Author(s)
Ko, Hyunouk Andy
Abstract
In Chapter 1, we introduce the general problem of binary classification and the
significance of studying statistical properties of neural network-based classifiers. In
addition, we include a high-level overview of the main results in this thesis along with
a brief review of relevant literature.
In Chapter 2, we provide the necessary technical preparations for the statements
and proofs of the results in the rest of the thesis. Specifically, we define the classification
problem and the space of neural networks, give a short introduction to the
concepts from rate-distortion theory used in Chapter 4, and define the Barron approximation
space used in Chapter 5. We conclude with a discussion of the relationship
between regression and classification and of related results from the literature.
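For orientation, here is the standard binary classification setup these definitions usually take (a sketch; the symbols $\eta$, $R$, $R^*$, and $f^*$ are ours, and the thesis's precise formulation may differ):
\[
(X, Y) \sim P \ \text{on}\ [0,1]^d \times \{0,1\}, \qquad \eta(x) = \mathbb{P}(Y = 1 \mid X = x),
\]
\[
R(f) = \mathbb{P}\big(f(X) \neq Y\big), \qquad f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}, \qquad R^* = R(f^*) = \inf_f R(f).
\]
Here $\eta$ is the regression function referred to throughout the abstract, $f^*$ is the Bayes classifier, and $R^*$ is the Bayes risk, the smallest achievable classification risk.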
In Chapter 3, we show that random classifiers based on neural networks of finite
width and depth are consistent for a very general class of distributions. Consistency
is a highly desirable property of a sequence of classifiers: it guarantees that the
classification risk converges to the smallest possible risk. This result improves the
classical result of Faragó and Lugosi (1993) by extending the consistency property
from shallow, underparameterized neural networks with sigmoid activations to wide and
deep ReLU neural networks without complexity constraints.
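In the usual formalization (our notation, building on the sketch above), a sequence of data-dependent classifiers $\hat f_n$, each trained on $n$ i.i.d. samples from $P$, is consistent for a class of distributions when
\[
\mathbb{E}\big[R(\hat f_n)\big] \longrightarrow R^* \qquad (n \to \infty)
\]
for every $P$ in that class; this is exactly the sense in which the classification risk converges to the smallest possible risk.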
In Chapter 4, we give several convergence-rate guarantees for the excess classification
risk under a semiparametric model of distributions indexed by Borel probability
measures on $[0,1]^d$ and regression functions belonging to an $L^2$ class of functions with
finite Kolmogorov–Donoho optimal exponents. Furthermore, we give explicit characterizations
of the distributional regimes in which neural network classifiers are minimax
optimal.
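Minimax optimality can be read in the standard sense (again our notation; $\mathcal{P}$ stands for the semiparametric model and $\mathcal{E}_P(\hat f_n) = \mathbb{E}[R_P(\hat f_n)] - R_P^*$ for the excess risk): a classifier sequence is minimax optimal over $\mathcal{P}$ at rate $a_n$ when
\[
\sup_{P \in \mathcal{P}} \mathcal{E}_P(\hat f_n) \lesssim a_n \qquad \text{and} \qquad \inf_{\tilde f_n} \sup_{P \in \mathcal{P}} \mathcal{E}_P(\tilde f_n) \gtrsim a_n,
\]
where the infimum runs over all possible classifiers built from $n$ samples.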
In Chapter 5, we show that for a semiparametric model of distributions defined
by regular marginal distributions and regression functions that locally belong to the
Barron approximation space, neural network classifiers achieve an excess-risk rate of $n^{-(1+\alpha)/(3(2+\alpha))}$.
We also show that this rate is minimax optimal up to a logarithmic factor.
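Spelled out in the notation above (a sketch; the model class $\mathcal{P}_\alpha$ and the constant $c > 0$ are our placeholders, since the abstract only says "up to a logarithmic factor"), the two claims combine to
\[
\sup_{P \in \mathcal{P}_\alpha} \mathcal{E}_P(\hat f_n^{\mathrm{NN}}) \lesssim n^{-\frac{1+\alpha}{3(2+\alpha)}}, \qquad \inf_{\tilde f_n} \sup_{P \in \mathcal{P}_\alpha} \mathcal{E}_P(\tilde f_n) \gtrsim n^{-\frac{1+\alpha}{3(2+\alpha)}} (\log n)^{-c},
\]
where $\hat f_n^{\mathrm{NN}}$ denotes the neural network classifier of Chapter 5.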
Date
2024-12-02
Resource Type
Text
Resource Subtype
Dissertation