Statistical Learning Theory of Deep Neural Networks: A Generalization Viewpoint

Author(s)
Zhou, Tian-Yi
Abstract
This thesis investigates the mathematical foundations of deep learning, focusing on the statistical guarantees of deep neural networks in regression, classification, and anomaly detection. Specifically, it seeks to understand when and how neural networks generalize effectively to unseen data in these tasks. By uncovering the statistical and computational mechanisms that drive the success of deep learning, this thesis aims to improve the robustness and accuracy of systems that rely on these technologies. Chapter 1 studies the phenomenon of benign overfitting in convolutional neural networks (CNNs), an important class of neural networks designed to learn spatial hierarchies of features efficiently. It demonstrates that the generalization rate of a CNN architecture remains unchanged even as the model size and number of parameters grow substantially. Chapter 2 studies the classification of unbounded data generated from Gaussian Mixture Models using fully-connected neural networks. For the first time, we obtain non-asymptotic upper bounds and convergence rates for the excess risk without restrictions on the model parameters. Chapter 3 develops a mathematical framework and theory-grounded tools for unsupervised anomaly detection, with a focus on its practice in cybersecurity. It establishes the first optimality result for anomaly detection and quantifies the amount of synthetic anomalies needed to achieve high accuracy. Finally, Chapter 4 explores the use of deep learning in functional data analysis, focusing on the approximation of nonlinear functionals mapping from a reproducing kernel Hilbert space to the real numbers ℝ.
Date
2025-04-24
Resource Type
Text
Resource Subtype
Dissertation