Large-Scale Optimization for Deep Neural Network Architecture: A Dynamical System Theory Perspective
Author(s)
Liu, Guan-Horng
Abstract
Optimization of deep neural networks (DNNs) has been a driving force in the advancement of modern machine learning and artificial intelligence. Despite efforts to design DNN architectures that leverage domain-specific knowledge, the development of optimization algorithms for training these million-parameter functions has often progressed independently of architectural innovations. This thesis investigates large-scale optimization methods that are not only aware of, but also exploit, the underlying deep architectural structures being optimized. Specifically, we demonstrate that dynamical systems and optimal control theory lay a principled foundation for algorithmic characterization in this underexplored avenue.
Optimal control, in its broadest sense, examines the principle of optimization over dynamical systems. This methodological perspective naturally arises in training neural differential equations and can be applied to standard DNNs by interpreting layer propagation as discrete timesteps of a dynamical system, with backpropagation emerging as an approximate dynamic programming method. Throughout the development, we emphasize the significance of control-theoretic components such as differential dynamic programming, the nonlinear Feynman-Kac formula, and path integral theory, which unify existing optimization methods while extending them to a broader class of complex dynamics and problem setups that would otherwise be difficult to handle or anticipate.
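The layer-as-timestep view above can be made concrete with a minimal sketch (not code from the thesis; the names `f`, `euler_forward`, and the step size `h` are illustrative assumptions): a residual update x_{t+1} = x_t + h * f(x_t) is exactly an explicit Euler step of the ODE dx/dt = f(x), so forward propagation through the network is a discrete-time trajectory of a dynamical system.

```python
import numpy as np

def f(x, W):
    # A simple tanh layer standing in for the per-layer dynamics f(x).
    return np.tanh(W @ x)

def euler_forward(x0, Ws, h=0.1):
    # Layer propagation as discrete timesteps: each "layer" W advances
    # the state by one explicit Euler step of dx/dt = f(x).
    x = x0
    for W in Ws:
        x = x + h * f(x, W)
    return x

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 4)) for _ in range(3)]   # three "layers"
x0 = rng.standard_normal(4)                            # initial state
xT = euler_forward(x0, Ws)                             # terminal state
```

Under this reading, backpropagating through `euler_forward` computes the same backward-in-time recursion as the adjoint equation of the underlying control problem, which is why backpropagation can be viewed as approximate dynamic programming.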
Our work demonstrates the broad applicability of control-theoretic optimization methods to learning various deep architectures, including convolutional networks, neural ordinary differential equations, and neural stochastic differential equations such as denoising diffusion models. The resulting computational frameworks improve test-time performance and inference efficiency, enhance training robustness against unstable hyperparameters, accelerate convergence in terms of wall-clock time, and apply to a wide range of scientific problems, including image generation, restoration, and translation, as well as solving mean-field games and opinion modeling.
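For the stochastic case, the standard discretization underlying neural SDE and diffusion-model simulation is the Euler-Maruyama scheme. A minimal sketch (illustrative, not the thesis code; `drift` and `sigma` here define a simple Ornstein-Uhlenbeck process, the kind of mean-reverting noising dynamics used in denoising diffusion):

```python
import numpy as np

def euler_maruyama(x0, drift, sigma, T=1.0, n_steps=100, seed=0):
    # Simulate dx = drift(x) dt + sigma dW on [0, T] with n_steps
    # Euler-Maruyama steps: x <- x + drift(x)*dt + sigma*sqrt(dt)*z.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        dW = rng.standard_normal(x.shape) * np.sqrt(dt)
        x = x + drift(x) * dt + sigma * dW
    return x

# Ornstein-Uhlenbeck drift pulls the state toward zero, mirroring the
# forward (noising) process of a denoising diffusion model.
xT = euler_maruyama(np.ones(3), drift=lambda x: -x, sigma=0.5)
```

Replacing the hand-written `drift` with a learned network is what turns this simulator into a neural SDE, and optimizing over such simulated trajectories is precisely an optimal control problem over stochastic dynamics.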
Date
2024-07-05
Resource Type
Text
Resource Subtype
Dissertation