Large-Scale Optimization for Deep Neural Network Architecture: A Dynamical System Theory Perspective

Author(s)
Liu, Guan-Horng
Organizational Unit
Daniel Guggenheim School of Aerospace Engineering
Abstract
Optimization of deep neural networks (DNNs) has been a driving force behind the advancement of modern machine learning and artificial intelligence. Despite efforts to design DNN architectures that leverage domain-specific knowledge, the development of optimization algorithms for training these million-parameter functions has often progressed independently of architectural innovations. This thesis studies large-scale optimization methods that are not only aware of, but actively exploit, the underlying deep architectural structures being optimized. Specifically, we demonstrate that dynamical systems and optimal control theory provide a rigorous foundation for characterizing algorithms in this largely unexplored avenue. Optimal control, in its broadest sense, studies the principle of optimization over dynamical systems. This methodological perspective arises naturally in training neural differential equations, and it applies to standard DNNs by interpreting layer propagation as discrete timesteps of a dynamical system, with Backpropagation emerging as an approximate dynamic programming method. Throughout the development, we emphasize control-theoretic components such as Differential Dynamic Programming, the nonlinear Feynman-Kac formula, and path integral theory, which unify existing optimization methods while extending them to a broader class of complex dynamics and problem setups that would otherwise be difficult to handle or foresee. Our work demonstrates the broad applicability of control-theoretic optimization methods to learning various deep architectures, including convolutional networks, neural ordinary differential equations, and neural stochastic differential equations such as denoising diffusion models. The resulting computational frameworks improve test-time performance and inference efficiency, enhance training robustness against unstable hyperparameters, accelerate convergence in wall-clock time, and apply to a wide range of scientific problems, including image generation, restoration, and translation, as well as solving mean-field games and opinion modeling.
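
To make the control-theoretic viewpoint in the abstract concrete, a minimal sketch of the underlying formulation follows; the notation (states x_t, controls \theta_t, layer maps f_t, losses \ell_t and \Phi) is illustrative rather than the thesis's own. Training a T-layer network can be posed as the discrete-time optimal control problem

\min_{\theta_0, \ldots, \theta_{T-1}} \; \Phi(x_T) + \sum_{t=0}^{T-1} \ell_t(x_t, \theta_t)
\quad \text{s.t.} \quad x_{t+1} = f_t(x_t, \theta_t), \qquad x_0 = \text{input},

where the activation x_t plays the role of the state, the layer parameters \theta_t act as controls, and \Phi is the terminal training loss. Backpropagation then corresponds to approximately solving the associated Bellman (dynamic programming) recursion, and choosing residual layers f_t(x, \theta) = x + h\, f(x, \theta) with step size h \to 0 recovers a neural ordinary differential equation \dot{x}(t) = f(x(t), \theta(t)).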
Date
2024-07-05
Resource Type
Text
Resource Subtype
Dissertation