Modern Statistical Methods for Optimization and Change-point Detection

Author(s)
Chen, Zhehui
Abstract
Optimization and change-point detection are two important problems in modern science and engineering. With remarkable advances in computer and electrical engineering, a central challenge in both problems is handling big, high-dimensional data, e.g., streaming data. This thesis uses recent statistical tools to study and develop efficient optimization algorithms for big data under different settings, and to design effective, scalable change-point detection frameworks for network data. Chapter 1 studies partial least squares (PLS) with streaming data, which can be solved efficiently by a stochastic generalized Hebbian algorithm (SGHA). Theoretically, we characterize the three phases of the SGHA by diffusion processes and establish the corresponding global rates of convergence to the global optima. Empirically, numerical experiments support our theory. Chapter 2 studies the generalized eigenvalue (GEV) decomposition problem, a general form of PLS. We first show that the Lagrangian function of GEV enjoys two properties: (1) equilibria are either stable or unstable; (2) stable equilibria correspond to the global optima of the original GEV problem. Inspired by these properties, we design a simple, efficient stochastic primal-dual algorithm for the online GEV problem. Using diffusion approximations, we obtain the first sample complexity result for the online GEV problem. Numerical results are also provided to support our theory. Chapter 3 aims to improve expected improvement (EI), a sequential design strategy for the global optimization of black-box functions. We first identify the over-greediness issue of EI. To address it, we propose a new hierarchical expected improvement (HEI) framework. HEI preserves a closed-form acquisition function and encourages exploration of the optimization space.
We then introduce hyperparameter estimation methods that allow HEI to mimic a fully Bayesian optimization procedure while avoiding expensive Markov chain Monte Carlo sampling steps. We prove the global convergence of HEI over a broad function space, and establish near-minimax convergence rates under certain prior specifications. Numerical experiments show that HEI improves on existing Bayesian optimization methods for synthetic functions and for a semiconductor manufacturing optimization problem. Chapter 4 studies a bilevel optimization problem, which consists of a follower problem and a leader problem. Taking adversarial training as an example, we propose a generic learning-to-learn (L2L) method to solve it. The key idea of L2L is that, instead of applying hand-designed algorithms, e.g., stochastic gradient methods, to the follower problem, we learn an optimizer parametrized by a neural network. Meanwhile, the leader learns a robust model to defend against the adversarial attacks generated by the learned optimizer. Our experiments on CIFAR datasets demonstrate that L2L improves upon existing methods in both robust accuracy and computational efficiency. Moreover, we show that the proposed L2L method also works for other bilevel problems in machine learning, such as adversarial interpolation training and generative adversarial imitation learning. Chapter 5 considers a change-point detection problem with network data and designs a new Conditional AutoRegressive Detection (CARD) monitoring system, which models spatial correlations over the network via a Conditional AutoRegressive (CAR) model. We show that the conditional specification of the CAR model allows a decentralized detection method to leverage spatial correlations by using neighborhood information at each node.
Theoretically, we prove that the expected detection delay of CARD is smaller than that of a detection method that ignores spatial correlations, which shows the improved detection power of the proposed method. We then demonstrate the improved detection performance of CARD over existing methods in a suite of numerical simulations and in two applications: power grid monitoring and sparse population coding of biological neural networks.
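As a point of reference for the Chapter 3 summary, the following is a minimal sketch of the standard closed-form expected improvement (EI) criterion for minimization, assuming the surrogate posterior at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`. It is the textbook EI baseline, not the HEI acquisition function proposed in the thesis, whose form the abstract does not state. The function and argument names are illustrative assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Textbook EI for minimization (illustrative names, not from the thesis).

    mu, sigma: assumed Gaussian posterior mean and standard deviation
               of the black-box function at a candidate point.
    best:      smallest function value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)
```

For example, `expected_improvement(0.0, 1.0, 0.0)` equals the standard normal density at zero, about 0.3989. For fixed `mu` and `best`, the value grows with `sigma`, which is how EI rewards posterior uncertainty at a candidate point.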
Date
2021-05-03
Resource Type
Text
Resource Subtype
Dissertation