Title:
Exploiting Problem Structure for Faster Optimization: A Trilinear Saddle Point Approach

Author(s)
Zhang, Zhe
Advisor(s)
Lan, Guanghui
Editor(s)
Associated Organization(s)
Series
Supplementary to
Abstract
Optimization is vital in operations research, encompassing both model fitting and decision-making. The exponential growth of data holds promise for realistic models and intelligent decision-making. However, the sheer volume and exceptionally high dimensionality of big data make computations prohibitively expensive and time-consuming. In this thesis, we propose a trilinear saddle point approach to tackle some of these challenges in big-data optimization. By effectively leveraging problem structure, our approach significantly improves the computational complexity for several important problem classes in stochastic programming and nonlinear programming, and it offers valuable insights into their intrinsic computational hardness.

In Chapter Two, we consider a distributionally robust two-stage stochastic optimization (DRO) problem with discrete scenario support. While much research effort has been devoted to tractable reformulations of DRO problems, especially those with continuous scenario support, few efficient numerical algorithms have been developed, and most of them can handle neither a nonsmooth second-stage cost function nor a large number of scenarios $K$ effectively. We fill this gap by reformulating the DRO problem as a trilinear min-max-max saddle point problem and developing novel algorithms that achieve an $O(1/\epsilon)$ iteration complexity depending only mildly on the scenario number $K$. The major computations in each iteration of these algorithms can be carried out in parallel if necessary. In addition, for an important class of DRO problems with a Kantorovich ball ambiguity set, we propose a slight modification of our algorithms that avoids the expensive projection of the probability vector. Finally, preliminary numerical experiments demonstrate the empirical advantages of the proposed algorithms.

In Chapter Three, we study the convex nested stochastic composite optimization (NSCO) problem, which finds applications in reinforcement learning and risk-averse optimization. Existing NSCO algorithms exhibit significantly worse stochastic oracle complexities than their counterparts without nested structure, and they require all outer-layer functions to be smooth. To address these challenges, we propose a stochastic trilinear (multi-linear) saddle point formulation that enables the design of order-optimal algorithms for general convex NSCO problems. When all outer-layer functions are smooth, we propose a stochastic sequential dual (SSD) method that achieves an oracle complexity of $O(1/\epsilon^2)$ ($O(1/\epsilon)$) when the problem is non-strongly (strongly) convex. When there are structured nonsmooth or general nonsmooth outer-layer functions, we propose a nonsmooth stochastic sequential dual (nSSD) method, achieving an oracle complexity of $O(1/\epsilon^2)$. Notably, we prove that this $O(1/\epsilon^2)$ complexity is unimprovable even in the strongly convex setting. These results show that the convex NSCO problem shares the oracle complexities of problems without nested composition, except in the strongly convex and outer-nonsmooth case.
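For intuition, the nested composition in Chapter Three can be pictured schematically as
$$\min_{x \in X} \; f_1\big(f_2\big(\cdots f_m(x) \cdots\big)\big),$$
where each layer $f_i$ is accessible only through a stochastic oracle; this is a generic sketch in placeholder notation rather than the thesis's exact problem statement.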
In Chapter Four, we investigate the communication complexity of convex risk-averse optimization over a network. This problem generalizes the well-studied risk-neutral finite-sum distributed optimization problem, and its importance stems from the need to handle risk in an uncertain environment. For algorithms in the literature, there exists a gap between the communication complexities of solving risk-averse and risk-neutral problems. To address this gap, we utilize a trilinear saddle point reformulation to design two distributed algorithms: the distributed risk-averse optimization (DRAO) method and the distributed risk-averse optimization with sliding (DRAO-S) method. The single-loop DRAO method involves solving potentially complex subproblems, while the more sophisticated DRAO-S method requires only simple computations. We establish lower complexity bounds showing that their communication complexities are unimprovable, and we conduct numerical experiments to illustrate the encouraging empirical performance of the DRAO-S method.

In Chapter Five, we utilize the trilinear saddle point approach to develop new complexity results for classic nonlinear function-constrained optimization. We introduce the single-loop Accelerated Constrained Gradient Descent (ACGD) method, which modifies Nesterov's celebrated Accelerated Gradient Descent (AGD) method by incorporating a linearly constrained descent step. Lower complexity bounds are provided to establish the tightness of ACGD's complexity bound under a specific optimality regime. To enhance efficiency for large-scale problems, we propose the ACGD with Sliding (ACGD-S) method, which replaces the computationally demanding constrained descent steps with basic matrix-vector multiplications. ACGD-S shares the same oracle complexity as ACGD and achieves an unimprovable computation complexity measured by the number of matrix-vector multiplications. These advances offer insights into the complexity of nonlinear function-constrained optimization and provide efficient solutions for both general and large-scale problems.
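For reference, the function-constrained problems considered in Chapter Five have the generic form
$$\min_{x \in X} \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \dots, m,$$
whose Lagrangian reformulation $\min_{x \in X} \max_{y \ge 0} \, f_0(x) + \sum_{i=1}^m y_i f_i(x)$ illustrates, in standard and purely schematic notation, how function constraints give rise to the saddle point structure leveraged throughout the thesis; the exact reformulation used in each chapter may differ.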
Sponsor
Date Issued
2023-07-28
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI