Organizational Unit:
H. Milton Stewart School of Industrial and Systems Engineering

Publication Search Results

Now showing 1 - 10 of 1737
  • Item
    On Parameter Efficiency of Neural Language Models
    (Georgia Institute of Technology, 2024-01-04) Liang, Chen
    In recent years, pre-trained neural language models have achieved remarkable capabilities across various natural language understanding and generation tasks. However, the trend of scaling these models to encompass billions of parameters, while enhancing adaptability and emergent capabilities, has brought forth significant deployment challenges due to their massive size. These challenges include constraints in model storage and inference latency for real-world deployment, intensive time and computational costs for task adaptation, and the presence of substantial redundant parameters that affect task adaptability. Motivated by these challenges, this thesis aims to improve the parameter efficiency of these models, seeking to minimize storage requirements, accelerate inference and adaptation, and enhance generalizability. Improving Parameter Utilization in Neural Language Models: While recent studies have identified significant redundancy in pre-trained neural language models, the impact of parameter redundancy on model generalizability remains largely underexplored. We first examine the relationship between parameter redundancy and model generalizability. Observing that removing redundant parameters improves generalizability, we propose an adaptive optimization algorithm for fine-tuning to improve the utilization of the redundant parameters. Experimental results validate increased generalization across various downstream tasks. Model Compression in Neural Language Models: We explore model compression methods, including weight pruning and knowledge distillation, to reduce model storage and accelerate inference. We first develop a reliable iterative pruning method that accounts for uncertainties in training dynamics. Then, we dive into the realm of knowledge distillation, addressing the large teacher-student "knowledge gap" that often hampers the student's performance. 
To tackle this, we offer two solutions for producing students for specific tasks by selectively distilling task-relevant knowledge. In scenarios demanding student adaptability across diverse tasks, we propose to reduce the knowledge gap by combining iterative pruning with distillation. Our approaches significantly surpass conventional distillation methods at similar compression ratios. Efficient Task Adaptation in Neural Language Models: While fine-tuning is an essential adaptation method for attaining satisfactory performance on downstream tasks, it is both computation-intensive and time-consuming. To speed up task adaptation, we study the hypernetwork approach, which employs an auxiliary hypernetwork to swiftly generate task-specific weights based on few-shot demonstration examples. We improve the weight generation scheme by exploiting the intrinsic weight structure as an inductive bias, enhancing sample efficiency for hypernetwork training. The method shows superior generalization performance on unseen tasks compared to existing hypernetwork methods.
  • Item
    Fundamental Limits and Algorithms for Database and Graph Alignment
    (Georgia Institute of Technology, 2023-12-12) Dai, Osman Emre
    Data alignment refers to a class of problems where, given two sets of anonymized data pertaining to overlapping sets of users, the goal is to identify the correspondences between the two sets. If the data of a user is contained in both sets, the correlation between the two data points associated with the user might make it possible to determine that both belong to the same user and hence link the data points. Alignment problems are of practical interest in applications such as privacy and data fusion. Because data alignment can be used to de-anonymize data, studying the feasibility of alignment allows for a more reliable understanding of the limitations of anonymization schemes put in place to protect against privacy breaches. Additionally, data alignment can aid in finding the correspondence between data from different sources, e.g. different sensors. The data fusion performed through data alignment can in turn help with a variety of inference problems that arise in scientific and engineering applications. This thesis considers two types of data alignment problems: database and graph alignment. Database alignment refers to the setting where each feature (i.e. data point) in a data set is associated with a single user. Graph alignment refers to the setting where data points in each data set are associated with pairs of users. For both problems, we are particularly interested in the asymptotic case where n, the number of users with data in both sets, goes to infinity. Nevertheless, our analyses often yield results applicable to the finite n case. 
To develop a preliminary understanding of the database alignment problem, we first study the closely related problem of planted matching with Gaussian weights of unit variance, and derive tight achievability bounds that match our converse bounds. Specifically, we identify different inequalities between log n and the signal strength (which corresponds to the square of the difference between the mean weights of planted and non-planted edges) that guarantee upper bounds on the log of the expected number of errors. Then, we study the database alignment problem with Gaussian features in the low per-feature correlation setting where the number of dimensions of each feature scales as ω(log n). We derive inequalities between log n and signal strength (which, for database alignment, corresponds to the mutual information between correlated features) that guarantee error bounds matching those of the planted matching setting, supporting the claimed connection between the two problems. Then, relaxing the restriction on the number of dimensions of features, we derive conditions on signal strength and dimensionality that guarantee smaller upper bounds on the log of the expected number of errors. The stronger results in the O(log n)-dimensional-feature setting for Gaussian databases show how planted matching, while useful, is not a perfect substitute for understanding the dynamics of the more complex problem of database alignment. For graph alignment, we focus on the correlated Erdős–Rényi graph model where the data point (i.e. edge) associated with each pair of users in a graph is a Bernoulli random variable that is correlated with the data point associated with the same pair in the other graph. We study a canonical labeling algorithm for alignment and identify conditions on the density of the graphs and correlation between edges across graphs that guarantee the recovery of the true alignment with high probability.
  • Item
    Sample complexity of Reinforcement Learning algorithms with a focus on policy space methods
    (Georgia Institute of Technology, 2023-12-11) Khodadadian, Sajad
    In this thesis, we develop fast Reinforcement Learning algorithms with finite sample complexity guarantees. The work is divided into two main parts. In the first, we investigate stochastic approximation across various domains to establish finite sample complexity bounds. We study two settings: federated stochastic approximation and two-time-scale linear stochastic approximation with Markovian noise. In the former, we develop a FedSAM algorithm where multiple agents are utilized to solve a fixed-point equation, following a stochastic approximation with Markovian noise. Moreover, we show that FedSAM has linear speedup with respect to the number of agents, while enjoying a constant communication cost. In the latter, we explore two-time-scale linear stochastic approximation with Markovian noise, establishing tight finite-time bounds. The second part delves into finite-time bounds for Reinforcement Learning algorithms, with an emphasis on policy space methods. First, we consider the two-time-scale natural actor-critic algorithm with on-policy data. For this algorithm we establish an $\epsilon^{-6}$ sample complexity for convergence to the global optimum. Next, we study the two-loop natural actor-critic algorithm, and we establish an $\epsilon^{-3}$ sample complexity, improving upon the two-time-scale counterpart. In this case, we consider an off-policy sampling strategy. To enhance the sample complexity of the natural actor-critic, we separate the algorithm into 'Actor' and 'Critic' components. For the Critic, we consider federated TD-learning and TD-learning with Polyak averaging. For the former, we show a linear speedup, and for the latter we establish a tight finite-time bound. Furthermore, we establish a tight finite-time convergence bound for the TDC algorithm. For the Actor, we demonstrate linear and superlinear convergence rates for the natural policy gradient.
  • Item
    Conic Reformulation Methods in Revenue Management
    (Georgia Institute of Technology, 2023-11-29) Shao, Hongzhang
    This thesis presents a comprehensive study on optimization problems in revenue management, with a particular focus on the development and application of conic programming reformulation techniques. The work is grounded in three distinct but interrelated papers, each addressing a unique aspect of revenue management. The first paper investigates the trade-off between pickup time and idle time in ride-hailing systems. Pickup time refers to the duration from when a car is dispatched to pick up a rider until the rider is picked up. The study reveals that if cars spend less idle time waiting for a dispatch, the mean distance between a rider and the closest available car increases, resulting in longer pickup times. This phenomenon is crucial in ride-hailing as every minute spent on pickup reduces the time available for transporting riders. Despite its importance, existing literature on price optimization and repositioning in ride-hailing systems often overlooks pickup time. This paper presents a novel approach to reformulate a simultaneous price and repositioning optimization problem, considering the distribution of pickup time, into a tractable convex optimization problem. The optimal solution derived from this approach significantly outperforms policies proposed in previous studies, as demonstrated in simulations. The second paper delves into the complexities of identifying the ideal product pricing and assortment, a critical aspect of revenue management. The challenge lies in concurrently establishing prices and assortments while navigating a resource network with finite capacity. Existing research on the static joint optimization of price and assortment often overlooks resource constraints. Our study, however, addresses the revenue management problem with resource constraints and price boundaries, where prices and product assortments must be collaboratively determined over time. 
We demonstrate that under the Markov chain (MC) choice model (which subsumes the multinomial logit (MNL) model), the choice-based joint optimization problem can be transformed into a tractable convex conic optimization problem. For static joint optimization without resource constraints, we reveal that the optimal price for a product remains consistent across all optimal solutions with positive sales. With the same price vector, an assortment can be optimal if it both contains and is contained by optimal assortments. Interestingly, optimal assortments in such problems are closed under union but may not be closed under intersection. In the context of revenue management problems, we prove that an optimal solution with a constant price vector is possible, even in the presence of resource constraints. This finding suggests that there is no need to continuously adjust prices throughout the planning period. The third paper addresses a fundamental problem in revenue management: finding the optimal choice of product attributes. These attributes significantly influence both the market share and profit margin of a product. The decision-maker is tasked with choosing the optimal vector of attributes for each product to maximize total profit, revenue, or market share. However, existing literature on product line design with multiple attributes often results in intractable optimization problems. In contrast, studies on pricing problems under discrete choice models typically assume that price is the only attribute to be chosen for each product, and the methods used in such literature cannot be generalized to solve optimization problems with multiple product attributes. In this paper, we introduce a method to reformulate static multi-attribute optimization problems and multi-stage fluid optimization problems with resource constraints and upper and lower bounds on attributes as tractable convex conic optimization problems. 
Our results apply to optimization problems under the multinomial logit (MNL) model, the Markov chain (MC) choice model, and, under certain conditions, the nested logit (NL) model. This method also provides a unified approach to solving pricing problems under discrete choice models and can reproduce many existing results established by different methods. Overall, this thesis underscores the potential of conic programming reformulation techniques in solving complex optimization problems in revenue management, providing a unified approach that can be applied across various models and scenarios.
  • Item
    Design and Analysis of Stochastic Processing and Matching Networks
    (Georgia Institute of Technology, 2023-08-21) Jhunjhunwala, Prakirt Raj
    Stochastic Processing Networks (SPNs) and Stochastic Matching Networks (SMNs) play a crucial role in various engineering domains, encompassing applications in Data Centers, Telecommunication, Transportation, and more. As these networks become increasingly complex and integral to modern systems, designing efficient decision-making policies while obtaining strong performance guarantees on throughput and delay has become a pressing research area. This thesis addresses the multifaceted challenges prevalent in today's stochastic networks and investigates their impact on system performance. Major design considerations are thoroughly examined, including scalability, customer abandonment, multiple bottlenecks, and adherence to Service Level Agreements (SLAs). Each of these factors heavily influences the system delay and queue length. In Chapter 2, we focus on establishing bounds for the tail probabilities of queue lengths in queueing systems. The results help provide strict SLA guarantees for large-scale systems. As obtaining exact steady-state distributions is often infeasible, the study provides exponentially decaying bounds in Many-Server Heavy-Traffic regimes, where the load on the system approaches the capacity simultaneously as the system size grows large. Unlike other approaches, the derived bounds are not limited to asymptotic cases and remain applicable even for finite values of load and system size. The method uses an exponential Lyapunov function to bound the Moment-Generating Function (MGF) of queue lengths, and the application of Markov's inequality contributes to the derivation of the tail bounds. To demonstrate our methodology, we primarily use a load balancing system operating under the Join-the-Shortest Queue policy (JSQ), and we obtain tail bounds applicable in non-asymptotic large-scale regimes as well as non-asymptotic Large Deviations regimes. 
In Chapter 3, we again look at a Load Balancing system operating under the Join-the-Shortest Queue policy (JSQ), but with an additional aspect of customer abandonments. In particular, we characterize the "distribution of appropriately centered and scaled steady-state queue length" (or limiting distribution) as the abandonment rate becomes very small. Our work encompasses the case when the system sees heavy traffic as well as the case when the system is overloaded. As the system load increases, we observe that the limiting distribution undergoes a phase transition from exponential to a truncated-normal and finally to a normal distribution. The chapter employs the Transform method to establish results about the limiting Moment Generating Function (MGF) of queue lengths. Afterward, in Chapter 4, we focus our study on understanding the performance of SPNs with multiple bottlenecks, for which the problem becomes significantly more challenging. For this, we use the Input-Queued Switch (IQ-Switch) model, which models a data center network and serves as a representative of SPNs with multiple bottlenecks. Prior literature has established that the well-studied MaxWeight policy provides superior throughput and mean queue length performance. Even though the MaxWeight algorithm results in small queue lengths, the complexity of implementing it is high, which is practically undesirable. We show that several classes of low time-complexity algorithms have similar mean queue lengths to MaxWeight when the system load is very high. Moving ahead, in Chapter 5, we aim to go beyond the mean queue length and provide strict SLA or tail guarantees for an SPN with multiple bottlenecks. We tackle this problem by studying the steady-state queue length distribution. For the case of IQ-Switch, finding "the complete joint distribution of queue length vector in heavy traffic" (or limiting joint distribution) was posed as an open problem in prior literature. 
Our work solves the open problem for IQ-Switch (under a particular conjecture) operating under the MaxWeight scheduling algorithm and other low-complexity algorithms considered in Chapter 4. For IQ-Switch, under uniform traffic and heavy load conditions, we provide the limiting distribution in terms of a non-linear combination of independent and exponentially distributed random variables. We do this by establishing a functional equation on the Laplace transform of the limiting joint distribution using the Transform method, which can be solved to obtain the result. Finally, in Chapter 6, we study the queueing dynamics of an SMN using the example of a quantum network. This system is much harder to analyze than an SPN, as the effective service rate depends on the system state. We aimed to provide performance guarantees on the queue length as in previous chapters. However, we soon realized that even the fundamental problem of finding the stability conditions for an SMN is not entirely answered. Thus, in this chapter, we characterize the stability conditions for a class of quantum networks under the MaxWeight policy. Interestingly, we find that the stability region of the quantum network is defined as the convex hull of the achievable throughput of suitably designed sub-networks.
  • Item
    Exploiting Problem Structure for Faster Optimization: A Trilinear Saddle Point Approach
    (Georgia Institute of Technology, 2023-07-28) Zhang, Zhe
    Optimization is vital in operations research, encompassing model fitting and decision-making. The exponential growth of data holds promise for realistic models and intelligent decision-making. However, the sheer volume and the exceptional dimension of big data make computations prohibitively expensive and time-consuming. In this thesis, we propose a trilinear saddle point approach to tackle some challenges in big-data optimization. By effectively leveraging problem structure, our approach significantly improves computation complexities for a few important problem classes in stochastic programming and non-linear programming. This offers valuable insights into the intrinsic computational hardness. In Chapter Two, we consider a distributionally robust two-stage stochastic optimization problem with discrete scenario support. While much research effort has been devoted to tractable reformulations for DRO problems, especially those with continuous scenario support, few efficient numerical algorithms have been developed, and most of them can handle neither the nonsmooth second-stage cost function nor the large number of scenarios $K$ effectively. We fill the gap by reformulating the DRO problem as a trilinear min-max-max saddle point problem and developing novel algorithms that can achieve an $O(1/\epsilon)$ iteration complexity which only mildly depends on the scenario number $K$. The major computations involved in each iteration of these algorithms can be conducted in parallel if necessary. Besides, for solving an important class of DRO problems with the Kantorovich ball ambiguity set, we propose a slight modification of our algorithms to avoid the expensive computation of the probability vector projection. Finally, preliminary numerical experiments are conducted to demonstrate the empirical advantages of the proposed algorithms. 
In Chapter Three, we study the convex nested stochastic composite optimization (NSCO) problem, which finds applications in reinforcement learning and risk-averse optimization. Existing NSCO algorithms exhibit significantly worse stochastic oracle complexities compared to those without nested structures, and they require all outer-layer functions to be smooth. To address these challenges, we propose a stochastic trilinear (multi-linear) saddle point formulation that enables the design of order-optimal algorithms for general convex NSCO problems. When all outer-layer functions are smooth, we propose a stochastic sequential dual (SSD) method to achieve an oracle complexity of $O(1/\epsilon^2)$ ($O(1/\epsilon)$) when the problem is non-strongly (strongly) convex. In cases where there are structured non-smooth or general non-smooth outer-layer functions, we propose a nonsmooth stochastic sequential dual (nSSD) method, achieving an oracle complexity of $O(1/\epsilon^2)$. Notably, we prove that this $O(1/\epsilon^2)$ complexity is unimprovable even under a strongly convex setting. These results demonstrate that the convex NSCO problem has oracle complexities similar to those of problems without nested compositions, except for strongly convex and outer-non-smooth problems. In Chapter Four, we investigate the communication complexity of convex risk-averse optimization over a network. The problem generalizes the well-studied risk-neutral finite-sum distributed optimization problem and its importance stems from the need to handle risk in an uncertain environment. For algorithms in the literature, there exists a gap in communication complexities for solving risk-averse and risk-neutral problems. To address this gap, we utilize a trilinear saddle point reformulation to design two distributed algorithms: the distributed risk-averse optimization (DRAO) method and the distributed risk-averse optimization with sliding (DRAO-S) method. 
The single-loop DRAO method involves solving potentially complex subproblems, while the more sophisticated DRAO-S method requires only simple computations. We establish lower complexity bounds to show their communication complexities to be unimprovable, and conduct numerical experiments to illustrate the encouraging empirical performance of the DRAO-S method. In Chapter Five, we utilize the trilinear saddle point approach to develop new complexity results for classic nonlinear function-constrained optimization. We introduce the single-loop Accelerated Constrained Gradient Descent (ACGD) method, which modifies Nesterov's celebrated Accelerated Gradient Descent (AGD) method by incorporating a linearly-constrained descent step. Lower complexity bounds are provided to establish the tightness of ACGD's complexity bound under a specific optimality regime. To enhance efficiency for large-scale problems, we propose the ACGD with Sliding (ACGD-S) method. ACGD-S replaces computationally demanding constrained descent steps with basic matrix-vector multiplications. ACGD-S shares the same oracle complexity as ACGD and achieves an unimprovable computation complexity measured by the number of matrix-vector multiplications. These advancements offer insights into complexity and provide efficient solutions for nonlinear function-constrained optimization, catering to both general and large-scale scenarios.
  • Item
    Decomposition Algorithms for Certain Integer Problems over Networks
    (Georgia Institute of Technology, 2023-07-28) Li, Yijiang
    Integer optimization stands as a fundamental and widely embraced tool in addressing many real-world problems. Among the abundant applications of integer optimization, numerous network-related problems arise and exhibit significant influence in many areas such as transportation, energy systems, and supply chains. The intricate nature of these problems often results in complicated large-scale formulations that are computationally expensive to solve directly. Instead, decomposition is a more computationally tractable means in many cases. In this thesis, we focus on a few integer problems over networks. In Chapter 2, we investigate the airport flight-to-gate assignment problem, where the goal is to minimize the total delays by optimally assigning each scheduled flight to a compatible gate. We provide a column generation approach for solving this problem. We decompose the pricing problem such that each gate is the basis for an independent pricing problem to be solved and use a combination of an approximation algorithm based on the submodularity of the underlying set and dynamic programming algorithms to solve the independent pricing problems. We also design and employ a rolling horizon method and block decomposition algorithm to solve the large-sized instances. Finally, we perform extensive computational experiments to validate the performance of our approach. In Chapter 3, we focus on the gas network design problem. Gas networks are used to transport natural gas, which is an important resource for both residential and industrial customers throughout the world. The gas network design problem is a challenging nonlinear and non-convex optimization problem. We propose a decomposition framework to solve this problem. In particular, we utilize a two-stage procedure that involves a convex reformulation of the original problem. We conduct experiments on a benchmark network to validate and analyze the performance of our framework. 
In Chapter 4, we combine the water network design and operation problems. In general, the design problems consider pipe sizing and placements of pump stations, while the operation problems are multiple time period problems that account for temporal changes in supply and demand and consider the scheduling of the installed pump stations. We propose two methods to obtain good candidate primal solutions. One method is a decomposition framework similar to the one used in Chapter 3, while the other method is based on a time decomposition. We conduct computational experiments on networks that closely resemble real-world networks. In Chapter 5, we consider the resiliency of infrastructure networks. Infrastructure systems generally consist of multiple types of infrastructure facilities that are interdependent. In the event of a natural disaster, some of the infrastructure nodes can be damaged and disabled, creating failures. Such failures can propagate to other facilities that depend on the disabled facilities, creating a cascade of failures and eventually a potential system collapse. We propose a bilevel interdiction model to study this problem of cascading failures in an interdependent infrastructure network with a probabilistic dependency graph. We utilize a Benders-type decomposition algorithm to solve the resulting formulation. Computational experiments are performed using synthetic networks to validate the performance of this algorithm.
  • Item
    Optimal Sampling for Statistical Modeling and Validation
    (Georgia Institute of Technology, 2023-05-18) Vakayil, Akhil
    In statistics and machine learning, often we need to sample from or partition data, e.g., for generating training-testing splits, subsampling for tractable statistical analysis, etc. This thesis presents an optimal sampling/partitioning methodology and its applications. Chapter 1 provides the motivation behind the proposed methodology from a validation perspective, Chapter 2 gives an efficient algorithm that makes the optimal sampling applicable to large datasets, and finally, Chapter 3 presents a novel Gaussian process approximation exploiting the proposed sampling methodology.
  • Item
    Advances in Large-Scale Power System Operations: Reconstruction, Reliability, Learning
    (Georgia Institute of Technology, 2023-05-09) Chatzos, Minas
    Modern Power System operations are based on large-scale optimization problems that are becoming increasingly complex and subject to higher degrees of uncertainty from multiple sources such as renewable generation, distributed energy sources, electrification of transportation and extreme weather. Frameworks based on Optimization under Uncertainty and Machine Learning have the potential to facilitate and improve Power Grid operation in multiple ways: the former by reducing cost and enhancing system reliability, and the latter by generating solutions faster and enabling real-time risk assessment. The thesis presents advancements on the scalability of such methods to large-scale power networks and evaluates the impact and benefits of the methods in the operations. The first part of the thesis addresses the availability of suitable power grid data for conducting modern research in Power Systems, access to which is limited by privacy concerns and the sensitive nature of energy infrastructure. This lack of data, in turn, hinders the development of modern research avenues such as machine learning approaches or stochastic formulations. To overcome this challenge, we propose a systematic, data-driven framework for reconstructing high-fidelity, spatio-temporally consistent time series, using a combination of public and private data. The proposed approach, from geo-spatial information and generation capacity reconstruction to time-series disaggregation, is applied to the French transmission grid. Thereby, synthetic but highly realistic time series data, spanning multiple years with a 5-minute granularity, is generated at the bus level. The second part of the thesis focuses on the impact of Reliability Assessment Commitment (RAC) processes in modern Power System operations. 
The recent growth of Renewable Energy sources and Distributed Energy sources has introduced significant operational uncertainty in front of and behind the meter, increasing forecasting errors and reliability risks in the operations. For this reason, Independent System Operators (ISOs) execute day-ahead and intra-day RAC processes to address unforeseen changes in power grid conditions. Based on the operation pipeline of the Midcontinent Independent System Operator (MISO), we conduct a systematic analysis of the impact of RAC processes in MISO operations and propose a two-stage Stochastic Programming extension to MISO's deterministic day-ahead RAC process. To overcome the computational challenge of solving the stochastic problem, an accelerated version of the Benders Decomposition algorithm is developed that is scalable to industry-sized instances. A novel computational analysis is conducted on the benefits of deterministic and stochastic RAC processes in modern large-scale power grid instances from MISO and the French Transmission System. These benefits are demonstrated both in terms of operational cost and Power System-specific risk and reliability metrics. The third part of the thesis proposes a novel Machine Learning (ML) approach for learning the behavior of the AC Optimal Power Flow problem (AC-OPF), a problem at the core of the operations, that features fast and scalable training. It is motivated by the significant training time needed by existing ML approaches for predicting AC-OPF. The proposed approach is two-stage and exploits a spatial decomposition of the power network that is viewed as a set of regions. The first stage learns to predict the flows and voltages on the buses and lines coupling the regions, and the second stage trains, in parallel, the ML models for each region. The predictions can then seed a power flow model to eliminate the physical constraint violations, resulting in minor violations only for the operational bound constraints. 
Experimental results on the French transmission system (up to 6,700 buses) and large publicly available topologies (up to 9,000 buses) demonstrate the potential of the approach. Within a short training time, the approach predicts AC-OPF solutions with very high fidelity, producing significant improvements over existing centralized methods. The proposed approach opens the possibility of training ML models quickly to respond to changes in operating conditions.
  • Item
    Bridging Discrete and Continuous Methods for Faster Optimization and Machine Learning
    (Georgia Institute of Technology, 2023-04-27) Mortagy, Hassan
    In the current age of machine learning and automation, it has become commonplace for algorithms to make decisions that impact our lives. Many such algorithms use optimization solvers as a subroutine, and there has been a revived interest in using robust approaches (like first-order optimization methods) that require low memory storage and iteration costs. In a wide range of applications, the optimization problem an algorithm is trying to solve contains inherent combinatorial structure that one can exploit to obtain novel algorithms. Further, combinatorial optimizers have developed elegant characterizations of properties of convex minimizers over combinatorial polytopes; however, this theory has not been adequately integrated within the iterative optimization framework, leaving a significant opportunity to speed up continuous optimization algorithms used in machine and online learning. Although the theories of combinatorial and continuous optimization methods have, for the most part, evolved independently over many years, it is now crucial to bridge them with the aim of developing algorithms that can handle large-scale data-driven problems. In Chapter 3, we focus on improving the rates of conditional gradient variants (in the presence of combinatorial structures) while maintaining their efficiency. We develop a novel theoretical framework that provides a unifying view of various descent directions in the literature and demystifies the impact of the movement in these directions towards attaining constrained minimizers with the aim of obtaining better convergence rates. Through our framework, we prove that Frank-Wolfe (FW) steps are greedy and correspond to infinite stepsizes along the negative gradient followed by a projection, thereby drawing a novel connection between projection-free and projected gradient (PG) algorithms. 
We use our insights to develop a novel algorithm, SHADOW-CG, that combines FW steps (i.e., greedily wrap around the polytope) and shadow steps (i.e., optimal local descent direction) and prove that it enjoys linear convergence. The convergence rate depends on the combinatorial structure of the face-lattice of the polytope, which is invariant to the geometry of the problem. We show that for simple polytopes with combinatorial structure (e.g., the hypercube and the simplex), our algorithms are the best feasible descent methods in terms of convergence rates and iteration complexity. In Chapter 4, we next focus on speeding up iterative projections in mirror descent variants in the presence of combinatorial polytopes while maintaining their optimal rates. To capture a wide range of combinatorial decision sets encountered in practice, we consider submodular polytopes. We develop a toolkit to speed up the computation of iterative projections over submodular polytopes using both discrete and continuous perspectives. We subsequently adapt the away-step Frank-Wolfe algorithm to use this information and enable early termination. For the special case of cardinality-based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of $\Omega(n/ \log(n))$. Our theoretical results translate into orders-of-magnitude reductions in runtime in preliminary computational experiments. In Chapter 5, we consider the case when the decision set in our optimization problem is given by vertices of a combinatorial (flow) polytope through the lens of the network reconfiguration problem. The network reconfiguration problem seeks to find a rooted tree $T$ such that the energy of the (unique) feasible electrical flow over $T$ is minimized. We prove the first approximation factor for this problem since it emerged in 1989. We propose a randomized iterative rounding algorithm that rounds a relaxed point in the relative interior of the flow polytope to a vertex. 
We prove that the algorithm achieves an $O(m - n)$ approximation and show that this bound is optimal for planar graphs. In addition, we provide novel lower bounds and corresponding approximation factors for various settings, ranging from $O( \sqrt{n})$ for grids with uniform resistances on edges to $O(1)$ for grids with uniform edge resistances and demands. In our computational experiments, our algorithms take a couple of seconds, whereas the mean time it takes the current heuristic used in practice to attain the same performance is around 10 hours. This improvement will enable operators to reconfigure distribution networks more frequently in practice. Finally, in Chapter 6, we consider the case when our decision set is given by a mixed-integer program (MIP). There has been a recent emphasis on using machine learning (ML) models to automate clinical decision-making; however, all these works look at Electronic Medical Record (EMR) data without incorporating context or clinical expertise to detect erroneous or untrustworthy data. We are the first to translate clinical domain knowledge into high-dimensional mathematical constraints and project EMR data of ICU patients onto those clinical domain constraints. These projections can identify and correct erroneous clinical data. Computing projections can also help quantify how much a sick patient’s laboratory values and vitals have deviated from the normal range, thereby obtaining trust scores that improve the performance of ML classification models in clinical settings. We design a machine learning pipeline incorporating the proposed projection methodology to predict sepsis 6 hours before its onset, increasing the precision and AUC by approximately a factor of 1.5 compared to ML models trained without using the projections. Our algorithm also outperforms the state-of-the-art ML algorithms for the early detection of sepsis.