Organizational Unit:
H. Milton Stewart School of Industrial and Systems Engineering


Publication Search Results

Now showing 1 - 10 of 556
  • Item
    Advances in online convex optimization, games, and problems with bandit feedback
    (Georgia Institute of Technology, 2019-12-16) Rivera Cardoso, Adrian
In this thesis we study sequential decision making through the lens of Online Learning. Online Learning is a very powerful and general framework for multi-period decision making. Due to its simple formulation and effectiveness, it has become a tool of daily use in multibillion-dollar companies. Moreover, due to its beautiful theory and its tight connections with other fields, Online Learning has caught the attention of academics all over the world and driven first-class research. In the first chapter of this thesis, joint work with Huan Xu, we study a problem called the Risk-Averse Convex Bandit. Risk aversion refers to the fact that humans prefer consistent sequences of good rewards over highly variable sequences with slightly better rewards. The Risk-Averse Convex Bandit addresses the fact that, while human decision makers are risk-averse, most algorithms for Online Learning are not. In this thesis we provide the first efficient algorithms with strong theoretical guarantees for the Risk-Averse Convex Bandit problem. In the second chapter, joint work with Rachel Cummings, we study the problem of preserving privacy in the setting of online submodular minimization. Submodular functions have multiple applications in machine learning and economics, which usually involve sensitive data from individuals. Using tools from Online Convex Optimization, we provide the first $\epsilon$-differentially private algorithms for this problem, which are almost as good as their non-private counterparts. In the third chapter, joint work with Jacob Abernethy, He Wang, and Huan Xu, we study a dynamic version of two-player zero-sum games. Zero-sum games are ubiquitous in economics, and central to understanding Linear Programming Duality, Convex and Robust Optimization, and Statistics. For many decades it was thought that one could solve this kind of game using sublinear-regret algorithms for Online Convex Optimization. We show that while this is true when the game does not change with time, a naive application of these algorithms can be fatal if the game changes and the players are trying to compete with the Nash Equilibrium of the sum of the games in hindsight. In the fourth chapter, joint work with He Wang and Huan Xu, we revisit the decades-old problem of Markov Decision Processes (MDPs) with Adversarial Rewards. MDPs provide a general mathematical framework for sequential decision making under uncertainty when there is a notion of `state'; moreover, they are the backbone of all Reinforcement Learning. We provide an elegant algorithm for this problem using tools from Online Convex Optimization. The algorithm's performance is comparable to the current state of the art. We also consider the problem in the large state-space regime, and provide the first algorithm with strong theoretical guarantees.
  • Item
    Developing trust and managing uncertainty in partially observable sequential decision-making environments
    (Georgia Institute of Technology, 2019-10-28) Bishop, Robert Reid
This dissertation consists of three distinct, although conceptually related, papers that are unified in their focus on data-driven, stochastic sequential decision-making environments but differentiated in their respective applications. In Chapter 2, we discuss a special class of partially observable Markov decision processes (POMDPs) in which the sources of uncertainty can be naturally separated into a hierarchy of effects: controllable, completely observable effects and exogenous, partially observable effects. For this class of POMDPs, we provide conditions under which value and policy function structural properties are inherited from an analogous class of MDPs, and discuss specialized solution procedures. In Chapter 3, we discuss an inventory control problem in which actions are time-lagged and there are three explicit sources of demand uncertainty: the state of the macroeconomy, product-specific demand variability, and information quality. We prove that a base stock policy, defined with respect to pipeline inventory and a Bayesian belief distribution over states of the macroeconomy, is optimal, and demonstrate how to compute these base stock levels efficiently using support vector machines and Monte Carlo simulation. Further, we show how to use these results to determine how best to strategically allocate capital toward a better information infrastructure or a more agile supply chain. Finally, in Chapter 4, we consider how to generate trust in so-called development processes, such as supply chains, certain artificial intelligence systems, and maintenance processes, in which there can be adversarial manipulation and we must hedge against the risk of misapprehending attacker objectives and resources. We show how to model dynamic agent interaction using a partially observable Markov game (POMG) framework, and present a heuristic solution procedure, based on self-training concepts, for determining a robust defender policy.
  • Item
    Statistical inference for optimization models: Sensitivity analysis and uncertainty quantification
    (Georgia Institute of Technology, 2019-09-03) Curry, Stewart
In recent years, the optimization, statistics, and machine learning communities have built momentum in bridging methodologies across domains by developing solutions to challenging optimization problems arising in advanced statistical modeling. While the field of optimization has contributed general methodology and scalable algorithms to modern statistical modeling, fundamental statistics can also bring established statistical concepts to bear on optimization. In the operations research literature, sensitivity analysis is often used to study the sensitivity of the optimal decision to perturbations in the input parameters. Providing insights about how uncertain a given optimal decision might be is a concept at the core of statistical inference. Such inferences are essential in decision making because in some cases they may suggest that more data need to be acquired to provide stronger evidence for a decision; in others, they may prompt not making a decision at all because of the high uncertainty of the decision environment. Statistical inference can provide additional insights in decision making by quantifying how uncertainty in input data propagates into decisions. In this dissertation, we propose a methodological and computational framework for statistical inference on decision solutions derived from optimization models, particularly high-dimensional linear programming (LP). In Chapter 2, we explore the theoretical geometric properties of critical regions, an important concept from classical sensitivity analysis and parametric linear programming, and suggest a statistical tolerance approach to sensitivity analysis that considers simultaneous variation in the objective function and constraint parameters. Using the geometric properties of critical regions, in Chapter 3 we develop an algorithm that solves LPs in batches for sampled values of the right-hand-side parameters (i.e., b in the constraints Ax = b); a naive re-solving baseline for this setting is sketched after this listing. Moreover, we suggest a data-driven version of our algorithm that uses the distribution of the sampled b values, and we empirically compare our approach to other methods on various problem instances. Finally, in Chapter 4, we suggest a unified framework for statistical inference on decision solutions and propose the remaining work, including the application of the framework to making statistical inferences about spatial disparities in access to dental care services.
  • Item
    Scheduling techniques for complex resource allocation systems
    (Georgia Institute of Technology, 2019-08-27) Ibrahim, Michael
This research program provides a complete framework for the real-time management of complex sequential resource allocation systems (RAS) with blocking and deadlocking effects in their dynamics. This framework addresses both control objectives of logical correctness and performance optimization for the considered RAS. A more detailed account of the thesis contributions is as follows: For the logical-correctness part of the presented framework, we leverage some formal Discrete Event System (DES)-based representations of the RAS behavior and we introduce a new class of deadlock avoidance policies (DAPs) for the considered sequential RAS that is characterized as the class of "maximal linear" DAPs. We also provide a complete algorithm for enumerating all the elements of this policy class for a broad class of RAS instances. Finally, we present some numerical experimentation that demonstrates the efficacy of the presented algorithm. For the performance-optimization part of the presented framework, we provide a scheduling methodology that aims to maximize the throughput of complex RAS with blocking and deadlocking effects. This methodology is based on the solution of a pertinent "fluid" relaxation of the addressed scheduling problem, and it is enabled by the pre-established ability to control the underlying RAS for deadlock freedom, and by the further ability to express the corresponding DAP as a set of linear inequalities on the system state. Furthermore, we strengthen and further formalize these developments by taking advantage of the representational and analytical capabilities of the Petri net (PN) modeling framework, which is one of the main formal representational frameworks employed by the current DES theory. These capabilities enable a seamless treatment of the behavioral and the time-based dynamics of the underlying RAS, and they also support a notion of "fluidization" of these dynamics through the more recent developments in the area of timed and untimed continuous PN models; this last capability was especially critical for the systematic derivation of the sought "fluid relaxation" models and formulations. The information contained in the developed "fluid" models, when combined with the "linear" deadlock avoidance policies employed in this work, provides a complete and very efficient controller for the considered RAS. Finally, we present a "correction" algorithm that aims to detect potentially suboptimal decisions that might be effected by the aforementioned controller and correct them. These "corrections" can be effected either in an "off-line" mode, by simulating the dynamics of the underlying RAS, or in an "on-line" mode, where the underlying RAS is fully operational and the necessary corrections are inferred from the observed behavior of the system. In both of these modes, and especially the second one, the "correction" algorithm endows the developed control framework with a "learning" capability. From a more methodological standpoint, the results that enable this correcting mechanism are based on the sensitivity analysis of Markov reward processes and the statistical theory of "ranking & selection". A series of numerical results demonstrates and assesses the efficacy of the developed methodology.
  • Item
    Exact algorithms for routing problems
    (Georgia Institute of Technology, 2019-08-19) Lagos Gonzalez, Felipe Andres
The study of routing problems has given rise to major developments in the field of Operations Research (OR). In particular, the Vehicle Routing Problem (VRP) has motivated the development of many exact algorithms and heuristics. In the VRP, a planner designs minimum-cost delivery routes from a depot to a set of geographically distributed customers, subject to capacity and business constraints. This problem is an important component of distribution systems and, in practice, the many variants of the problem are motivated by the diversity of operational rules and constraints in real-life applications. The VRP generalizes the Traveling Salesman Problem (TSP), so any of its variants presents a computational challenge. We study a stochastic variant of the VRP and the Inventory Routing Problem (IRP), a problem that was initially studied as a variant of the VRP. In Chapter 2, we study the Vehicle Routing Problem with Probabilistic Customers (VRP-PC), a two-stage stochastic optimization problem that is a fundamental building block within the broad family of stochastic routing models. In the first stage, a dispatcher determines a set of vehicle routes serving all potential customer locations, before actual requests for service realize. In the second stage, vehicles are dispatched after the subset of customers requiring service is observed; a customer not requiring service is skipped from its planned route at execution. The objective is to minimize the expected vehicle travel cost, assuming known customer realization probabilities (a small expected-cost calculation for a fixed route is sketched after this listing). We propose a column generation framework to solve the VRP-PC to a given optimality tolerance. Specifically, we present two novel algorithms, one that under-approximates a solution's expected cost, and another that uses its exact expected cost. Each algorithm is equipped with a route pricing mechanism that iteratively improves the approximation precision of a route's reduced cost; this produces fast route insertions at the start of the algorithm and satisfies the termination conditions at the end of the execution. Compared to branch-and-cut algorithms for the VRP-PC using arc-based formulations, our framework can more readily incorporate sequence-dependent constraints such as customer time windows. We provide a priori and a posteriori performance guarantees for these algorithms, and demonstrate their effectiveness via a computational study on instances with customer realization probabilities ranging from 0.5 to 0.9. In Chapter 3, we consider a variant of the Inventory Routing Problem (IRP), the Continuous Time IRP (CIRP). In time-dependent models such as the CIRP, the objective is to find the optimal (continuous) times at which activities occur and resources are utilized. These models arise whenever a schedule of activities needs to be constructed. A common approach consists of discretizing the planning horizon and then restricting the decisions to those time points. However, this approach leads to very large formulations that are intractable in practice. In the CIRP, a company manages the inventory of its customers, resupplying a single product from a single facility during a finite time horizon. The product is consumed at a constant rate (product per unit of time) by each customer. The customers have local storage capacity. The goal is to find the minimum-cost delivery plan that ensures that none of the customers run out of product during the planning period.
We investigate time-expanded network formulations that can form the basis of a Dynamic Discretization Discovery (DDD) algorithm and demonstrate in an extensive computational study that they, by themselves, produce provably high-quality, often optimal, solutions. In Chapter 4, we study the Continuous Time IRP with Out-and-Back Routes (CIRP-OB): a vehicle route starts at the depot, visits a single customer, and returns to the depot. We develop the full DDD algorithm to solve the CIRP-OB by using partially constructed time-expanded networks. This method iteratively discovers the time points needed in the network to find optimal solutions. We test this method on randomly generated instances with up to 30 customers, where provably optimal solutions are found in most cases.
  • Item
    Data-driven stochastic optimization approaches with applications in power systems
    (Georgia Institute of Technology, 2019-07-26) Basciftci, Beste
In this thesis, we focus on data-driven stochastic optimization problems with an emphasis on power systems applications. On the one hand, we address the inefficiencies in maintenance and operations scheduling problems that emerge when uncertainties are disregarded and statistical analysis methods are not utilized. On the other hand, we develop a partially adaptive, general-purpose stochastic programming approach for effectively modeling and solving a class of sequential decision-making problems.
  • Item
    Statistical inference, modeling, and learning of point processes
    (Georgia Institute of Technology, 2019-07-24) Li, Shuang
Complex systems, such as healthcare systems, cities, and information networks, often produce a large volume of time series data, along with ordered event data, which are discrete in time and space and rich in other features (e.g., markers or texts). We model the asynchronous event data as point processes. It is essential to understand and model the complex dynamics of these time series and event data so that accurate prediction, reliable detection, or smart intervention can be carried out for social good. Specifically, my thesis focuses on the following aspects: (1) new statistical models and effective learning algorithms for the complex dynamics exhibited in event data; (2) new inference algorithms for change-point detection and temporal logic reasoning involving time series and event data. In Chapter 1, we propose a kernel-based nonparametric change-point detection method for high-dimensional streaming data. Change-point detection is an essential topic in modern complex systems. For example, wearable sensors are nowadays common in healthcare systems, which makes it possible to monitor patients' health status in real time. Early detection of deterioration is helpful and can even save patients' lives. However, it is challenging to aggregate measurements from different sensors into one indicator, and it is not clear how to define pre- and post-change-point distributions. To tackle this problem, in Chapter 1, we propose a distribution-free and computationally efficient kernel-based nonparametric change-point detection method, which requires fewer assumptions on the distributions and can handle high-dimensional streaming data (a generic sliding-window kernel statistic is sketched after this listing). A theoretical tail-probability approximation of the nonparametric statistic is also provided, which gives a statistically principled way to determine the detection thresholds. The proposed nonparametric method shows excellent performance on real human-activity and speech datasets. In Chapter 2, we model networked asynchronous event data as point processes and propose a continuous-time change-point detection framework to detect dynamic changes in networks. We cast the problem as a sequential hypothesis test, and derive the generalized likelihood-ratio (GLR) statistic for networked point processes by taking the network topology into account. The constructed statistic can achieve weak-signal detection by aggregating local statistics over time and networks. We further propose to evaluate the proposed GLR statistic via an efficient EM-like algorithm that can be implemented in a distributed fashion across dimensions. Similarly, we obtain a highly accurate theoretical threshold characterization for the proposed GLR statistic and demonstrate the excellent performance of our method on real social media datasets, such as Twitter and Memetracker. In Chapter 3, we propose an expressive model for event data and an adversarial learning framework to uncover the temporal dynamics. When modeling event data as point processes, instead of hand-crafting the occurrence intensity function in a parametric form, we leverage recent advances in deep learning and parameterize the intensity function as a recurrent neural network (RNN). An RNN is a composition of a series of highly flexible nonlinear functions, which allows the model to capture complex dynamics in event data and makes the generative process mimic the real data much better than the prior art. Fitting neural network models for event data is challenging.
We develop a novel adversarial learning framework to address this challenge and further avoid model misspecification. Our method provides a novel connection between such event data fitting methods and inverse reinforcement learning, where a stochastic policy and the associated reward function are learned simultaneously. The proposed framework has been evaluated on real crime, social network, and healthcare datasets, and outperforms the state-of-the-art methods in data description. In Chapter 4, we propose a unified framework to integrate first-order temporal logic rules into point process models for event data. The proposed modeling framework excels in the small-data regime and has the ability to incorporate domain knowledge. The proposed temporal logic point processes model the intensity functions of event starts and ends via a set of first-order temporal logic rules. Using a softened representation of temporal relations and a weighted combination of logic rules, our framework can also deal with uncertainty in event data. Furthermore, many existing point process models can be interpreted as special cases of our framework given simple temporal logic rules. We derive a maximum likelihood estimation procedure for the proposed temporal logic point processes, and show that it can lead to accurate predictions when data are sparse and domain knowledge is critical. The proposed framework has been evaluated on real healthcare datasets; it outperforms neural network models in event prediction on small data and is easy to interpret.
  • Item
    Analytics approaches to improve strategic, operational, and clinical decision-making in healthcare
    (Georgia Institute of Technology, 2019-05-29) Caglayan, Caglar
Healthcare and medicine both contain many complex and critical biological, clinical, and operational processes that must cope with uncertainty and require timely and effective decisions. Analytical methods, such as mathematical modeling and computational optimization, offer a useful framework to study the complex problems that are observed in healthcare systems and medical processes. This thesis presents three important and complex medical decision-making problems Çağlar Çağlayan studied during his doctoral studies, describes the analytical methods he utilized and developed, and discusses the methodological and numerical findings and contributions of his work. The works presented in this thesis make contributions to three research topics on clinical decision-making under uncertainty: (i) the development of an optimal multi-modality screening program for women at high risk for breast cancer, (ii) the determination of optimal physician staffing levels in emergency departments under time-varying arrivals, and (iii) the study of the clinical course of follicular and diffuse large B cell lymphomas with the goal of improving treatment outcomes. In Chapter 1, we study a multi-modality breast cancer screening problem for a high-risk population and identify optimal and cost-effective population screening strategies based on the imaging technologies that are in widespread use. Women with certain risk factors, such as BRCA 1/2 gene mutations and a family history of breast or ovarian cancer, are at significantly higher risk for breast cancer. For these high-risk women, the existing guidelines recommend intensified screening starting at an early age, where the use of ultrasound (US) and magnetic resonance imaging (MRI) might address some of the limitations of mammography, the standard screening modality for average-risk women. Yet the cost and false-positive rates of MRI, and the operator dependency of US, raise concerns. Currently, there is no consensus on the optimal use of these technologies in conjunction with, or instead of, mammography in high-risk women. To study this problem, we develop a Markov model to capture the disease incidence and progression in high-risk women, and formulate a mixed-integer linear program to identify optimal structured strategies that are practical for implementation. We further study the structure of the optimal strategies, and establish the conditions under which a strategy with more frequent but less sensitive screens yields higher health benefits than a strategy with more sensitive but less frequent screens. Our results show that (1) for young women, annual screening with ultrasound is affordable with moderate budgets and optimal over a wide range of budget levels despite its high operator dependency, (2) for middle-aged women, annual mammography screening is robustly optimal and cost-effective, and (3) the use of MRI, alone or combined with mammography, leads to outcomes that are not cost-effective. In Chapter 2, we study a physician staffing problem and an associated patient routing problem in emergency rooms (ERs) coping with time-varying demand. ERs are complex healthcare delivery systems, characterized by time-varying unscheduled arrivals, medium-to-long service times, high patient volumes, multiple patient classes, and multiple treatment stages. In such a complex system, optimizing the staffing levels of physicians, the most critical resources in ERs, is a major challenge.
In this work, we study a staffing problem for ER physicians and propose a new staffing algorithm that determines the optimal staffing levels stabilizing differentiated tail-probability-of-delay (TPoD) service targets. Taking a queueing-theory approach, we develop a practical and intuitive multi-class, multi-stage queueing network describing ER care delivery as sequences of treatments and order bundles (i.e., groups of diagnostic medical processes). Employing this model, we capture time-varying patient flow in the ER and estimate its load on the treatment stations served by physicians. Treatment queues operate in an efficiency-driven regime but experience negligible abandonment, as abandonments nearly always occur at earlier stages of ER care. This observation motivates our proposed staffing algorithm, which translates the offered load into staffing decisions for efficiency-driven queues with perfectly patient customers and TPoD-type targets. We analytically show the asymptotic effectiveness of our staffing algorithm for M/M/s queues that operate in efficiency-driven mode (a plain Erlang-C staffing calculation for a stationary M/M/s queue is sketched after this listing). Then, we demonstrate its robustness via realistic, data-driven simulation experiments in various time-varying ER settings, considering non-homogeneous Poisson arrivals, multiple patient classes, multi-stage service, and centralized (pooled) physicians under several practical routing rules. Our results show that (1) our proposed staffing approach is effective and robust for optimizing ER physician staffing levels in various ER settings, and (2) as the service complexity of an ER increases, the use of dynamic rules, which use the current system state for routing decisions, and hybrid policies, which combine pre-determined static routing rules with dynamic ones, becomes necessary to stabilize TPoD targets. In Chapter 3, we study the clinical course of two types of lymph node cancers, follicular lymphoma (FL) and diffuse large B cell lymphoma (DLBCL). These cancers have different characteristics, with DLBCL being aggressive and FL recurrent, and have multiple clinical intermediate points or endpoints, such as the sequence of treatments or cause-specific death. Accordingly, we develop two different continuous-time, multi-state survival analysis models to investigate the clinical course of these diseases following initial treatment, with the goal of improving treatment outcomes. We utilize Cox proportional hazards models to specify the impact of prognostic factors on overall survival and cause-specific deaths, and the Aalen-Johansen estimator to project the course of DLBCL over time. In particular, employing the multi-state FL model, we investigate the clinical course of FL under first-, second-, and third-line therapies for high-risk patients to assess the effectiveness of various treatment sequences. Our analysis shows that single R-CHOP therapy in any line of treatment improves overall survival for high-risk patients, achieving the most favorable outcome when provided as first-line therapy, but its repeated use in the first and second lines might lead to adverse outcomes. Using the DLBCL model, we examine the role of clinical and socio-demographic factors on DLBCL-associated mortality in the elderly population and identify a cutoff point to stop monitoring DLBCL patients receiving the standard R-CHOP therapy.
Utilizing a large population-based dataset, our analysis (1) identifies age, sex, and Charlson comorbidity index as risk factors for DLBCL-specific and other causes of death, and (2) confirms a 5-year cure point for older patients receiving R-CHOP therapy, suggesting that survivorship surveillance plans transition from a focus on lymphoma recurrence-related deaths to non-cancer risks at five years after treatment. In Chapter 4, we summarize our studies, list our contributions, and conclude the thesis.
  • Item
    Statistical learning with regularizations: Theory and applications
    (Georgia Institute of Technology, 2019-05-21) Cao, Shanshan
This thesis contributes to the area of statistical learning with regularization and its applications, which has been popular for sparse estimation and function estimation in many areas such as signal/image processing, statistics, bioinformatics, and machine learning. Our study helps (i) unify high-dimensional sparse estimation with non-convex penalties; (ii) prove the asymptotic optimality of high-order Laplacian regularization in function estimation; (iii) improve the performance of the composite fuselage assembly process by using a sparsity-penalized $\ell_\infty$-based linear model; and (iv) identify the census tracts where children have limited access to preventive dental care. This thesis comprises four main works. In Chapter 1, under the linear regression framework, we study the variable selection problem when the underlying model is assumed to have a small number of nonzero coefficients (i.e., the underlying linear model is sparse). We propose to use difference-of-convex (DC) functions to unify the non-convex penalties in the literature for sparse estimation. Under the DC framework, directional-stationary (d-stationary) solutions are considered, and they are usually not unique. In this chapter, we show that under some mild conditions, a certain subset of d-stationary solutions in an optimization problem (with a DC objective) has some ideal statistical properties: namely, asymptotic estimation consistency, asymptotic model selection consistency, and asymptotic efficiency. This work shows that DC is a natural framework that offers a unified approach to the existing work involving non-convex penalties. Our work bridges the communities of optimization and statistics. In Chapter 2, we propose a function estimation method using high-order Laplacian regularization. Graph-Laplacian-based regularization has been widely used in learning problems to take advantage of the information on the geometry of the marginal distribution. In this chapter, we consider high-order Laplacian regularization, whose empirical (i.e., sample) version takes the form ${\bf f}^T {\bf L}^m {\bf f}$ with ${\bf L}$ being the graph Laplacian matrix of the sample data, and provide its theoretical foundations in the non-parametric setting (a one-dimensional Laplacian smoother is sketched after this listing). We show that nearly all good asymptotic properties of the existing state-of-the-art approaches are inherited by the Laplacian-based smoother. Specifically, we prove that as the sample size goes to infinity, the expected mean squared error (MSE) is of order $O(n^{-\frac{2m}{2m+d}})$, which is the {\it optimal convergence rate} for nonparametric estimation established by Stone (1982), where $m$ is the order of the Sobolev semi-norm used in the regularization, and $d$ is the intrinsic dimension of the domain. Besides, we propose a {\it generalized cross validation} (GCV) approach to choose the penalty parameter $\lambda$, and we establish its {\it asymptotic optimality} guarantee. In Chapter 3, we study the fuselage assembly problem using sparse learning theories. Natural dimensional variabilities of incoming fuselages affect the assembly speed and quality of fuselage joins in the composite fuselage assembly process. Thus, shape control is critical to ensure the quality of composite fuselage assembly. In practice, the maximum gap between the two fuselages plays the key role in assembly. In this work, we consider $\ell_\infty$-based linear regression, which is under-studied in statistics but critical for optimal shape control in fuselage assembly.
We mainly study the $\ell_\infty$ model under the framework of high-dimensional sparse estimation, where we use the $\ell_1$ penalty to control the sparsity of the resulting estimator. The estimation error of the $\ell_1$-regularized $\ell_\infty$ linear model is derived, which matches the upper bound in the existing literature. Finally, we use numerical studies of fuselage control to verify the advantages of $\ell_\infty$-based regression. In Chapter 4, we compare access to preventive dental care for low-income children eligible for public dental insurance with that of children with private dental insurance and/or high family income ($>$400\% of the federal poverty level) in Georgia, and assess the impact of policies aimed at increasing access to dental care for low-income children. Specifically, we used multiple sources of data (e.g., US Census, Georgia Board of Dentistry) to estimate measures of preventive care access in 2015 for children aged 0 to 18 years. Measures included met need, scarcity of dentists, and one-way travel distance to a dentist at the census tract level. We used an optimization model to estimate access, quantify disparities, and evaluate policies. We find that about 1.5 million children were eligible for public insurance, and 600,000 had private insurance and/or high family income. Across census tracts, average met need was 59\% for low-income children and 96\% for high-income children; for rural census tracts, these values were 33\% and 84\%, respectively. The average travel distance across all census tracts was 3.71 miles for high-income/insured children and 17.16 miles for low-income children; for rural census tracts, these values were 11.55 and 32.91 miles, respectively. Met need significantly increased and travel distance decreased for modest increases in provider acceptance of Medicaid-eligible children. In order to achieve 100\% met need, an 80\% provider participation rate would be required. We conclude that, across census tracts, high-income children had notably higher access than low-income children. Identifying these tracts could result in more efficient allocation of public health dental resources.
  • Item
    Clustering and feature detection methods for high-dimensional data
    (Georgia Institute of Technology, 2019-05-21) Lahoti, Geet
    The majority of the real-world data are unlabeled. Moreover, complex characteristics such as high-dimensionality and high variety pose significant analytical challenges. In statistical and machine learning, supervised and unsupervised methods are used to analyze labeled and unlabeled data, respectively. Compared to supervised learning methods, unsupervised learning is less developed. Therefore, this dissertation focuses on developing unsupervised methods to perform clustering and feature detection tasks in real-world high-dimensional data settings. Specifically, we develop methods to cluster censored spatio-temporal data, detect pixel-level features in medical imaging data, and adaptively detect anomalies in industrial optical inspection images and candidates’ emotions in interview videos. The overarching objective of these methods is to help stakeholders improve the performance of the associated systems in terms of user engagement, patient comfort, customer satisfaction, and product quality.