Organizational Unit:
H. Milton Stewart School of Industrial and Systems Engineering

Publication Search Results

Now showing 1 - 10 of 21
  • Item
    Using machine learning to estimate survival curves for transplant patients receiving an increased risk for disease transmission donor organ versus waiting for a standard organ
    (Georgia Institute of Technology, 2019-03-26) Mark, Ethan Joshua
    In 1994, the Centers for Disease Control and Prevention (CDC) and the Public Health Service (PHS) released guidelines classifying donors at risk of transmitting human immunodeficiency virus (HIV) through organ transplantation. In 2013, the guidelines were updated to include donors at risk of transmitting hepatitis B (HBV) and hepatitis C (HCV). These donors are known as increased risk for disease transmission donors (IRD). Even though donors are now universally screened for HIV, HBV, and HCV by nucleic acid testing (NAT), NAT can be negative during the eclipse phase, when the virus is not detectable in blood. In part due to the opioid epidemic, over 19% of organ donors were classified as IRD in 2014. Despite the risks of disease transmission and associated mortality from accepting an IRD organ offer, patients also face mortality risks if they decline the organ and wait for a non-IRD organ. The main theme of this thesis is to build organ transplant and waitlist survival models and to help patients decide between accepting an IRD organ offer and remaining on the waitlist for a non-IRD organ. In chapter one, we introduced background information and outlined the thesis. In chapter two, we used machine learning to build an organ transplant survival model for the kidney that achieves greater performance than the model currently used in the U.S. kidney allocation system. In chapter three, we used similar modeling techniques and simulation to compare survival for patients accepting IRD kidney offers vs. waiting for non-IRD kidneys. In chapter four, we extended our IRD vs. non-IRD survival comparisons to the liver, heart, and lung, using different models and parameters. In chapter five, we built a model that predicts how the health of a patient changes from waitlist registration to transplantation. In chapter six, we utilized the transplant and waitlist survival models built in chapters three and four to create an interactive tool that displays the survival curves for a patient receiving an IRD organ or waiting for a non-IRD organ. The tool can also show the survival curve if a patient chooses to receive a non-IRD organ immediately. We then concluded with a discussion and major takeaways in chapter seven.
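As an illustration of the kind of survival comparison described above, the sketch below estimates Kaplan-Meier survival curves for two hypothetical cohorts ("accept an IRD organ now" vs. "wait for a non-IRD organ") from synthetic right-censored data. It is not the thesis's machine-learning model; all distributions and parameter values are invented for the example.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate from right-censored data.

    times  : follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    Returns the distinct event times and the estimated survival curve at those times.
    """
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                   # subjects still under observation at t
        deaths = np.sum((times == t) & (events == 1))
        s *= 1.0 - deaths / at_risk                    # product-limit update
        surv.append(s)
    return event_times, np.array(surv)

# Synthetic cohorts: "accept an IRD organ now" vs. "wait for a non-IRD organ".
rng = np.random.default_rng(0)
censor = rng.uniform(0, 15, size=500)                  # administrative censoring times (years)
cohorts = {"accept IRD organ": rng.exponential(9.0, 500),   # hypothetical survival times
           "wait for non-IRD": rng.exponential(7.0, 500)}

for label, t in cohorts.items():
    obs, ev = np.minimum(t, censor), (t <= censor).astype(int)
    km_t, km_s = kaplan_meier(obs, ev)
    idx = np.searchsorted(km_t, 5.0)                   # step-function value just before year 5
    print(label, "- estimated 5-year survival:", round(km_s[idx - 1], 3) if idx else 1.0)
```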
  • Item
    Sequential interval estimation for Bernoulli trials
    (Georgia Institute of Technology, 2018-07-31) Yaacoub, Tony
    Interval estimation of a binomial proportion is one of the most basic problems in statistics, with many important real-world applications. Some classical applications include estimation of the prevalence of a rare disease and accuracy assessment in remote sensing. In these applications, the sample size is fixed beforehand, and a confidence interval for the proportion is obtained. However, in many modern applications, sampling is especially costly and time consuming, e.g., estimating the customer click-through probability in online marketing campaigns and estimating the probability that a stochastic system satisfies a specific property, as in Statistical Model Checking. Because these applications tend to require extensive time and cost, it is advantageous to reduce the sample size while simultaneously assuring satisfactory quality (coverage) levels for the corresponding interval estimates. The sequential version of interval estimation aims at the latter goal by allowing the sample size to be random and, in particular, by formulating a stopping time controlled by the observations themselves. The literature on the sequential setup of the problem is limited compared to its fixed-sample-size counterpart, and optimality of the sampling procedure has not been established. The work in this thesis aims to extend the body of knowledge on sequential interval estimation for Bernoulli trials, addressing both theoretical and practical concerns. In the first part of this thesis, we propose an optimal sequential methodology for obtaining fixed-width confidence intervals for a binomial proportion when prior knowledge of the proportion is available. We assume that there exists a prior distribution for the binomial proportion, and our goal is to minimize the expected number of samples while guaranteeing that the coverage probability is at least a specified nominal coverage probability level. We demonstrate that our stopping time is always bounded from above and below: a sufficient amount of information must be accumulated before the stopping rule is applied, and the stopping time always terminates in finite time. We also compare our method with the optimum fixed-sample-size procedure as well as with existing alternative sequential schemes. In the second part of this thesis, we propose a two-stage sequential method for obtaining tandem-width confidence intervals for a binomial proportion when no prior knowledge of the proportion is available and a computationally efficient method is desired. By tandem-width, we mean that the half-width of the confidence interval of the proportion is not fixed beforehand; it is instead required to satisfy two different upper bounds depending on the underlying value of the binomial proportion. To tackle this problem, we propose a simple but useful sequential method for obtaining fixed-width confidence intervals for the binomial proportion based on the minimax estimator of the binomial proportion. Finally, we extend the ideas developed for Bernoulli distributions in the first part of this thesis to interval estimation for arbitrary distributions, with an alternative optimality formulation. Here, we propose an alternative conditional-cost formulation to circumvent certain analytical/computational difficulties. Specifically, we assume that an independent and identically distributed process is observed sequentially, with its common probability density function having a random parameter that must be estimated. We follow a semi-Bayesian approach in which we assign a cost to the pair (estimator, true parameter), and our goal is to minimize the average sample size while guaranteeing an average cost below some prescribed level. For a variety of examples, we compare our method with the optimum fixed-sample-size procedure and other existing sequential schemes.
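As a rough illustration of the sequential flavor of these procedures, the sketch below keeps drawing Bernoulli observations until a Wald interval for the proportion reaches a target half-width. It uses a generic frequentist stopping rule, not the optimal Bayesian or tandem-width procedures developed in the thesis; the minimum sample size, target width, and click-through probability are all hypothetical.

```python
import numpy as np
from scipy import stats

def sequential_fixed_width_ci(sampler, half_width=0.05, conf=0.95, n_min=100, n_max=100_000):
    """Draw Bernoulli observations one at a time until the Wald interval for the
    success probability is narrower than the target half-width.

    sampler() must return a single 0/1 observation per call. The minimum sample
    size guards (imperfectly) against stopping while the Wald interval is still
    unreliable, e.g., when the running proportion is near 0 or 1.
    """
    z = stats.norm.ppf(0.5 + conf / 2)
    successes, n = 0, 0
    while n < n_max:
        successes += sampler()
        n += 1
        if n < n_min:                      # accumulate a minimum sample before checking
            continue
        p_hat = successes / n
        hw = z * np.sqrt(p_hat * (1 - p_hat) / n)
        if hw <= half_width:
            break
    p_hat = successes / n
    hw = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - hw, p_hat + hw), n

rng = np.random.default_rng(1)
p_true = 0.12                              # e.g., an unknown click-through probability
p_hat, ci, n_used = sequential_fixed_width_ci(lambda: rng.binomial(1, p_true))
print(f"stopped after {n_used} samples: p_hat = {p_hat:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```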
  • Item
    Topics in the statistical aspects of simulation
    (Georgia Institute of Technology, 2015-08-19) McDonald, Joshua L.
    We apply various variance reduction techniques to the estimation of Asian averages and options and propose an easy-to-use quasi-Monte Carlo method that can provide significant variance reductions with minimal increases in computational time. We have also extended these techniques to estimate higher moments of the Asians. We then use these estimated moments to efficiently implement Gram-Charlier-based estimators for probability density functions of Asian averages and options. Finally, we investigate a ranking and selection application that uses post hoc analysis to determine how the circumstances of procedure termination affect the probability of correct selection.
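The sketch below contrasts plain Monte Carlo with a scrambled-Sobol quasi-Monte Carlo estimate of an arithmetic-average Asian call price under geometric Brownian motion. It is only a generic illustration of the quasi-Monte Carlo idea, with made-up market parameters; it does not implement the thesis's specific variance reduction techniques or Gram-Charlier estimators.

```python
import numpy as np
from scipy import stats
from scipy.stats import qmc

def asian_call_payoffs(u, s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0):
    """Discounted arithmetic-average Asian call payoffs driven by uniform draws.

    u has shape (n_paths, n_steps); each row drives one geometric Brownian path.
    """
    n_paths, n_steps = u.shape
    dt = t / n_steps
    z = stats.norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))      # uniforms -> standard normals
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(log_increments, axis=1))
    avg = paths.mean(axis=1)                               # arithmetic average price
    return np.exp(-r * t) * np.maximum(avg - k, 0.0)

n_steps, m = 12, 14                                        # 2**14 = 16,384 paths
rng = np.random.default_rng(2)

plain = asian_call_payoffs(rng.random((2**m, n_steps)))
sobol = qmc.Sobol(d=n_steps, scramble=True, seed=2)
quasi = asian_call_payoffs(sobol.random_base2(m))

se = 1.96 * plain.std(ddof=1) / np.sqrt(plain.size)
print(f"plain MC estimate: {plain.mean():.4f} +/- {se:.4f}")
print(f"quasi-MC (scrambled Sobol) estimate: {quasi.mean():.4f}")
```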
  • Item
    Statistical estimation and changepoint detection methods in public health surveillance
    (Georgia Institute of Technology, 2015-04-07) Reynolds, Sue Bath
    This thesis focuses on assessing and improving statistical methods implemented in two areas of public health research. The first topic involves estimation of national influenza-associated mortality rates via mathematical modeling. The second topic involves the timely detection of infectious disease outbreaks using statistical process control monitoring. For over fifty years, the Centers for Disease Control and Prevention has been estimating annual rates of U.S. deaths attributable to influenza. These estimates have been used to determine costs and benefits associated with influenza prevention and control strategies. Quantifying the effect of influenza on mortality, however, can be challenging since influenza infections typically are neither confirmed virologically nor specified on death certificates. Consequently, a wide range of ecologically based, mathematical modeling approaches have been applied to specify the association between influenza and mortality. To date, all influenza-associated death estimates have been based on mortality data first aggregated at the national level and then modeled. Unfortunately, there are a number of local-level seasonal factors that may confound the association between influenza and mortality, suggesting that data be modeled at the local level and then pooled to make national estimates of death. The first component of the thesis topic involving mortality estimation addresses this issue by introducing and implementing a two-stage hierarchical Bayesian modeling approach. In the first stage, city-level data with varying trends in mortality and weather were modeled using semi-parametric, generalized additive models. In the second stage, the log-relative risk estimates calculated for each city in stage 1 represented the “outcome” variable, and were modeled in two ways: (1) assuming spatial independence across cities using a Bayesian generalized linear model, and (2) assuming correlation among cities using a Bayesian spatial correlation model. Results from these models were compared to those from a more-conventional approach. The second component of this topic examines the extent to which seasonal confounding and collinearity affect the relationship between influenza and mortality at the local (city) level. Disentangling the effects of temperature, humidity, and other seasonal confounders on the association between influenza and mortality is challenging since these covariates are often temporally collinear with influenza activity. Three modeling strategies with varying representations of background seasonality were compared. Seasonal covariates entered into the model may have been measured (e.g., ambient temperature) or unmeasured (e.g., time-based smoothing splines or Fourier terms). An advantage of modeling background seasonality via time splines is that the amount of seasonal curvature can be controlled by the number of degrees of freedom specified for the spline. The effects of influenza activity on mortality under these varying representations of seasonal confounding are then compared. The third component of this topic explores the relationship between mortality rates and influenza activity using a flexible, natural cubic spline function to model the influenza term. The conventional approach of fitting influenza-activity terms linearly in regression was found to be too constraining. Results show that the association is best represented nonlinearly. The second area of focus in this thesis involves infectious disease outbreak detection. A fundamental goal of public health surveillance, particularly syndromic surveillance, is the timely detection of increases in the rate of unusual events. In syndromic surveillance, a significant increase in the incidence of monitored disease outcomes would trigger an alert, possibly prompting the implementation of an intervention strategy. Public health surveillance generally monitors count data (e.g., counts of influenza-like illness, sales of over-the-counter remedies, and number of visits to outpatient clinics). Statistical process control charts, designed for quality control monitoring in industry, have been widely adapted for use in disease and syndromic surveillance. The behavior of these detection methods on discrete distributions, however, has not been explored in detail. For this component of the thesis, a simulation study was conducted to compare the CuSum and EWMA methods for detection of increases in negative binomial rates with varying amounts of dispersion. The goal of each method is to detect an increase in the mean number of cases as soon as possible after an upward rate shift has occurred. The performance of the CuSum and EWMA detection methods is evaluated using the conditional expected delay criterion, which is a measure of the detection delay, i.e., the time between the occurrence of a shift and when that shift is detected. Detection capabilities were explored under varying shift sizes and times at which the shifts occurred.
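A minimal sketch of the kind of count-data monitoring compared above: a one-sided CUSUM is run on simulated negative binomial daily counts whose mean shifts upward at a known time, and the detection delay is reported. The reference value k and control limit h are arbitrary here; in practice h would be calibrated to a target false-alarm rate, and the thesis evaluates CuSum and EWMA charts far more systematically.

```python
import numpy as np

def cusum_upper(counts, k, h):
    """One-sided CUSUM for count data: accumulate max(0, S + x - k) and signal at h.
    Returns the first index at which the statistic exceeds h, or None."""
    s = 0.0
    for t, x in enumerate(counts):
        s = max(0.0, s + x - k)
        if s > h:
            return t
    return None

rng = np.random.default_rng(3)
shift_day = 200
# Negative binomial daily counts: mean 10 in control, mean 14 after the shift.
n_disp = 5                                   # dispersion parameter
mean0, mean1 = 10.0, 14.0
p0, p1 = n_disp / (n_disp + mean0), n_disp / (n_disp + mean1)
counts = np.concatenate([rng.negative_binomial(n_disp, p0, size=shift_day),
                         rng.negative_binomial(n_disp, p1, size=165)])

k = (mean0 + mean1) / 2                      # reference value between the two means
alarm = cusum_upper(counts, k=k, h=25.0)     # h would normally be tuned to a false-alarm target
if alarm is not None and alarm >= shift_day:
    print("alarm on day", alarm, "-> detection delay of", alarm - shift_day, "days")
else:
    print("false alarm or no alarm; alarm index:", alarm)
```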
  • Item
    A modeling framework for analyzing the education system as a complex system
    (Georgia Institute of Technology, 2015-04-01) Mital, Pratik
    In this thesis, the Education System Intervention Modeling Framework (ESIM Framework) is introduced for analyzing interventions in the K-12 education system. This framework is the first of its kind to model interventions in the K-12 school system in the United States. Techniques from systems engineering and operations research, such as agent-based modeling and social network analysis, are used to model the bottom-up mechanisms of intervention implementation in schools. By applying the ESIM framework, an intervention can be better analyzed in terms of the barriers and enablers to intervention implementation and sustainability. The risk of failure of future interventions is thereby reduced through improved allocation of resources towards the system agents and attributes which play key roles in the sustainability of the intervention. Increasing the sustainability of interventions in the school system improves educational outcomes in the school and increases the benefits gained from the millions of dollars being invested in such interventions. In the first part of this thesis, a case study of an Engineers Without Borders chapter is modeled which helped in the development of a more generalized framework, applicable across a broad range of education system interventions. In the second part of this thesis, the ESIM framework is developed. The framework developed is divided into four phases: model definition, model design, model analysis, and model validation. Each of these phases has detailed steps in order to build the agent-based model of the particular intervention. In the third part of this thesis, the ESIM framework is applied to a case study of a curriculum intervention, Science Learning: Integrating Design, Engineering and Robotics, involving the design and implementation of an 8th-grade, inquiry-based physical science curriculum across three demographically varying schools. This case study provides a good comparison of the implementation of the intervention across different school settings because of the varied outcomes at the three schools.
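As a toy illustration of the bottom-up, agent-based mechanism that the ESIM framework formalizes, the sketch below lets hypothetical teacher agents decide each semester whether to keep using an intervention, based on their own buy-in and on adoption among networked colleagues. The agents, attributes, network, and update rule are invented for the example and are far simpler than the framework's four-phase models.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy agent-based sketch: teacher agents decide each semester whether to keep
# using a new curriculum, based on their own buy-in and on adopting colleagues.
n_teachers = 40
buy_in = rng.uniform(0.2, 0.8, n_teachers)            # hypothetical individual attribute
adj = (rng.random((n_teachers, n_teachers)) < 0.1).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T                                     # symmetric collegial network
adopted = rng.random(n_teachers) < 0.3                # initial adopters after training

for semester in range(1, 9):
    degree = np.maximum(adj.sum(axis=1), 1.0)         # avoid dividing by zero for isolates
    peer_share = (adj @ adopted) / degree             # fraction of colleagues currently adopting
    p_keep = 0.5 * buy_in + 0.5 * peer_share          # adoption pressure: self + peers
    adopted = rng.random(n_teachers) < p_keep
    print(f"semester {semester}: {adopted.mean():.0%} of teachers using the intervention")
```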
  • Item
    Sequential estimation in statistics and steady-state simulation
    (Georgia Institute of Technology, 2014-04-09) Tang, Peng
    At the onset of the "Big Data" age, we are faced with ubiquitous data in various forms and with various characteristics, such as noise, high dimensionality, autocorrelation, and so on. The question of how to obtain accurate and computationally efficient estimates from such data is one that has stoked the interest of many researchers. This dissertation mainly concentrates on two general problem areas: inference for high-dimensional and noisy data, and estimation of the steady-state mean for univariate data generated by computer simulation experiments. We develop and evaluate three separate sequential algorithms for the two topics. One major advantage of sequential algorithms is that they allow for careful experimental adjustments as sampling proceeds. Unlike one-step sampling plans, sequential algorithms adapt to different situations arising from the ongoing sampling; this makes these procedures efficacious as problems become more complicated and more-delicate requirements need to be satisfied. We elaborate on each research topic in the following discussion. Concerning the first topic, our goal is to develop a robust graphical model for noisy data in a high-dimensional setting. Under a Gaussian distributional assumption, the estimation of undirected Gaussian graphs is equivalent to the estimation of inverse covariance matrices. Particular interest has focused upon estimating a sparse inverse covariance matrix to reveal insight on the data as suggested by the principle of parsimony. For estimation with high-dimensional data, the influence of anomalous observations becomes severe as the dimensionality increases. To address this problem, we propose a robust estimation procedure for the Gaussian graphical model based on the Integrated Squared Error (ISE) criterion. The robustness result is obtained by using ISE as a nonparametric criterion for seeking the largest portion of the data that "matches" the model. Moreover, an l₁-type regularization is applied to encourage sparse estimation. To address the non-convexity of the objective function, we develop a sequential algorithm in the spirit of a majorization-minimization scheme. We summarize the results of Monte Carlo experiments supporting the conclusion that our estimator of the inverse covariance matrix converges weakly (i.e., in probability) to the true inverse covariance matrix as the sample size grows large. The performance of the proposed method is compared with that of several existing approaches through numerical simulations. We further demonstrate the strength of our method with applications in genetic network inference and financial portfolio optimization. The second topic consists of two parts, both of which concern the computation of point and confidence interval (CI) estimators for the mean µ of a stationary discrete-time univariate stochastic process X ≡ {X_i : i = 1, 2, ...} generated by a simulation experiment. Point estimation is relatively easy when the underlying system starts in steady state, but the traditional way of calculating CIs usually fails because the data encountered in simulation output are typically serially correlated. We propose two distinct sequential procedures that each yield a CI for µ with user-specified reliability and absolute or relative precision.
The first sequential procedure is based on variance estimators computed from standardized time series applied to nonoverlapping batches of observations, and it is characterized by its simplicity relative to methods based on batch means and its ability to deliver CIs for the variance parameter of the output process (i.e., the sum of covariances at all lags). The second procedure is the first sequential algorithm that uses overlapping variance estimators to construct asymptotically valid CI estimators for the steady-state mean based on standardized time series. The advantage of this procedure is that compared with other popular procedures for steady-state simulation analysis, the second procedure yields significant reduction both in the variability of its CI estimator and in the sample size needed to satisfy the precision requirement. The effectiveness of both procedures is evaluated via comparisons with state-of-the-art methods based on batch means under a series of experimental settings: the M/M/1 waiting-time process with 90% traffic intensity; the M/H_2/1 waiting-time process with 80% traffic intensity; the M/M/1/LIFO waiting-time process with 80% traffic intensity; and an AR(1)-to-Pareto (ARTOP) process. We find that the new procedures perform comparatively well in terms of their average required sample sizes as well as the coverage and average half-length of their delivered CIs.
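For context, the sketch below computes a plain nonoverlapping-batch-means confidence interval for the steady-state mean waiting time of an M/M/1 queue with 90% traffic intensity (one of the test processes mentioned above). This is the baseline class of method such procedures are compared against, not the standardized-time-series procedures developed in the thesis; the warm-up deletion point and batch count are arbitrary choices.

```python
import numpy as np
from scipy import stats

def mm1_waiting_times(lam, mu, n, seed=0):
    """Waiting times in queue for the first n customers of an M/M/1 queue,
    via the Lindley recursion W_{i+1} = max(0, W_i + S_i - A_{i+1})."""
    rng = np.random.default_rng(seed)
    service = rng.exponential(1 / mu, n)
    interarrival = rng.exponential(1 / lam, n)
    w = np.zeros(n)
    for i in range(1, n):
        w[i] = max(0.0, w[i - 1] + service[i - 1] - interarrival[i])
    return w

def batch_means_ci(x, n_batches=20, conf=0.90):
    """Fixed-sample nonoverlapping-batch-means CI for the steady-state mean."""
    m = len(x) // n_batches                   # batch size
    batches = x[:n_batches * m].reshape(n_batches, m).mean(axis=1)
    half = stats.t.ppf(0.5 + conf / 2, n_batches - 1) * batches.std(ddof=1) / np.sqrt(n_batches)
    return batches.mean(), half

w = mm1_waiting_times(lam=0.9, mu=1.0, n=200_000, seed=5)
w = w[10_000:]                                # crude deletion of the warm-up period
mean, half = batch_means_ci(w)
# For lam=0.9, mu=1.0 the steady-state mean wait in queue is lam/(mu*(mu-lam)) = 9.0.
print(f"estimated steady-state mean wait: {mean:.2f} +/- {half:.2f}  (theory: 9.00)")
```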
  • Item
    Network modeling of sexually transmitted diseases
    (Georgia Institute of Technology, 2014-04-04) Chen, Yao-Hsuan
    We create a dynamic network model to replicate more closely the population network structures of interest. Network, Norms and HIV/STI Risk Among Youth (NNAHRAY) is a community relationship survey data set, which provides a rare sample of a human risky-behavior contact network. Combining disease compartmental models with our dynamic network model, we simulate the spread of Human Immunodeficiency Virus (HIV) and Herpes Simplex Virus Type 2 (HSV-2), taking into account HSV-2's synergistic impact on HIV transmission. Our model reproduces HIV prevalence, HSV-2 prevalence, and the contact network close to those observed in NNAHRAY, with annual HIV prevalence closer to the estimated values from the literature than that of any disease spread model based on static networks. The success of fitting our model to the target data shows the importance of considering the data sampling process, contact dynamics, and contact network structures. Under certain conditions, our model's prevalence predictions are insensitive to changes in network size. The analysis of various prevention/intervention strategies targeting different risky groups gives important insights into strategy prioritization and illustrates how our model can be used to assist in making public health policy decisions in practice, both for individual diseases and in the more-recent area of study that considers synergy between two diseases.
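A minimal sketch of disease spread over a contact network, in the spirit of (but far simpler than) the model described above: susceptible-infected dynamics are simulated on a small static random graph standing in for the risky-behavior network. The graph model, transmission probability, and time horizon are hypothetical, and there is no dynamic network or HIV/HSV-2 synergy here.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(6)

# Small static random graph standing in for a risky-behavior contact network.
g = nx.watts_strogatz_graph(n=500, k=4, p=0.1, seed=6)

beta = 0.05                                   # hypothetical per-contact weekly transmission probability
infected = np.zeros(g.number_of_nodes(), dtype=bool)
infected[rng.choice(g.number_of_nodes(), size=5, replace=False)] = True

for week in range(1, 105):
    newly_infected = []
    for node in g.nodes:
        if not infected[node]:
            exposures = sum(infected[nbr] for nbr in g.neighbors(node))
            if exposures and rng.random() < 1 - (1 - beta) ** exposures:
                newly_infected.append(node)
    infected[newly_infected] = True           # synchronous update of this week's infections
    if week % 26 == 0:
        print(f"week {week}: prevalence {infected.mean():.1%}")
```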
  • Item
    Bio-surveillance: detection and mitigation of disease outbreak
    (Georgia Institute of Technology, 2013-10-23) Lee, Mi Lim
    In spite of the remarkable development of modern medical treatment and technology, the threat of pandemic diseases such as anthrax, cholera, and SARS has not disappeared. As part of an emerging class of healthcare decision problems, many researchers have studied how to detect and contain disease outbreaks, and our research is aligned with this trend. This thesis mainly consists of two parts: epidemic simulation modeling for effective intervention strategies and spatiotemporal monitoring for outbreak detection. We developed a stochastic epidemic simulation model of a pandemic influenza virus (H1N1) to test possible interventions within a structured population. The possible interventions — such as vaccination, antiviral treatment, household prophylaxis, school closure, and social distancing — are investigated in a large number of scenarios, including delays in vaccine delivery and low and moderate vaccine efficacy. Since timely and accurate detection of a disease outbreak is crucial for emergency preparedness in healthcare and biosurveillance, we suggest two spatiotemporal monitoring charts, namely the SMCUSUM and RMCUSUM charts, to detect increases in the rate or count of disease incidents. Our research includes convenient methods to approximate the control limits of the charts. An analytical control limit approximation method for the SMCUSUM chart performs well under certain conditions on the data distribution and monitoring range. Another control limit approximation method for the RMCUSUM chart provides robust performance across various monitoring ranges, spatial correlation structures, and data distributions, without intensive modeling of the underlying process.
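The following sketch conveys the flavor of intervention comparison via stochastic epidemic simulation: a Reed-Frost-style chain-binomial SIR model under homogeneous mixing is run at several vaccination coverage levels, and the final attack rates are compared. The parameters and the homogeneous-mixing assumption are illustrative only; the thesis uses a structured-population model with many more intervention options.

```python
import numpy as np

def chain_binomial_sir(n=10_000, i0=10, r0=1.4, infectious_days=4, vacc_coverage=0.0,
                       vacc_efficacy=0.6, days=200, seed=7):
    """Reed-Frost-style stochastic SIR epidemic under homogeneous mixing.
    Vaccinated-and-protected individuals are removed from the susceptible pool."""
    rng = np.random.default_rng(seed)
    beta = r0 / infectious_days                  # per-day transmission rate
    protected = rng.binomial(n - i0, vacc_coverage * vacc_efficacy)
    s, i = n - i0 - protected, i0
    total_infected = i0
    for _ in range(days):
        p_inf = 1 - np.exp(-beta * i / n)        # per-susceptible daily infection probability
        new_i = rng.binomial(s, p_inf)
        new_r = rng.binomial(i, 1 / infectious_days)
        s, i = s - new_i, i + new_i - new_r
        total_infected += new_i
    return total_infected / n

for cov in (0.0, 0.3, 0.6):
    attack = chain_binomial_sir(vacc_coverage=cov)
    print(f"vaccination coverage {cov:.0%}: final attack rate {attack:.1%}")
```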
  • Item
    Optimal randomized and non-randomized procedures for multinomial selection problems
    (Georgia Institute of Technology, 2012-03-20) Tollefson, Eric Sander
    Multinomial selection problem procedures are ranking and selection techniques that aim to select the best (most probable) alternative based upon a sequence of multinomial observations. The classical formulation of the procedure design problem is to find a decision rule for terminating sampling. The decision rule should minimize the expected number of observations taken while achieving a specified indifference zone requirement on the prior probability of making a correct selection when the alternative configurations are in a particular subset of the probability space called the preference zone. We study the constrained version of the design problem in which there is a given maximum number of allowed observations. Numerous procedures have been proposed over the past 50 years, all of them suboptimal. In this thesis, we find via linear programming the optimal selection procedure for any given probability configuration. The optimal procedure turns out to be necessarily randomized in many cases. We also find via mixed integer programming the optimal non-randomized procedure. We demonstrate the performance of the methodology on a number of examples. We then reformulate the mathematical programs to make them more efficient to implement, thereby significantly expanding the range of computationally feasible problems. We prove that there exists an optimal policy which has at most one randomized decision point and we develop a procedure for finding such a policy. We also extend our formulation to replicate existing procedures. Next, we show that there is very little difference between the relative performances of the optimal randomized and non-randomized procedures. Additionally, we compare existing procedures using the optimal procedure as a benchmark, and produce updated tables for a number of those procedures. Then, we develop a methodology that guarantees the optimal randomized and non-randomized procedures for a broad class of variable observation cost functions, the first of its kind. We examine procedure performance under a variety of cost functions, demonstrating that incorrect assumptions regarding marginal observation costs may lead to increased total costs. Finally, we investigate and challenge key assumptions concerning the indifference zone parameter and the conditional probability of correct selection, revealing some interesting implications.
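The sketch below estimates, by Monte Carlo, the probability of correct selection for the naive single-stage rule "take n multinomial observations and pick the alternative with the most wins" under a slippage configuration in the preference zone. It only illustrates the P(CS) quantity that the thesis's linear and mixed-integer programs optimize over sampling rules; it is not the optimal randomized or non-randomized procedure, and the ratio theta and number of alternatives are arbitrary.

```python
import numpy as np

def prob_correct_selection(p, n_obs, reps=20_000, seed=8):
    """Monte Carlo estimate of P(correct selection) for the naive single-stage rule:
    take n_obs multinomial observations and pick the alternative with the most wins
    (ties broken at random). Alternative 0 is assumed to be the truly best one."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(reps):
        counts = rng.multinomial(n_obs, p)
        winners = np.flatnonzero(counts == counts.max())
        correct += rng.choice(winners) == 0
    return correct / reps

# Slippage configuration in the preference zone: the best cell is theta = 1.6 times as likely.
theta, k = 1.6, 3
p = np.array([theta] + [1.0] * (k - 1))
p /= p.sum()
for n_obs in (10, 20, 40):
    print(f"n = {n_obs:2d}: estimated P(CS) = {prob_correct_selection(p, n_obs):.3f}")
```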
  • Item
    Advances in ranking and selection: variance estimation and constraints
    (Georgia Institute of Technology, 2010-07-16) Healey, Christopher M.
    In this thesis, we first show that the performance of ranking and selection (R&S) procedures in steady-state simulations depends highly on the quality of the variance estimates that are used. We study the performance of R&S procedures using three variance estimators --- overlapping area, overlapping Cramer--von Mises, and overlapping modified jackknifed Durbin--Watson estimators --- that show better long-run performance than other estimators previously used in conjunction with R&S procedures for steady-state simulations. We devote additional study to the development of the new overlapping modified jackknifed Durbin--Watson estimator and demonstrate some of its useful properties. Next, we consider the problem of finding the best simulated system under a primary performance measure, while also satisfying stochastic constraints on secondary performance measures, known as constrained ranking and selection. We first present a new framework that allows certain systems to become dormant, halting sampling for those systems as the procedure continues. We also develop general procedures for constrained R&S that guarantee a nominal probability of correct selection, under any number of constraints and correlation across systems. In addition, we address new topics critical to efficiency of the these procedures, namely the allocation of error between feasibility check and selection, the use of common random numbers, and the cost of switching between simulated systems.