Title:
New progress in hot-spots detection, partial-differential-equation-based model identification and statistical computation

dc.contributor.advisor Mei, Yajun
dc.contributor.advisor Huo, Xiaoming
dc.contributor.author Zhao, Yujie
dc.contributor.committeeMember Shi, Jianjun
dc.contributor.committeeMember Zhou, Haomin
dc.contributor.committeeMember Holte, Sarah
dc.contributor.department Industrial and Systems Engineering
dc.date.accessioned 2021-06-10T16:53:06Z
dc.date.available 2021-06-10T16:53:06Z
dc.date.created 2021-05
dc.date.issued 2021-04-19
dc.date.submitted May 2021
dc.date.updated 2021-06-10T16:53:06Z
dc.description.abstract This thesis discusses new progress in (1) hot-spots detection in spatio-temporal data, (2) partial-differential-equation-based (PDE-based) model identification, and (3) optimization in Least Absolute Shrinkage and Selection Operator (Lasso) type problems. The thesis comprises four main works. Chapter 1 and Chapter 2 fall in the first area, i.e., hot-spots detection in spatio-temporal data. Chapter 3 belongs to the second area, i.e., PDE-based model identification. Chapter 4 addresses the third area, i.e., optimization in the Lasso-type problem. The four chapters are summarized as follows. In Chapter 1, we aim at detecting hot-spots in multivariate spatio-temporal datasets that are non-stationary over time. To realize this objective, we propose a statistical method under the framework of tensor decomposition; the method has three steps. First, we fit the observed data to a Smooth Sparse Decomposition Tensor (SSD-Tensor) model that serves as a dimension reduction and de-noising technique: it is an additive model that decomposes the original data into three components: a smooth but non-stationary global mean, sparse local anomalies, and random noise. Next, we estimate the model parameters within a penalized framework that combines Lasso and fused Lasso penalties to address the spatial sparsity and temporal consistency, respectively. Finally, we apply a Cumulative Sum (CUSUM) control chart to monitor the model residuals, which allows us to detect when and where a hot-spot event occurs. To demonstrate the usefulness of our proposed SSD-Tensor method, we compare it with several other methods in extensive numerical simulation studies and on a real crime rate dataset.
The material of this chapter is published in the Journal of Applied Statistics in January 2021 under the title ``Rapid Detection of Hot-spots via Tensor Decomposition with Applications to Crime Rate Data'' with co-authors Hao Yan, Sarah E. Holte and Yajun Mei. In Chapter 2, we improve the methodology in Chapter 1 both statistically and computationally. The statistical improvement is a new methodology for detecting hot-spots with temporal circularity, rather than temporal continuity as in Chapter 1. This helps us handle many bio-surveillance and healthcare applications, where data sources are measured at many spatial locations repeatedly over time, say, daily/weekly/monthly. The computational improvement is the development of a more efficient algorithm. The main tool we use to accelerate the computation is tensor decomposition. The idea is analogous to the matrix setting: it might be difficult to compute the inverse of a large matrix in general, but it is straightforward to compute the inverse of a large block diagonal matrix through the inverses of the sub-matrices on the diagonal. The usefulness of the improved methodology is validated through numerical simulations and a real-world dataset of the weekly number of gonorrhea cases from 2006 to 2018 for 50 states in the U.S. The material of this chapter is accepted as a book chapter in Frontiers in Statistical Quality Control 13 in February 2021 under the title ``Rapid Detection of Hot-spot by Tensor Decomposition with Application to Weekly Gonorrhea Data'' with co-authors Hao Yan, Sarah E. Holte, Roxanne P. Kerani and Yajun Mei. In Chapter 3, we propose a two-stage method called Spline Assisted Partial Differential Equation involved Model Identification (SAPDEMI) to efficiently identify the underlying PDE models from noisy data. In the first stage -- the functional estimation stage -- we employ cubic splines to estimate the unobservable derivatives, which serve as candidates for the underlying PDE models.
The contribution of this stage is that it is computationally efficient: its computational complexity is linear in the sample size, which is the lowest possible order of complexity. In the second stage -- the model identification stage -- we apply Lasso to identify the underlying PDE model. The contribution of this stage is that we focus on model selection, whereas the existing literature mostly focuses on parameter estimation. Moreover, we develop statistical properties of our method for correct identification, where the main tool we use is the primal-dual witness (PDW) method. Finally, we validate our theory through various numerical examples. In Chapter 4, we focus on developing an algorithm to solve optimization problems with an L1 regularization term, namely the Lasso-type problem. The algorithm developed in this chapter can greatly reduce the computational complexity in Chapter 1, Chapter 2 and Chapter 3, where we try to realize sparse identification. The challenge in developing an efficient algorithm for the Lasso-type problem is that its objective function is not strictly convex when the number of samples is less than the number of features. Because of this property, existing Lasso-type estimators in general cannot achieve the optimal rate, owing to the undesirable behavior of the absolute value function at the origin. To overcome this challenge, we develop a homotopic method, where we use a sequence of surrogate functions to approximate the L1 penalty that is used in Lasso-type estimators. The surrogate functions converge to the L1 penalty in the Lasso estimator. At the same time, each surrogate function is strictly convex, which enables a provably faster numerical rate of convergence.
In this chapter, we demonstrate that by meticulously defining the surrogate functions, one can prove a faster numerical convergence rate than that of any existing method for computing Lasso-type estimators. Namely, the state-of-the-art algorithms can only guarantee O(1/\epsilon) or O(1/\sqrt{\epsilon}) convergence rates, while we can prove an O([\log(1/\epsilon)]^2) rate for the newly proposed algorithm. Our numerical simulations show that the new algorithm also performs better empirically.
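The surrogate idea described for Chapter 4 can be illustrated with a minimal sketch. The abstract does not specify the surrogate functions, so the choice sqrt(beta^2 + t^2) below is an assumption made for illustration only; it is strictly convex for t > 0 and tends to |beta| as t goes to 0. The function names, the schedule of t values, and the use of plain gradient descent are likewise hypothetical, and this sketch makes no attempt to reproduce the thesis's algorithm or its O([\log(1/\epsilon)]^2) rate.

```python
import numpy as np


def smoothed_l1(beta, t):
    """Illustrative surrogate for ||beta||_1: sum_j sqrt(beta_j^2 + t^2).

    This particular surrogate is an assumption, not the thesis's definition.
    """
    return np.sum(np.sqrt(beta ** 2 + t ** 2))


def continuation_lasso(X, y, lam, t_values=(1.0, 0.1, 0.01, 0.001), iters=2000):
    """Minimize (1/(2n))||y - X beta||^2 + lam * smoothed_l1(beta, t)
    for a decreasing sequence of t, warm-starting each stage from the last."""
    n, p = X.shape
    beta = np.zeros(p)
    # Lipschitz constant of the gradient of the least-squares term.
    lip_ls = np.linalg.norm(X, 2) ** 2 / n
    for t in t_values:
        # The surrogate's second derivative is at most 1/t, so the full
        # gradient is Lipschitz with constant at most lip_ls + lam / t.
        step = 1.0 / (lip_ls + lam / t)
        for _ in range(iters):
            grad = X.T @ (X @ beta - y) / n + lam * beta / np.sqrt(beta ** 2 + t ** 2)
            beta = beta - step * grad
    return beta


# Toy check on an orthogonal design (X = sqrt(n) * I), where the exact Lasso
# solution is soft-thresholding of y / sqrt(n) at level lam.
X = 2.0 * np.eye(4)
y = np.array([4.0, -3.0, 0.4, 0.0])
beta_hat = continuation_lasso(X, y, lam=0.5)
soft = np.sign(y / 2.0) * np.maximum(np.abs(y / 2.0) - 0.5, 0.0)
print(np.round(beta_hat, 3), soft)
```

Each stage is a smooth, strictly convex problem, so a simple first-order method applies; shrinking t moves the surrogate solution toward the Lasso solution, at the cost of a smaller step size in this naive scheme.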
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/64740
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject hot-spots detection
dc.subject tensor decomposition
dc.subject spatio-temporal
dc.subject CUSUM
dc.subject circular time
dc.subject partial differential equation
dc.subject model identification
dc.subject cubic spline
dc.subject Lasso
dc.subject homotopic method
dc.subject convergence rate
dc.subject L1 regularization
dc.title New progress in hot-spots detection, partial-differential-equation-based model identification and statistical computation
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Mei, Yajun
local.contributor.advisor Huo, Xiaoming
local.contributor.corporatename H. Milton Stewart School of Industrial and Systems Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication 278b2355-ca85-4111-b664-4d7e39f71482
relation.isAdvisorOfPublication e04d5ccf-23db-4c60-a862-62d4602af9da
relation.isOrgUnitOfPublication 29ad75f0-242d-49a7-9b3d-0ac88893323c
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name:
ZHAO-DISSERTATION-2021.pdf
Size:
5.45 MB
Format:
Adobe Portable Document Format