Title:
High-dimensional classification and attribute-based forecasting

dc.contributor.advisor Tsui, Kwok-Leung
dc.contributor.advisor Hung, Ying
dc.contributor.author Lo, Shin-Lian en_US
dc.contributor.committeeMember Abayomi, Kobi A.
dc.contributor.committeeMember Goldsman, David
dc.contributor.committeeMember Yuan, Ming
dc.contributor.department Industrial and Systems Engineering en_US
dc.date.accessioned 2011-03-04T20:21:52Z
dc.date.available 2011-03-04T20:21:52Z
dc.date.issued 2010-08-27 en_US
dc.description.abstract This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems in which the predictors have a large number of categories. Classification problems in microarray experiments refer to discriminating subjects with different biological phenotypes or known tumor subtypes, as well as to predicting the clinical outcomes or prognostic stages of subjects. One important characteristic of microarray data is that the number of genes is much larger than the sample size. The penalized logistic regression method is known for simultaneous variable selection and classification. However, the performance of this method declines as the number of variables increases. To address this concern, in the first study we propose a new classification approach that applies the penalized logistic regression method iteratively to gene subsets of controlled size in order to maintain variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. This new experimental setting makes most existing classification methods, including penalized logistic regression, inappropriate for direct application because the assumption of independent observations is violated. To solve this problem, we propose a new classification method that incorporates random effects into penalized logistic regression so that the heterogeneity among experimental subjects and the correlations from repeated measurements can be taken into account. An efficient hybrid algorithm is introduced to tackle computational challenges in estimation and integration. Applications to a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than the method based on the assumption of independent observations.
The second part of this thesis develops a new forecasting approach for large-scale datasets associated with a large number of predictor categories and with predictor structures. The new approach goes beyond conventional tree-based methods by incorporating a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. In an empirical study in the air cargo industry and a simulation study covering several different settings, the new approach achieves higher forecasting accuracy and higher computational efficiency than existing tree-based methods. en_US
dc.description.degree Ph.D. en_US
dc.identifier.uri http://hdl.handle.net/1853/37193
dc.publisher Georgia Institute of Technology en_US
dc.subject Classification en_US
dc.subject Microarray experiments en_US
dc.subject Tree-based methods en_US
dc.subject Variable selection en_US
dc.subject Penalized logistic regression en_US
dc.subject Forecasting en_US
dc.subject.lcsh Computational biology
dc.subject.lcsh Bioinformatics
dc.subject.lcsh Pattern recognition systems
dc.subject.lcsh DNA microarrays
dc.subject.lcsh Classification
dc.subject.lcsh Logistic regression analysis
dc.title High-dimensional classification and attribute-based forecasting en_US
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.corporatename H. Milton Stewart School of Industrial and Systems Engineering
local.contributor.corporatename College of Engineering
relation.isOrgUnitOfPublication 29ad75f0-242d-49a7-9b3d-0ac88893323c
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
Files
Original bundle
Name: lo_shinlian_201012_phd.pdf
Size: 1.18 MB
Format: Adobe Portable Document Format