Title:
Extensions of principal components analysis

dc.contributor.advisor Vempala, Santosh S.
dc.contributor.author Brubaker, S. Charles en_US
dc.contributor.committeeMember Kalai, Adam
dc.contributor.committeeMember Park, Haesun
dc.contributor.committeeMember Kannan, Ravi
dc.contributor.committeeMember Koltchinskii, Vladimir
dc.contributor.department Computing en_US
dc.date.accessioned 2009-08-26T17:34:57Z
dc.date.available 2009-08-26T17:34:57Z
dc.date.issued 2009-06-29 en_US
dc.description.abstract Principal Components Analysis (PCA) is a standard tool in data analysis, widely used in data-rich fields such as computer vision, data mining, bioinformatics, and econometrics. For a set of vectors in n dimensions and a natural number k less than n, the method returns a subspace of dimension k whose average squared distance to that set is as small as possible. Besides saving computation by reducing the dimension, projecting to this subspace can often reveal structure that was hidden in high dimension. This thesis considers several novel extensions of PCA, which provably reveal hidden structure where standard PCA fails to do so. First, we consider Robust PCA, which prevents a few points, possibly corrupted by an adversary, from having a large effect on the analysis. When applied to learning noisy logconcave mixture models, the algorithm requires only slightly more separation between component means than is required for the noiseless case. Second, we consider Isotropic PCA, which can go beyond the first two moments in identifying "interesting" directions in data. The method leads to the first affine-invariant algorithm that can provably learn mixtures of Gaussians in high dimensions, improving significantly on known results. Third, we define the "Subgraph Parity Tensor" of order r of a graph and reduce the problem of finding planted cliques in random graphs to the problem of finding the top principal component of this tensor. en_US
dc.description.degree Ph.D. en_US
dc.identifier.uri http://hdl.handle.net/1853/29645
dc.publisher Georgia Institute of Technology en_US
dc.subject Principal components analysis en_US
dc.subject Planted cliques en_US
dc.subject Random tensors en_US
dc.subject Mixture models en_US
dc.subject.lcsh Principal components analysis
dc.subject.lcsh Algorithms
dc.subject.lcsh Mathematical statistics
dc.subject.lcsh Eigenvectors
dc.title Extensions of principal components analysis en_US
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Vempala, Santosh S.
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
relation.isAdvisorOfPublication 08846825-37f1-410b-b338-526d4f79815b
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
Files
Original bundle
Name: brubaker_spencer_c_200908_phd.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format