Person:
Park, Haesun

Publication Search Results

  • Item
    Multiclass Classifiers Based on Dimension Reduction with Generalized LDA
    (Georgia Institute of Technology, 2006-01-27) Kim, Hyunsoo ; Drake, Barry L. ; Park, Haesun
    Linear discriminant analysis (LDA) has been widely used for dimension reduction of data sets with multiple classes. LDA has recently been extended to various generalized LDA methods, which are applicable regardless of the relative sizes of the data dimension and the number of data items. In this paper, we propose several multiclass classifiers based on generalized LDA algorithms, taking advantage of the dimension reducing transformation matrix without requiring additional training or any parameter optimization. A marginal linear discriminant classifier, a Bayesian linear discriminant classifier, and a one-dimensional Bayesian linear discriminant classifier are introduced for multiclass classification. Our experimental results illustrate that these classifiers produce higher ten-fold cross validation accuracy than kNN and centroid based classification in the reduced dimensional space, providing efficient general multiclass classifiers.
  • Item
    Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares
    (Georgia Institute of Technology, 2006) Kim, Hyunsoo ; Park, Haesun
    Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Non-negative matrix factorization (NMF) is a useful technique for approximating such high-dimensional data. Sparse NMFs are also useful when we need to control the degree of sparseness in non-negative basis vectors or non-negative lower-dimensional representations. In this paper, we introduce novel sparse NMFs via alternating non-negativity-constrained least squares. We apply one of the proposed sparse NMFs to cancer class discovery and gene expression data analysis. Our experimental results illustrate that our proposed method achieves better clustering performance than NMF based on multiplicative update rules and sparse NMFs based on the gradient descent method.
  • Item
    Relationships Between Support Vector Classifiers and Generalized Linear Discriminant Analysis on Support Vectors
    (Georgia Institute of Technology, 2006) Kim, Hyunsoo ; Drake, Barry L. ; Park, Haesun
    Linear discriminant analysis based on the generalized singular value decomposition (LDA/GSVD) has been introduced to circumvent the nonsingularity restriction inherent in classical LDA. LDA/GSVD provides a framework in which a dimension reducing transformation can be effectively obtained for undersampled problems. In this paper, relationships between support vector machines (SVMs) and generalized linear discriminant analysis applied to the support vectors are studied. Based on the GSVD, the weight vector of the hard-margin SVM is proved to be equivalent to the dimension reducing transformation vector generated by LDA/GSVD applied to the support vectors of the binary class. We also show that the dimension reducing transformation vector and the weight vector of soft-margin SVMs are related when a subset of the support vectors is considered. These results can be generalized when kernelized SVMs and the kernelized LDA/GSVD, called KDA/GSVD, are considered. Through these relationships, it is shown that support vector classification is related to data reduction as well as dimension reduction by LDA/GSVD.
  • Item
    Extracting Unrecognized Gene Relationships From the Biomedical Literature via Matrix Factorizations Using a Priori Knowledge of Gene Relationships
    (Georgia Institute of Technology, 2006) Kim, Hyunsoo ; Park, Haesun
    The construction of literature-based networks of gene-gene interactions is one of the most important applications of text mining in bioinformatics. Extracting potential gene relationships from the biomedical literature may be helpful in building biological hypotheses that can be explored further experimentally. In this paper, we explore the utility of singular value decomposition (SVD) and nonnegative matrix factorization (NMF) to extract unrecognized gene relationships from the biomedical literature by taking advantage of known gene relationships. We introduce a way to incorporate a priori knowledge of gene relationships into latent semantic indexing based on SVD (LSI/SVD) and into NMF. In addition, we propose a gene retrieval method based on NMF (GR/NMF), which shows performance comparable to that of LSI/SVD.
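
Illustrative note: several abstracts in this listing rely on non-negative matrix factorization, and one names alternating non-negativity-constrained least squares as the solver. The sketch below shows only the basic alternating idea, under stated assumptions: it is not the authors' sparse formulation (no sparsity terms are included), the function name `nmf_anls` and all parameters are hypothetical, and each subproblem is solved column by column with SciPy's `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def nmf_anls(A, k, n_iter=50, seed=0):
    """Approximate a non-negative m x n matrix A as W @ H with W, H >= 0.

    Plain alternating non-negative least squares; hypothetical helper,
    not the sparse variants described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))   # random non-negative starting basis
    H = np.zeros((k, n))
    for _ in range(n_iter):
        # Fix W: for each column a_j of A, solve min ||W h - a_j|| with h >= 0.
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        # Fix H: for each row a_i of A, solve min ||H^T w - a_i|| with w >= 0.
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
    return W, H


# Small synthetic example: a non-negative matrix of rank at most 2.
rng = np.random.default_rng(1)
A = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_anls(A, k=2)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Each half-step is a convex problem once the other factor is fixed, which is what makes the alternating scheme attractive; a sparse variant would add penalty terms to these subproblems.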