Person:
Park, Haesun


Publication Search Results (showing 10 of 12)
  • Item
    Workshop on Future Direction in Numerical Algorithms and Optimization
    (Georgia Institute of Technology, 2008-01-15) Park, Haesun ; Golub, Gene ; Wu, Weili ; Du, Ding-Zhu
  • Item
    Sparse Nonnegative Matrix Factorization for Clustering
    (Georgia Institute of Technology, 2008) Kim, Jingu ; Park, Haesun
    Properties of Nonnegative Matrix Factorization (NMF) as a clustering method are studied by relating its formulation to other methods such as K-means clustering. We show how interpreting the objective function of K-means as that of a lower-rank approximation with special constraints allows comparisons between the constraints of NMF and K-means, and provides the insight that some of the K-means constraints can be relaxed to arrive at the NMF formulation. By introducing sparsity constraints on the coefficient matrix factor in the NMF objective function, we can in turn view NMF as a clustering method. We tested sparse NMF as a clustering method, and our experimental results with synthetic and text data show that sparse NMF does not simply provide an alternative to K-means, but rather gives much better and more consistent solutions to the clustering problem. In addition, the consistency of solutions further explains how NMF can be used to determine the unknown number of clusters from data. We also compared against Affinity Propagation, a recently proposed clustering algorithm, and achieved comparable results. A fast alternating nonnegative least squares algorithm was used to compute NMF and sparse NMF.
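    The alternating nonnegative least squares (ANLS) scheme used above can be sketched in a few lines of Python. The fragment below is a minimal illustration, not the paper's implementation: the names are ours, scipy.optimize.nnls is used column-wise as the subproblem solver, and the sparsity penalty is omitted. Cluster labels are read off the coefficient matrix H.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(A, k, n_iter=50, seed=0):
    """Factor a nonnegative A (m x n) as W @ H with W: m x k, H: k x n."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    for _ in range(n_iter):
        # Fix W, solve min ||W H - A||_F with H >= 0, one column at a time.
        H = np.column_stack([nnls(W, A[:, j])[0] for j in range(n)])
        # Fix H, solve min ||H.T W.T - A.T||_F with W >= 0, one row at a time.
        W = np.column_stack([nnls(H.T, A[i, :])[0] for i in range(m)]).T
    return W, H

# Clustering interpretation: assign each data point (column of A) to the
# basis vector with the largest coefficient in H.
A = np.random.default_rng(1).random((30, 20))
W, H = nmf_anls(A, k=3)
labels = H.argmax(axis=0)
```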
  • Item
    Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons
    (Georgia Institute of Technology, 2008) Kim, Jingu ; Park, Haesun
    Nonnegative Matrix Factorization (NMF) is a dimension reduction method that has been widely used for various tasks including text mining, pattern analysis, clustering, and cancer class discovery. The mathematical formulation of NMF is a non-convex optimization problem, and various types of algorithms have been devised to solve it. The alternating nonnegative least squares (ANLS) framework is a block coordinate descent approach for solving NMF, which was recently shown to be theoretically sound and empirically efficient. In this paper, we present a novel algorithm for NMF based on the ANLS framework. Our new algorithm builds upon the block principal pivoting method for the nonnegativity-constrained least squares problem, which overcomes some limitations of active set methods. We introduce ideas to efficiently extend the block principal pivoting method within the context of NMF computation. Our algorithm inherits the convergence theory of the ANLS framework and can easily be extended to other constrained NMF formulations. Comparisons using datasets from real-life applications as well as artificially generated ones show that the proposed algorithm outperforms existing ones in computational speed.
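    The block principal pivoting method mentioned above can be sketched for a single right-hand side. The code below is a simplified illustration under the assumption that C has full column rank; the exchange parameters, tolerance, and names are ours. Unlike an active-set method, it exchanges whole blocks of variables that violate the KKT conditions of min ||Cx - b|| subject to x >= 0.

```python
import numpy as np

def nnls_bpp(C, b, max_iter=100, tol=1e-12):
    """Block principal pivoting sketch for min ||Cx - b||_2 s.t. x >= 0.
    Partition variables into a free set F (x solved unconstrained, y = 0)
    and its complement (x = 0, y = gradient); swap infeasible variables."""
    CtC, Ctb = C.T @ C, C.T @ b
    n = C.shape[1]
    F = np.zeros(n, dtype=bool)          # start with everything at x = 0
    x, y = np.zeros(n), -Ctb             # y is the gradient CtC @ x - Ctb
    ninf, p = n + 1, 3                   # backup-rule bookkeeping
    for _ in range(max_iter):
        infeas = (F & (x < -tol)) | (~F & (y < -tol))
        k = int(infeas.sum())
        if k == 0:
            break                        # KKT conditions hold: done
        if k < ninf:                     # infeasibility dropped: full exchange
            ninf, p = k, 3
            F ^= infeas
        elif p > 0:                      # full exchange, limited retries
            p -= 1
            F ^= infeas
        else:                            # backup rule: flip one index only
            F[np.max(np.nonzero(infeas)[0])] ^= True
        x = np.zeros(n)
        if F.any():
            x[F] = np.linalg.solve(CtC[np.ix_(F, F)], Ctb[F])
        y = CtC @ x - Ctb
        y[F] = 0.0
    return np.maximum(x, 0.0)
```

Within the ANLS framework, such a solver would be applied to each column of the two factor subproblems in turn (the paper's version handles multiple right-hand sides at once).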
  • Item
    ALGORITHMS: Collaborative research: development of vector space based methods for protein structure prediction
    (Georgia Institute of Technology, 2007-07-16) Park, Haesun ; Vazirani, Vijay V.
  • Item
    Fast Linear Discriminant Analysis using QR Decomposition and Regularization
    (Georgia Institute of Technology, 2007-03-23) Park, Haesun ; Drake, Barry L. ; Lee, Sangmin ; Park, Cheong Hee
    Linear Discriminant Analysis (LDA) is one of the most effective dimension reduction methods for classification, providing a high degree of class separability in numerous applications across science and engineering. However, problems arise with this classical method when one or both of the scatter matrices are singular. Singular scatter matrices are not unusual in many applications, especially for high-dimensional data. For high-dimensional undersampled and oversampled problems, classical LDA requires modification in order to solve a wider range of problems. In recent work, the generalized singular value decomposition (GSVD) has been shown to mitigate the issue of singular scatter matrices, and a new algorithm, LDA/GSVD, has been shown to be very robust for many applications in machine learning. However, the GSVD inherently carries considerable computational overhead. In this paper, we propose fast algorithms based on the QR decomposition and regularization that remove the LDA/GSVD computational bottleneck. In addition, we present fast algorithms for classical LDA and regularized LDA that utilize the LDA/GSVD framework with preprocessing by the Cholesky decomposition. Experimental results demonstrate substantial speedups for classical LDA, regularized LDA, and LDA/GSVD, without any sacrifice in classification performance, across a wide range of machine learning applications.
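    As a rough illustration of the regularization-plus-Cholesky idea, the sketch below (not the paper's algorithm; the names and the value of lam are ours) factors the regularized within-class scatter with a Cholesky decomposition, reduces the problem to an ordinary symmetric eigenproblem, and back-transforms to obtain the discriminant directions.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def regularized_lda(X, y, lam=1e-2):
    """Regularized LDA sketch: rows of X are samples, y holds class labels.
    Returns G; project data via X @ G."""
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                  # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)     # between-class scatter
    # lam * I keeps the factor positive definite even when Sw is singular
    # (the undersampled case discussed in the abstract).
    L = cholesky(Sw + lam * np.eye(d), lower=True)     # Sw + lam I = L L^T
    M = solve_triangular(L, Sb, lower=True)
    M = solve_triangular(L, M.T, lower=True).T         # L^{-1} Sb L^{-T}
    vals, vecs = eigh((M + M.T) / 2)                   # symmetrize for safety
    top = vecs[:, ::-1][:, :len(classes) - 1]          # leading eigenvectors
    return solve_triangular(L.T, top)                  # back-transform L^{-T} v
```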
  • Item
    A Comparison of Generalized Linear Discriminant Analysis Algorithms
    (Georgia Institute of Technology, 2006-01-28) Park, Cheong Hee ; Park, Haesun
    Linear Discriminant Analysis (LDA) is a dimension reduction method which finds an optimal linear transformation that maximizes class separability. However, in undersampled problems, where the number of data samples is smaller than the dimension of the data space, it is difficult to apply LDA due to the singularity of the scatter matrices caused by high dimensionality. In order to make LDA applicable, several generalizations of LDA have been proposed recently. In this paper, we present theoretical and algorithmic relationships among several generalized LDA algorithms and compare their computational complexities and performances in text classification and face recognition. Toward a practical dimension reduction method for high-dimensional data, an efficient algorithm is proposed that greatly reduces the computational complexity while achieving competitive prediction accuracies. We also present nonlinear extensions of these LDA algorithms based on kernel methods. It is shown that a generalized eigenvalue problem can be formulated in the kernel-based feature space, and generalized LDA algorithms are applied to solve it, resulting in nonlinear discriminant analysis. Performances of these linear and nonlinear discriminant analysis algorithms are compared extensively.
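    The generalized eigenvalue problem in the kernel-based feature space can be illustrated directly on a precomputed kernel matrix. The construction below is standard kernel discriminant analysis rather than any one of the paper's specific algorithms; the regularization term, variable names, and the RBF kernel in the toy usage are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def kernel_lda(K, y, reg=1e-3):
    """Solve M a = lambda (N + reg I) a, where M and N are the between- and
    within-class scatter matrices expressed through the kernel matrix K."""
    n = K.shape[0]
    classes = np.unique(y)
    m = K.mean(axis=1)
    M, N = np.zeros((n, n)), np.zeros((n, n))
    for c in classes:
        idx = np.flatnonzero(y == c)
        Kc = K[:, idx]
        mc = Kc.mean(axis=1)
        M += len(idx) * np.outer(mc - m, mc - m)
        Hc = np.eye(len(idx)) - np.ones((len(idx), len(idx))) / len(idx)
        N += Kc @ Hc @ Kc.T                    # centered within-class term
    vals, vecs = eigh(M, N + reg * np.eye(n))  # generalized symmetric eigh
    return vecs[:, ::-1][:, :len(classes) - 1] # expansion coefficients

# Toy usage with an RBF kernel (gamma = 1, illustrative):
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.repeat([0, 1], 20)
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
Z = K @ kernel_lda(K, y)                       # training data, reduced space
```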
  • Item
    Multiclass Classifiers Based on Dimension Reduction with Generalized LDA
    (Georgia Institute of Technology, 2006-01-27) Kim, Hyunsoo ; Drake, Barry L. ; Park, Haesun
    Linear discriminant analysis (LDA) has been widely used for dimension reduction of data sets with multiple classes. LDA has recently been extended to various generalized LDA methods that are applicable regardless of the relative sizes of the data dimension and the number of data items. In this paper, we propose several multiclass classifiers based on generalized LDA algorithms, taking advantage of the dimension-reducing transformation matrix without requiring additional training or any parameter optimization. A marginal linear discriminant classifier, a Bayesian linear discriminant classifier, and a one-dimensional Bayesian linear discriminant classifier are introduced for multiclass classification. Our experimental results illustrate that these classifiers produce higher ten-fold cross-validation accuracy than kNN and centroid-based classification in the reduced-dimensional space, providing efficient general multiclass classifiers.
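    These classifiers reuse the dimension-reducing transformation without further training. As a baseline illustration, the sketch below implements the centroid-based classification in the reduced space that the abstract compares against; the proposed marginal and Bayesian classifiers replace the nearest-centroid rule with more refined decision rules not reproduced here.

```python
import numpy as np

def centroid_classify(G, X_train, y_train, X_test):
    """Project with a dimension-reducing matrix G from any generalized LDA
    method, then assign each test point to the nearest class centroid."""
    Z_train, Z_test = X_train @ G, X_test @ G
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    # Squared distances, test points x centroids; smallest distance wins.
    d2 = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```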
  • Item
    Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares
    (Georgia Institute of Technology, 2006) Kim, Hyunsoo ; Park, Haesun
    Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Non-negative matrix factorization (NMF) is a useful technique for approximating such high-dimensional data. Sparse NMFs are also useful when we need to control the degree of sparseness in the non-negative basis vectors or the non-negative lower-dimensional representations. In this paper, we introduce novel sparse NMFs via alternating non-negativity-constrained least squares. We applied one of the proposed sparse NMFs to cancer class discovery and gene expression data analysis. Our experimental results illustrate that the proposed method achieves better clustering performance than NMF based on multiplicative update rules and sparse NMFs based on the gradient descent method.
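    One standard way to realize alternating non-negativity-constrained least squares with sparsity is to fold the penalties into the least squares matrices: an L1-squared penalty on the columns of H becomes an extra all-ones row appended to W, and an L2 penalty on W becomes scaled identity rows appended to H^T, so each subproblem stays an ordinary NNLS. The sketch below takes this route; parameter values and names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def sparse_nmf(A, k, beta=0.1, eta=0.1, n_iter=30, seed=0):
    """Sparse NMF sketch: min ||A - WH||_F^2 + eta * ||W||_F^2
    + beta * sum_j ||H[:, j]||_1^2 over W, H >= 0, via penalized NNLS."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    for _ in range(n_iter):
        # H-step: the all-ones penalty row enforces column-wise L1 sparsity
        # (for h >= 0, ||h||_1 equals the dot product with the ones vector).
        Wa = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
        Aa = np.vstack([A, np.zeros((1, n))])
        H = np.column_stack([nnls(Wa, Aa[:, j])[0] for j in range(n)])
        # W-step: sqrt(eta) * I rows implement the L2 penalty on W.
        Ha = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
        At = np.vstack([A.T, np.zeros((k, m))])
        W = np.column_stack([nnls(Ha, At[:, i])[0] for i in range(m)]).T
    return W, H
```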
  • Item
    Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis
    (Georgia Institute of Technology, 2006) Ye, Jieping ; Janardan, Ravi ; Li, Qi ; Park, Haesun
    High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated Linear Discriminant Analysis (ULDA) was recently proposed for feature reduction. The features extracted via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of datasets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms.
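    One construction consistent with ULDA's uncorrelatedness requirement (G^T St G = I, where St is the total scatter matrix) is to whiten St on its range and diagonalize the whitened between-class scatter. The sketch below takes that route purely for illustration; ULDA/QR and ULDA/GSVD in the paper compute the transformation more efficiently, and the names and tolerance are ours.

```python
import numpy as np
from scipy.linalg import eigh

def ulda(X, y, tol=1e-10):
    """ULDA-style sketch: returns G with G.T @ St @ G = I on the range of St,
    so the extracted features X @ G are statistically uncorrelated."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Xc = X - mu
    St = Xc.T @ Xc                                     # total scatter
    Sb = sum(np.sum(y == c) * np.outer(X[y == c].mean(0) - mu,
                                       X[y == c].mean(0) - mu)
             for c in classes)                         # between-class scatter
    s, U = eigh(St)
    keep = s > tol * s.max()                           # restrict to range(St)
    Wh = U[:, keep] / np.sqrt(s[keep])                 # pseudo-inverse sqrt
    vals, V = eigh(Wh.T @ Sb @ Wh)                     # whitened Sb
    return Wh @ V[:, ::-1][:, :len(classes) - 1]       # top directions
```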