Organizational Unit:
School of Computational Science and Engineering

Publication Search Results

  • Item
    Matrix algorithms for data clustering and nonlinear dimension reduction
    (Georgia Institute of Technology, 2008-10-03) Zha, Hongyuan ; Zhang, Ming
  • Item
    Workshop on Future Direction in Numerical Algorithms and Optimization
    (Georgia Institute of Technology, 2008-01-15) Park, Haesun ; Golub, Gene ; Wu, Weili ; Du, Ding-Zhu
  • Item
    Sparse Nonnegative Matrix Factorization for Clustering
    (Georgia Institute of Technology, 2008) Kim, Jingu ; Park, Haesun
    Properties of Nonnegative Matrix Factorization (NMF) as a clustering method are studied by relating its formulation to other methods such as K-means clustering. We show how interpreting the objective function of K-means as that of a lower rank approximation with special constraints allows comparisons between the constraints of NMF and K-means and provides the insight that some constraints can be relaxed from K-means to arrive at the NMF formulation. By introducing sparsity constraints on the coefficient matrix factor in the NMF objective function, we can in turn view NMF as a clustering method. We tested sparse NMF as a clustering method, and our experimental results with synthetic and text data show that sparse NMF does not simply provide an alternative to K-means, but rather gives much better and more consistent solutions to the clustering problem. In addition, the consistency of solutions further explains how NMF can be used to determine the unknown number of clusters from data. We also tested a recently proposed clustering algorithm, Affinity Propagation, and achieved comparable results. A fast alternating nonnegative least squares algorithm was used to obtain NMF and sparse NMF.
  • Item
    Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons
    (Georgia Institute of Technology, 2008) Kim, Jingu ; Park, Haesun
    Nonnegative Matrix Factorization (NMF) is a dimension reduction method that has been widely used for various tasks including text mining, pattern analysis, clustering, and cancer class discovery. The mathematical formulation for NMF appears as a non-convex optimization problem, and various types of algorithms have been devised to solve the problem. The alternating nonnegative least squares (ANLS) framework is a block coordinate descent approach for solving NMF, which was recently shown to be theoretically sound and empirically efficient. In this paper, we present a novel algorithm for NMF based on the ANLS framework. Our new algorithm builds upon the block principal pivoting method for the nonnegativity-constrained least squares problem, a method that overcomes some limitations of active set methods. We introduce ideas to efficiently extend the block principal pivoting method within the context of NMF computation. Our algorithm inherits the convergence theory of the ANLS framework and can easily be extended to other constrained NMF formulations. Comparisons of algorithms using datasets from real-life applications as well as artificially generated ones show that the proposed new algorithm outperforms existing ones in computational speed.
  • Item
    SNARE: Spatio-temporal Network-level Automatic Reputation Engine
    (Georgia Institute of Technology, 2008) Feamster, Nick ; Gray, Alexander ; Krasser, Sven ; Syed, Nadeem Ahmed
    Current spam filtering techniques classify email based on content and IP reputation blacklists or whitelists. Unfortunately, spammers can alter spam content to evade content-based filters, and spammers continually change the IP addresses from which they send spam. Previous work has suggested that filters based on network-level behavior might be more efficient and robust, by making decisions based on how messages are sent, as opposed to what is being sent or who is sending them. This paper presents a technique to identify spammers based on features that exploit the network-level spatio-temporal behavior of email senders to differentiate the spamming IPs from legitimate senders. Our behavioral classifier has two benefits: (1) it is early (i.e., it can automatically detect spam without seeing a large amount of email from a sending IP address, sometimes even upon seeing only a single packet); (2) it is evasion-resistant (i.e., it is based on spatial and temporal features that are difficult for a sender to change). We build classifiers based on these features using two different machine learning methods, support vector machines and decision trees, and we study the efficacy of these classifiers using labeled data from a deployed commercial spam-filtering system. Surprisingly, using only features from a single IP packet header (i.e., without looking at packet contents), our classifier can identify spammers with about 93% accuracy and a reasonably low false-positive rate (about 7%). After looking at a single message, spammer identification accuracy improves to more than 94% with a false-positive rate of just over 5%. These results suggest an effective sender reputation mechanism.
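
The abstract of "Sparse Nonnegative Matrix Factorization for Clustering" above describes reading the K-means objective as a low-rank approximation with special constraints. The following is a minimal sketch of that idea, not the authors' code: all function names and the toy data are hypothetical, and every cluster is assumed to be non-empty. It computes the K-means objective directly, then again as the squared Frobenius norm of A - B C, where each row of B has exactly one 1 (the cluster indicator) and C holds the centroids.

```python
def centroids(points, labels, k):
    """Mean of the points assigned to each cluster (assumes no empty cluster)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [[s / counts[j] for s in sums[j]] for j in range(k)]


def kmeans_objective(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cents = centroids(points, labels, k)
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, cents[lab]))
        for p, lab in zip(points, labels)
    )


def low_rank_objective(points, labels, k):
    """Squared Frobenius norm of A - B C, with B a 0/1 indicator matrix."""
    cents = centroids(points, labels, k)  # C: k x dim
    dim = len(points[0])
    # B: n x k, exactly one 1 per row (the K-means constraint on the factor)
    B = [[1.0 if lab == j else 0.0 for j in range(k)] for lab in labels]
    total = 0.0
    for i, p in enumerate(points):
        for d in range(dim):
            approx = sum(B[i][j] * cents[j][d] for j in range(k))
            total += (p[d] - approx) ** 2
    return total


# Toy data (hypothetical): two well-separated pairs of 2-D points.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]

print(kmeans_objective(points, labels, 2), low_rank_objective(points, labels, 2))
# prints: 4.0 4.0
```

The two values agree because multiplying the indicator matrix by the centroid matrix simply replaces each point with its centroid. Allowing any nonnegative entries in both factors, rather than a single 1 per row of B, is one way to read the relaxation toward NMF that the abstract mentions.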