Title:
Nonnegative matrix factorization for text, graph, and hybrid data analytics

Authors
Du, Rundong
Advisors
Park, Haesun
Abstract
Constrained low rank approximation is a general framework for data analysis, with the usual advantages of being simple, fast, scalable, and domain-general. One of the best-known constrained low rank approximation methods is nonnegative matrix factorization (NMF). This research studies the design and implementation of several variants of NMF for text, graph, and hybrid data analytics. It addresses challenges including solving new data analytics problems and improving the scalability of existing NMF algorithms. There are two major matrix representations of data: the feature-data matrix and the similarity matrix. Previous work has successfully applied standard NMF on feature-data matrices to areas such as text mining and image analysis, and Symmetric NMF (SymNMF) on similarity matrices to areas such as graph clustering and community detection. In this work, a divide-and-conquer strategy is applied to both methods, reducing the growth of their time complexity with respect to the reduced rank from cubic to linear and resulting in the DC-NMF and HierSymNMF2 methods. Extensive experiments on large-scale real-world data show improved performance of these two methods. Furthermore, in this work NMF and SymNMF are combined into one formulation called JointNMF, to analyze hybrid data that contains both text content and connection structure information. Typical hybrid data to which JointNMF can be applied includes paper/patent data, where there are citation connections among content, and email data, where the sender/recipient relation is represented by a hypergraph and the email content is associated with hypergraph edges. An additional capability of JointNMF is the prediction of unknown network information, which is illustrated using several real-world problems such as citation recommendation for papers and activity/leader detection in organizations. This dissertation also includes a brief discussion of the relationships among different variants of NMF.
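For readers unfamiliar with the standard NMF that the abstract builds on, the following is a minimal sketch of factoring a nonnegative feature-data matrix X into nonnegative factors W and H. It uses the generic multiplicative update rules of Lee and Seung, which is an assumption made for illustration: it is not the dissertation's DC-NMF, HierSymNMF2, or JointNMF algorithm. The function name `nmf_multiplicative`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative m x n matrix X as W @ H with W, H >= 0.

    Illustrative only: generic multiplicative updates for the
    Frobenius-norm objective ||X - W H||_F^2, not the dissertation's methods.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random nonnegative starting factors of rank k.
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Elementwise updates keep every entry nonnegative;
        # eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy feature-data matrix with an exact nonnegative rank-2 structure.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf_multiplicative(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In a text-mining setting, each column of X would typically be a document and each column of W a topic-like basis vector; the largest entry in a column of H can then be read as a soft cluster assignment for that document.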
Date Issued
2018-04-10
Resource Type
Text
Resource Subtype
Dissertation