(Georgia Institute of Technology, 2011-02-22)
Riedy, Jason; Bader, David A.; Jiang, Karl; Pande, Pushkar; Sharma, Richa
Analyzing massive social networks challenges both high-performance computers and human under-
standing. These massive networks cannot be visualized easily, and their scale makes applying complex
analysis methods computationally expensive. We present a region-growing method for finding a smaller,
more tractable subgraph, a community, given a few example seed vertices. Unlike existing work, we focus
on a small number of seed vertices, from two to a few dozen. We also present the first comparison between five algorithms for expanding a small seed set into a community. Our comparison applies these algorithms
to an R-MAT generated graph component with 240 thousand vertices and 32 million edges and evaluates
the community size, modularity, Kullback-Leibler divergence, conductance, and clustering coefficient. We find that our new algorithm with a local modularity maximizing heuristic based on Clauset, Newman,
and Moore performs very well when the output is limited to 100 or 1000 vertices. When run without a
vertex size limit, a heuristic from McCloskey and Bader generates communities containing around 60% of
the graph's vertices and having a small conductance and modularity appropriate to the result size. A
personalized PageRank algorithm based on Andersen, Lang, and Chung also performs well with respect
to our metrics.