Title:
MMAP: Mining Billion-Scale Graphs on a PC with Fast, Minimalist Approach via Memory Mapping
Authors
Sabrin, Kaeser Md.
Lin, Zhiyuan
Chau, Duen Horng
Lee, Ho
Kang, U.
Abstract
Large graphs with billions of nodes and edges are increasingly common, calling for new kinds of scalable computation frameworks. State-of-the-art approaches such as GraphChi and TurboGraph recently demonstrated that a single PC can efficiently perform advanced computation on billion-node graphs. Although fast, they use sophisticated data structures, explicit memory management, and optimization techniques to achieve high speed and scalability. We propose a minimalist approach that forgoes such complexities by leveraging the fundamental memory mapping (MMap) capability found on operating systems. We present several major findings and contributions: (1) our crucial insight that MMap can be a viable technique for creating fast, scalable graph algorithms that surpass some of the best techniques; (2) a counterintuitive result that we can do less and gain more: MMap enables us to use a much simpler data structure (edge list) and algorithm design, and to defer memory management to the OS, while offering performance that is significantly faster than or comparable to highly optimized methods (e.g., 10X as fast as GraphChi for PageRank on the 1.47-billion-edge Twitter graph); (3) extensive experiments on real and synthetic graphs, including the 6.6-billion-edge YahooWeb graph, showing that MMap's benefits are sustained in most conditions. We hope this work will inspire others to explore how memory mapping may help improve other methods or algorithms to further increase their speed and scalability.
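As a rough illustration of the idea in the abstract, the following Python sketch memory-maps a binary edge list and runs a simplified PageRank over it, leaving paging decisions to the OS. It is not the authors' implementation. The file layout (pairs of little-endian 32-bit node IDs), the function names `write_edge_list` and `pagerank_mmap`, and the parameter defaults are all assumptions made for this example, and rank held by nodes with no outgoing edges is simply dropped rather than redistributed.

```python
import mmap
import struct

# Assumed on-disk layout (hypothetical): each edge is a (source, target)
# pair of little-endian unsigned 32-bit integers, stored back to back.
EDGE = struct.Struct("<II")


def write_edge_list(path, edges):
    """Write (source, target) pairs to a flat binary edge-list file."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def pagerank_mmap(path, num_nodes, iterations=20, damping=0.85):
    """Simplified PageRank that scans a memory-mapped edge list.

    The file is mapped read-only; the OS decides which pages stay in RAM.
    The file must be non-empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # First pass: count the out-degree of every node.
        out_degree = [0] * num_nodes
        for offset in range(0, len(mm), EDGE.size):
            src, _ = EDGE.unpack_from(mm, offset)
            out_degree[src] += 1

        rank = [1.0 / num_nodes] * num_nodes
        for _ in range(iterations):
            # Every node starts each round with the teleport share.
            next_rank = [(1.0 - damping) / num_nodes] * num_nodes
            # One sequential scan over the edges per iteration.
            for offset in range(0, len(mm), EDGE.size):
                src, dst = EDGE.unpack_from(mm, offset)
                next_rank[dst] += damping * rank[src] / out_degree[src]
            rank = next_rank
    return rank
```

For example, calling `write_edge_list("cycle.bin", [(0, 1), (1, 2), (2, 0)])` followed by `pagerank_mmap("cycle.bin", 3)` ranks a three-node cycle. The program never allocates a buffer for the edges or manages a cache; only the per-node arrays live in ordinary process memory.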
Date Issued
2013
Resource Type
Text
Resource Subtype
Technical Report