A Parallel Computing Model for Large-Graph Mining with MapReduce

How can we quickly find the structure and characteristics of a large-scale graph? Large-scale graphs exist everywhere, for example CALL graphs, the World Wide Web, Facebook networks and many more. The continued exponential growth in both the size and complexity of these graphs poses a new challenge to analysts and researchers, and a new class of algorithms and computing models for large-scale graphs is urgently needed. A promising approach for graphs of this size is the emerging MapReduce framework and its open-source implementation, Hadoop. 3-clique enumeration is an important operation for structure mining, and it is difficult to perform on a single computer when the graph is very large. In this paper, we propose a parallel computing model for 3-clique enumeration on large-scale graphs, built on a cluster system with MapReduce. The enumeration first extracts the one-leap information of the graph, then the two-leap information, and finally performs key-based 3-clique enumeration. We also apply the computing model to the computation of the clustering coefficient. Most importantly, the computing model is applied to three large real-world CALL graphs, and the experimental results demonstrate the good scalability and efficiency of the model.
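The three enumeration phases described in the abstract can be illustrated with a small in-memory simulation. This is a minimal sketch assuming a common wedge-and-edge join formulation, not necessarily the authors' exact model: `run_mapreduce` is a hypothetical single-process stand-in for a Hadoop job, the `"$"` edge marker is an assumed convention, and vertices are assumed to be comparable IDs. Phase 1 (one-leap) builds adjacency lists; phase 2 (two-leap) emits every path v-u-w keyed by the pair (v, w), plus a marker for each real edge; phase 3 (key-based) joins on the key, so a path whose endpoints are adjacent closes a 3-clique. The clustering coefficient uses the standard local definition C(v) = 2T(v) / (k(v)(k(v) - 1)), where T(v) is the number of 3-cliques containing v and k(v) is its degree.

```python
from collections import defaultdict
from itertools import combinations

EDGE = "$"  # assumed marker meaning "this key pair is itself an edge"

def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out

# Phase 1 (one-leap): turn undirected edges into adjacency lists.
def one_leap_map(edge):
    u, v = edge
    if u != v:  # ignore self-loops
        yield u, v
        yield v, u

def one_leap_reduce(u, neighbors):
    yield u, sorted(set(neighbors))  # set() drops duplicate edges

# Phase 2 (two-leap): emit edge markers and two-leap paths v-u-w, where u is
# the smallest vertex, so each 3-clique is generated exactly once.
def two_leap_map(record):
    u, neighbors = record
    higher = [w for w in neighbors if w > u]
    for w in higher:
        yield (u, w), EDGE
    for v, w in combinations(higher, 2):
        yield (v, w), u

# Phase 3 (key-based): a path v-u-w closes a 3-clique iff (v, w) is an edge.
def key_based_reduce(pair, values):
    if EDGE in values:
        v, w = pair
        for u in values:
            if u != EDGE:
                yield (u, v, w)

def enumerate_triangles(edges):
    adjacency = run_mapreduce(edges, one_leap_map, one_leap_reduce)
    triangles = run_mapreduce(adjacency, two_leap_map, key_based_reduce)
    return triangles, dict(adjacency)

def clustering_coefficients(edges):
    triangles, adjacency = enumerate_triangles(edges)
    tri_count = defaultdict(int)
    for tri in triangles:
        for x in tri:
            tri_count[x] += 1
    coeff = {}
    for v, nbrs in adjacency.items():
        k = len(nbrs)
        coeff[v] = 0.0 if k < 2 else 2 * tri_count[v] / (k * (k - 1))
    return coeff

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(enumerate_triangles(edges)[0])  # [(1, 2, 3)]
    print(clustering_coefficients(edges))
```

Ordering the wedge center by vertex ID keeps the example short. Practical implementations often order by degree instead, which limits the number of two-leap records that high-degree hubs generate.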
Keywords: graph mining; social network analysis; MapReduce; clustering coefficient; 3-clique
Bin Wu, Yuxiao Dong, Qing Ke, Yanan Cai
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
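Conference type: International conference
Conference: 2011 Seventh International Conference on Natural Computation (ICNC 2011)
Location: Shanghai
Language: English
Pages: 43-47
2011-07-26 (date first posted online on the Wanfang platform; not the paper's publication date)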