会议专题

Statistical Properties of Community Structure in Large Social and Information Networks

A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the bestpossible community-according to the conductance measure -over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a signifi-cantly more re.ned picture of community structure in large real-world networks than has been appreciated previously. Our most striking.nding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually blend inith the rest of the network and thus become less “community-likeThis behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative forest fireBurning process, is able to produce graphs exhibiting a network community structure similar to our observations.

Social networks Graph partitioning Community structure Conductance Random walks.

Jure Leskovec Kevin J. Lang Anirban Dasgupta Michael W. Mahoney

Carnegie Mellon University Yahoo! Research

国际会议

第十七届国际万维网大会(the 17th International World Wide Web Conference)(WWW08)

北京

英文

2008-04-21(万方平台首次上网日期,不代表论文的发表时间)