Using Empirical Risk Minimization to Detect Community Structure in the Blogosphere

Abstract:

Detecting community structure in the blogosphere presents several obstacles. A blog's owner may update its data frequently, so the blogosphere as a whole can grow very large within a short period of time, and processing such a huge amount of data with traditional methods can be very expensive. Moreover, only a few blogs can be clearly identified as members of a specific community from their own characteristics; most blogs have to be judged by their relationships with neighboring blogs, using centrality metrics. Recently, a new method that combines active learning and semi-supervised learning has performed well in improving the speed and accuracy of machine learning on large-scale data. In this paper, we employ this method to solve the community clustering problem on a vast and complex data set. We aim to show that the method labels and clusters large-scale data better than the traditional approach by comparing the results of the two. Afterward, we may make some improvements and use it to detect communities in the blogosphere.

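Authors: Jiaxuan Huang, Hongsen Huang

Affiliation: College of Computer Science, Zhejiang University, Hangzhou, P.R. China 310027

Conference type: International conference

Conference name: The 2010 International Conference on Intelligent Systems and Knowledge Engineering (the 5th International Conference on Intelligent Systems and Knowledge Engineering)

Conference location: Hangzhou

Conference language: English

Pages: 418-421

Online publication date: 2010-11-15 (date first posted on the Wanfang platform; not the paper's publication date)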
