Conference topic

Using Empirical Risk Minimization to Detect Community Structure in the Blogosphere

Detecting community structure in the blogosphere raises several obstacles. Blog owners may update their content frequently, so the blogosphere can grow very large within a short period of time, and processing such a huge amount of data with traditional methods can be very expensive. Meanwhile, only a few blogs can be clearly identified as members of a specific community from their own characteristics; most blogs have to be judged by their relationships with neighboring blogs, using centrality metrics. Recently, a new method that combines active learning and semi-supervised learning has performed well in improving the speed and accuracy of machine learning on large-scale data. In this paper, we apply this method to the community clustering problem on a vast and complex data set. We try to show that it does a better job of labeling and clustering large-scale data by comparing its result with the one achieved in the traditional way. Afterward, we may make some improvements and use it for community detection in the blogosphere.

Jiaxuan Huang, Hongsen Huang

College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China

International conference

The 2010 International Conference on Intelligent Systems and Knowledge Engineering (the 5th International Conference on Intelligent Systems and Knowledge Engineering)

Hangzhou

English

418-421

2010-11-15 (date first made available online on the Wanfang platform; not the paper's publication date)