Extracting Spam Blogs with Co-citation Clusters
This paper estimates the number of spam blogs in order to assess their current state in the blogosphere. To extract spam blogs, I developed a method that traverses co-citation clusters of blogs, starting from a spam seed. Spam seeds were collected by two criteria: high out-degree and spam keywords. In the experiment, a mixed seed set combining high out-degree seeds and spam keyword seeds was more effective than either seed set alone in terms of F-Measure. In conclusion, mixing seeds obtained by different methods improves the F-Measure of spam extraction with co-citation clusters.
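The traversal and the F-Measure evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the input format (a mapping from each citing page to the set of blogs it links to), the `min_shared` threshold, and all function names are assumptions made for illustration. Two blogs are treated as co-cited when at least `min_shared` pages link to both, and extraction is a breadth-first traversal of the resulting graph from the seed blogs.

```python
from collections import defaultdict, deque
from itertools import combinations


def cocitation_graph(citations, min_shared=2):
    """Link two blogs when at least `min_shared` pages cite both of them.

    `citations` maps a citing page to the set of blogs it links to
    (hypothetical input format, assumed for this sketch).
    """
    counts = defaultdict(int)
    for cited in citations.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def traverse_from_seeds(graph, seeds):
    """Breadth-first traversal: collect every blog reachable from a seed."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        blog = queue.popleft()
        for neighbour in graph.get(blog, ()):
            if neighbour not in found:
                found.add(neighbour)
                queue.append(neighbour)
    return found


def f_measure(extracted, actual_spam):
    """Harmonic mean of precision and recall of the extracted set."""
    true_pos = len(extracted & actual_spam)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(extracted)
    recall = true_pos / len(actual_spam)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
citations = {
    "p1": {"s1", "s2", "s3"},
    "p2": {"s1", "s2"},
    "p3": {"s2", "s3"},
    "p4": {"s3", "b1"},
    "p5": {"b1", "b2"},
}
graph = cocitation_graph(citations, min_shared=2)
extracted = traverse_from_seeds(graph, {"s1"})
score = f_measure(extracted, {"s1", "s2", "s3", "s4"})
```

A mixed seed set in this sketch is simply the union of the seeds produced by each collection method, passed as `seeds` to `traverse_from_seeds`.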
Spam Blog; Extraction; Co-citation Cluster; Advertisement Link
Kazunari Ishida
The University of Shimane, 2433-2 Nobara-cho, Hamada-shi, Shimane 697-0016, Japan
International conference
The 17th International World Wide Web Conference (WWW08)
Beijing
English
2008-04-21 (date first posted on the Wanfang platform; not the paper's publication date)