Conference topic

Dynamic Splog Filtering Algorithm Based on Combinational Features

This paper focuses on spam blog (splog) detection. Blogs are a highly popular new medium for social communication, and splogs degrade blog search results and waste network resources. Existing splog detection algorithms rely on lexical frequency features, which are highly redundant and lack correlation. Our approach uses a dynamic filtering algorithm based on the combinational features of splogs (CFDS). The CFDS algorithm selects several efficient novel features, such as self-similarity features and author attributes, to replace the larger, redundant set of lexical frequency features. In addition, we extract a content-based feature vector from different parts of the blog and reduce its dimensionality with the Expected Cross Entropy (ECE) evaluation criterion. We tested an SVM-based splog detector that uses these combinational features on standard datasets, and it achieved excellent filtering efficiency.
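The abstract does not give the paper's exact ECE formulation, so the following is only a minimal sketch of how ECE-based dimensionality reduction is commonly done in text categorization. It assumes the usual definition ECE(t) = P(t) * sum over classes c of P(c|t) * log(P(c|t) / P(c)), estimated from document frequencies. The function names, the toy posts, and the "splog"/"blog" labels are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score each term with ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c)).

    docs   -- list of token lists, one per blog post (assumed pre-tokenized)
    labels -- list of class labels, parallel to docs
    Probabilities are estimated from document frequencies (an assumption).
    """
    n_docs = len(docs)
    class_counts = Counter(labels)          # documents per class
    doc_freq = Counter()                    # documents containing each term
    term_class_freq = defaultdict(Counter)  # documents per (term, class)

    for tokens, label in zip(docs, labels):
        for term in set(tokens):            # count each term once per document
            doc_freq[term] += 1
            term_class_freq[term][label] += 1

    scores = {}
    for term, df in doc_freq.items():
        p_t = df / n_docs
        total = 0.0
        for cls, n_cls in class_counts.items():
            p_c_given_t = term_class_freq[term][cls] / df
            if p_c_given_t > 0:             # a zero term contributes nothing
                total += p_c_given_t * math.log(p_c_given_t / (n_cls / n_docs))
        scores[term] = p_t * total
    return scores


def select_top_terms(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = expected_cross_entropy(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, purely illustrative.
posts = [
    ["buy", "cheap", "pills"],
    ["cheap", "pills", "now"],
    ["my", "trip", "photos"],
    ["trip", "now"],
]
post_labels = ["splog", "splog", "blog", "blog"]
top_terms = select_top_terms(posts, post_labels, 3)
```

In this toy set, a term that appears equally often in both classes (here "now") scores zero, while terms concentrated in one class score highest. The retained terms would then form the reduced content-based feature vector passed to a classifier such as an SVM.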

splogs; splog detection; combinational features; self-similarity; SVM

Yong-gong Ren; Ming-fei Yin; Jian Wang

School of Computer and Information Technology, Liaoning Normal University, Dalian, China; Department of Modern Educational Technology, Dalian Medical University, Dalian, China

International conference

The 8th National Conference on Web Information Systems and Applications

Chongqing

English

82-85

2011-10-21 (date first made available online on the Wanfang platform; not the paper's publication date)