Combating Link Spam by Noisy Link Analysis
Link spam has been identified as one of the major obstacles for the link-based ranking algorithms of modern search engines, since it deliberately constructs hyperlink structures to help pages with poor content obtain undeservedly high ranks. The problem is even worse with the advent of wikis, blogs and forums, which are rich in links. Existing work on link spam mainly focuses on detection by extracting special link structures (e.g. cliques, tight bipartite graphs, etc.). However, link spam structures can have many variations, which easily make the existing detection methods ineffective. In this paper, we tackle the problem of link spam from a more fundamental viewpoint: noisy link analysis. We first investigate how non-voting hyperlinks affect the quality of ranking, and then, based on this investigation, propose an approach that detects and processes noisy links both effectively and automatically. We also compare our work with two related approaches (TrustRank and site-level noise removal) on two real web datasets. The experimental results demonstrate that the proposed noisy link analysis is very effective both at filtering spam pages and at improving the final ranking.
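To make the down-weighting idea concrete, the following is a minimal Python sketch of PageRank in which each link carries a weight between 0 and 1, so that a link judged to be noisy passes on only part of its vote. The abstract does not say how noisy links are detected or which weights are used; the function name `weighted_pagerank`, the toy graph, the weight 0.1 and the choice to spread withheld rank uniformly are all assumptions made for illustration, not the authors' method.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """PageRank over weighted links (illustrative sketch, not the paper's algorithm).

    edges maps (source, target) -> weight in [0, 1]; a weight below 1
    means the link is treated as noisy and its vote is reduced.
    """
    nodes = sorted({page for src, dst in edges for page in (src, dst)})
    n = len(nodes)
    out_degree = {page: 0 for page in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        passed = {page: 0.0 for page in nodes}
        for (src, dst), weight in edges.items():
            # Each link carries an equal share of the source's rank,
            # scaled by the link's weight.
            passed[dst] += rank[src] * weight / out_degree[src]
        # Rank withheld by down-weighted links (and by pages with no
        # out-links) is spread uniformly so the ranks still sum to 1.
        leaked = 1.0 - sum(passed.values())
        rank = {
            page: (1.0 - damping) / n + damping * (passed[page] + leaked / n)
            for page in nodes
        }
    return rank


# Hypothetical toy graph: an honest cycle a -> b -> c -> a and a small
# link farm in which s1 and s2 exist only to boost the target page t.
honest = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0}
farm_back_links = {("t", "s1"): 1.0, ("t", "s2"): 1.0}

noisy = weighted_pagerank(
    {**honest, **farm_back_links, ("s1", "t"): 1.0, ("s2", "t"): 1.0}
)
cleaned = weighted_pagerank(
    {**honest, **farm_back_links, ("s1", "t"): 0.1, ("s2", "t"): 0.1}
)
```

In this toy setting, `cleaned["t"]` comes out lower than `noisy["t"]`, because the farm links deliver only a tenth of their original vote. Normalizing by the original out-degree rather than by the sum of weights is a deliberate choice here: it keeps a page whose links are all down-weighted from recovering its full voting power through renormalization.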
Link Spam; Noisy link; Down-weight; PageRank
Yitong Wang, Xiaofei Chen, Xiaojun Feng
School of Computer Science, Fudan University, Shanghai, China
International conference
6th International Conference on Advanced Data Mining and Applications (ADMA 2010)
Chongqing
English
453-464
2010-11-19 (date first posted online on the Wanfang platform; not the paper's publication date)