Conference Topic

An Improvement Method of Duplicate Webpage Detection

  Because the Internet makes it very easy to distribute and share resources, a large number of pages on the Internet are duplicates. As an indexing tool for Internet resources, a search engine faces a serious duplicate-detection problem: its crawler encounters a large number of links to duplicate content. If all of these links are added to the download queue, crawler performance drops sharply, which in turn seriously affects the user experience. In this paper, we adopt an improved duplicate detection method that combines BloomFilter with fuzzy Hamming distance. This not only meets the need to detect duplicate content but also meets the needs of users.
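The abstract names the two building blocks but not how they are wired together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Bloom filter screens URLs before they enter the download queue, and that "fuzzy Hamming distance" means accepting two pages as duplicates when their content fingerprints differ in at most a few bits. The SimHash-style `fingerprint` function, the `DuplicateDetector` class, and the `max_distance=3` threshold are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def fingerprint(text, bits=64):
    """SimHash-style fingerprint over whitespace tokens (assumed, not from the paper)."""
    counts = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class DuplicateDetector:
    """Hypothetical combination: Bloom filter for URLs, Hamming threshold for content."""

    def __init__(self, max_distance=3):
        self.seen_urls = BloomFilter()
        self.fingerprints = []
        self.max_distance = max_distance  # assumed "fuzzy" tolerance in bits

    def should_download(self, url):
        # False if the URL was (probably) queued before; otherwise record it.
        if url in self.seen_urls:
            return False
        self.seen_urls.add(url)
        return True

    def is_near_duplicate(self, text):
        # True if the text is within max_distance bits of a stored page;
        # otherwise store its fingerprint and return False.
        fp = fingerprint(text)
        if any(hamming(fp, old) <= self.max_distance for old in self.fingerprints):
            return True
        self.fingerprints.append(fp)
        return False
```

In this sketch a crawler would call `should_download(url)` before enqueueing a link and `is_near_duplicate(page_text)` after fetching a page. The linear scan over stored fingerprints is kept for brevity; a real crawler would need an index to make that lookup scale.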

search engine; duplicate detection; BloomFilter; fuzzy Hamming distance

Chengqi Zhang, Wenqian Shang, Yafeng Li

Department of Computer Sciences, Communication University of China, China

International conference

The 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT-2012) (2012年电机工程与信息技术国际会议)

Shenyang

English

27-30

2012-09-26 (date first posted online on the Wanfang platform; not the paper's publication date)