Conference Topic

Semantic Keywords-Based Duplicated Web Pages Removing

Because many duplicated web pages exist on the web, search engines need to find and remove them, both to save processing time and hardware resources and to ensure that users get results without many replicas. In this paper, we propose a method for finding and removing duplicated Chinese web pages for search engines. We first describe a scheme based on semantic keywords combined with sentence overlapping, and then present an implemented prototype, with experimental results suggesting that the prototype works well under a proper setting.

Duplicated web pages; semantic keywords; IR

Yunhe Weng, Lei Li, Yixin Zhong

School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, China

International conference

The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008)

Beijing

English

2008-10-19 (date first made available online on the Wanfang platform; this does not represent the paper's publication date)