Semantic Keywords-Based Duplicated Web Pages Removing
Because many duplicated web pages exist on the web, search engines need to find and remove them, not only to save processing time and hardware resources, but also to ensure that users get results without many replicas. In this paper, we propose a method to find and remove duplicated Chinese web pages for search engines. We first describe a scheme based on semantic keywords combined with sentence overlapping, and then present an implemented prototype, together with experimental results suggesting that the prototype works well under a proper setting.
Duplicated web pages; semantic keywords; IR
Yunhe Weng, Lei Li, Yixin Zhong
School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, China
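International conference
Beijing
English
2008-10-19 (date first made available online on the Wanfang platform; not the paper's publication date)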