Near-replicas of Web Pages Eliminating Repetitive Algorithms Based on MD5
The development of the Internet and the exponential growth of online information have produced a large number of duplicated pages on the network. These duplicates reduce retrieval recall and precision and lower retrieval efficiency, so the accuracy with which duplicate web pages are eliminated affects the quality of a search engine. Building on a structural description of page text, this paper proposes an improved MD5-based algorithm for eliminating near-replica web pages. Experiments show that the method improves both recall and precision.
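The abstract does not spell out the algorithm's steps. As a rough illustration of the general idea only (not the authors' method), the sketch below splits each page's text into paragraph-like blocks, normalizes them, hashes each block with MD5, and treats two pages as near-replicas when the Jaccard overlap of their digest sets reaches a threshold. The block-splitting rule, the helper names `block_digests`, `similarity`, and `deduplicate`, and the parameters `min_len=10` and `threshold=0.8` are all hypothetical choices made for this example.

```python
import hashlib
import re


def block_digests(text: str, min_len: int = 10) -> set[str]:
    """Hypothetical structural step: split text on blank lines into blocks,
    normalize case and whitespace, and MD5-hash each sufficiently long block."""
    digests = set()
    for block in re.split(r"\n\s*\n", text):
        normalized = " ".join(block.lower().split())
        if len(normalized) >= min_len:  # skip tiny blocks (assumed cutoff)
            digests.add(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return digests


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two digest sets; two empty pages count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(pages: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Keep a page only if it is not a near-replica of any page kept so far.
    Pages are visited in dict insertion order."""
    kept: list[tuple[str, set[str]]] = []
    for page_id, text in pages.items():
        digests = block_digests(text)
        if all(similarity(digests, other) < threshold for _, other in kept):
            kept.append((page_id, digests))
    return [page_id for page_id, _ in kept]
```

For example, two pages whose paragraphs differ only in letter case and spacing produce identical digest sets, so `deduplicate` keeps only the first. Hashing whole blocks, rather than the whole page, lets pages that share most of their blocks still score high even when a few blocks differ. The pairwise comparison is quadratic in the number of pages, which is acceptable for a sketch but not for a crawler-scale index.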
Keywords: structured web; MD5; eliminating repetitive of Web pages; eliminating repetitive algorithm
Junya Yan, Xiaohui Ma, Wenjuan Zhao
Business College of Shanxi University, Taiyuan Shanxi, China
International conference
Xi'an
English
pp. 1752-1756
2012-08-24 (date first posted online on the Wanfang platform; not the paper's publication date)