Near-replicas of Web Pages Eliminating Repetitive Algorithms Based on MD5

Abstract:

The development of the Internet and the exponential growth of online information have produced a large number of duplicate pages on the web. These duplicates reduce retrieval recall and precision and lower retrieval efficiency, so the accuracy with which duplicate pages are eliminated directly affects the quality of a search engine. Building on a structured text description of web pages, this paper proposes an improved MD5-based algorithm for eliminating near-replica (near-duplicate) web pages. Experiments show that the method effectively improves both recall and precision.
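The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to use MD5 for near-duplicate detection, not the authors' method. It assumes that a page's "structured text description" can be approximated by its paragraph-level blocks, hashes each normalized block with MD5, and treats two pages as near-replicas when the Jaccard overlap of their block fingerprints reaches a threshold. The function names, the blank-line block splitting, the Jaccard measure, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import hashlib
import re


def block_fingerprints(text: str) -> set[str]:
    """Hypothetical scheme: split extracted page text into blank-line-separated
    blocks and return the MD5 hex digest of each normalized, non-empty block."""
    fingerprints = set()
    for block in re.split(r"\n\s*\n", text):
        # Lowercase and collapse whitespace so trivial layout differences
        # do not change the digest.
        normalized = " ".join(block.lower().split())
        if normalized:
            fingerprints.add(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return fingerprints


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two fingerprint sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(pages: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Keep the first page of each near-replica group and return the kept page IDs.
    The 0.8 threshold is an illustrative assumption, not a value from the paper."""
    kept: list[tuple[str, set[str]]] = []
    for page_id, text in pages.items():
        fp = block_fingerprints(text)
        if all(similarity(fp, other) < threshold for _, other in kept):
            kept.append((page_id, fp))
    return [page_id for page_id, _ in kept]
```

For example, pages whose text differs only in capitalization and spacing produce identical block digests, so only the first of them is kept, while a page with unrelated content survives. Hashing whole blocks rather than the entire page is what lets the comparison tolerate small edits elsewhere on the page: an exact MD5 of the full text would change if a single character changed.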

Keywords: structured web; MD5; elimination of duplicate web pages; duplicate-elimination algorithm

Authors: Junya Yan, Xiaohui Ma, Wenjuan Zhao

Affiliation: Business College of Shanxi University, Taiyuan, Shanxi, China

Conference type: International conference

Conference: 2012 2nd International Conference on Materials Science and Information Technology (MSIT2012)

Venue: Xi'an

Conference language: English

Pages: 1752-1756

Online publication date: 2012-08-24 (date first posted on the Wanfang platform; not the paper's publication date)

