Conference Topic

A NOVEL WEB PAGE DUPLICATION DETECTION FRAMEWORK

The Internet contains many redundant web pages. In this paper, we present a novel multilayer framework for detecting duplicated web pages, based on tag statistics and text similarity comparison. We propose two algorithms for detecting similar text paragraphs and implement the framework. The experimental results show that our approach achieves high performance, which indicates that duplicated web pages can be detected efficiently using only tag statistics and text comparison.

Duplication Detection; Web Page; Framework

Zhongming Han, Dagao Duan, Hongzhi Liu, Jianzhi Sun

School of Computer Science and Information Engineering, Beijing Technology and Business University, Beijing, China

International conference

2009 IEEE International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC2009)

Beijing

English

374-378

2009-11-06 (date first made available online on the Wanfang platform; not the paper's publication date)