Conference topic

Web Object Block Mining Based on Tag Similarity

Currently, a large amount of Web information on the Internet is presented as structured objects. Mining object information from the Web is of great importance for Web data management. This paper presents a Web object block mining method based on tag similarity. It first constructs a DOM tree for the Web page and calculates the similarity of all possible generalized nodes. A pruning method then filters out redundant information based on the features of noise data and locates the Web object region. Finally, the Web objects are identified within that region. Experimental results show that, compared with IEPAD, the method achieves higher precision.
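The first step named in the abstract (build a DOM tree, then score the tag similarity of neighbouring nodes) can be illustrated with a minimal sketch. This is not the authors' algorithm: the similarity measure (a `difflib` ratio over preorder tag sequences), the 0.8 threshold, the restriction to single-node comparisons rather than multi-node generalized nodes, and all names (`Node`, `TreeBuilder`, `tag_similarity`, `best_region`) are assumptions made for illustration, and the pruning and object identification steps are not covered.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never have a closing tag; the parser must not descend into them.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A DOM-like node that records only its tag name and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal tag tree; assumes well-formed, properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent


def parse(html):
    """Parse an HTML string and return the synthetic root node."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tag_sequence(node):
    """Preorder list of tag names in the subtree rooted at node."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def tag_similarity(a, b):
    """Assumed measure: difflib ratio of the two subtrees' tag sequences."""
    matcher = SequenceMatcher(None, tag_sequence(a), tag_sequence(b), autojunk=False)
    return matcher.ratio()


def iter_nodes(node):
    """Yield node and all of its descendants in preorder."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def best_region(root, threshold=0.8):
    """Return the node with the most adjacent child pairs above the threshold."""

    def score(node):
        pairs = zip(node.children, node.children[1:])
        return sum(1 for a, b in pairs if tag_similarity(a, b) >= threshold)

    return max(iter_nodes(root), key=score)


# Hypothetical page: one navigation block plus three structurally identical items.
PAGE = (
    "<html><body><div id='nav'><a></a></div>"
    "<ul><li><h3></h3><p></p></li><li><h3></h3><p></p></li>"
    "<li><h3></h3><p></p></li></ul></body></html>"
)

region = best_region(parse(PAGE))
print(region.tag, len(region.children))
```

In this sketch the candidate region is simply the parent whose adjacent children look most alike; a fuller treatment would also compare groups of several sibling nodes and discard noisy regions such as navigation bars.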

Web Object Region; Information Extraction; Tag Similarity; DOM Tree; Generalized Node

Rui Liu, Rui Xiong, Kun Gao

State Key Lab of Software Development Environment, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing 100191, P.R. China

International conference

2010 International Conference on Intelligent Computation Technology and Automation (ICICTA 2010)

Changsha

English

3521-3524

2010-05-11 (date first posted online on the Wanfang platform; not the paper's publication date)