A Sample-Guided Approach to Incremental Structured Web Database Crawling

Abstract:

Web database crawling is a promising solution for Deep Web data integration. To the best of our knowledge, existing approaches focus only on how to crawl all records in a web database. Because most web databases are highly dynamic, it is not practical to harvest a small proportion of new records by crawling the whole database. This paper studies the problem of incremental web database crawling, which aims to crawl the new records from a web database efficiently. The proposed approach introduces a new graph model, the query related graph, which transforms an incremental crawling task into a graph traversal process. Based on this graph model, appropriate crawling queries are generated under the guidance of samples of the web database. Extensive experimental evaluations over real Web databases validate the effectiveness of our techniques and provide insights for future efforts in this direction.

Keywords: Web database; Deep Web data integration; Web database crawling

Authors: Wei Liu, Jianguo Xiao, Jianwu Yang

Affiliations: Institute of Computer Science & Technology Key Laboratory of Computational Linguistics (Peking Univer; Institute of Computer Science & Technology Key Laboratory of Computational Linguistics (Peking Unive

Conference type: International conference

Conference name: 2010 IEEE International Conference on Information and Automation (ICIA 2010)

Conference location: Harbin

Conference language: English

Pages: 1-6

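Online publication date: 2010-06-20 (the date the record first appeared on the Wanfang platform; it does not indicate when the paper was published)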
