An Algorithm of Deep Web Crawler’s Crawling

Abstract:

As an ever-increasing amount of information on the web is available through search interfaces, users have to enter a set of keywords to access the pages of certain web sites; such pages are often referred to as the hidden web or the deep web. Since there are no static links to hidden web pages, search engines cannot discover and index them. However, according to recent studies, the content provided by many hidden web sites is often of very high quality and can be extremely valuable to many users. This paper studies how to build an effective hidden web crawler that can autonomously discover and download pages from the hidden web. Since the only “entry point” to a hidden web site is its query interface, the main challenge for a hidden web crawler is how to automatically generate meaningful queries to issue to the site. A theoretical framework for investigating the query generation problem for the hidden web is provided, and effective policies for generating queries automatically are proposed. Experiments show that these policies are effective.
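The crawl loop described in the abstract (issue a keyword, download the result pages, then use what was downloaded to choose the next keyword) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: `SimulatedSite`, `adaptive_crawl`, and the sample pages are hypothetical, the site is simulated in memory, and the "pick the unissued term that appears in the most downloaded pages" rule is a simple frequency heuristic swapped in for the paper's own query-selection policy.

```python
from collections import Counter


class SimulatedSite:
    """Hypothetical stand-in for a hidden web site: pages are reachable only by keyword query."""

    def __init__(self, documents, result_limit=None):
        self._documents = dict(documents)  # page id -> page text
        self._index = {}                   # term -> set of page ids containing it
        for page_id, text in self._documents.items():
            for term in tokenize(text):
                self._index.setdefault(term, set()).add(page_id)
        self._result_limit = result_limit  # optional cap on results returned per query

    def query(self, term):
        """Return ids of pages matching `term`, sorted and optionally truncated."""
        hits = sorted(self._index.get(term.lower(), ()))
        if self._result_limit is not None:
            hits = hits[: self._result_limit]
        return hits

    def fetch(self, page_id):
        """Download one result page."""
        return self._documents[page_id]


def tokenize(text):
    # Assumed tokenizer: lowercase, whitespace-separated, distinct terms only.
    return set(text.lower().split())


def adaptive_crawl(site, seed_term, max_queries=10):
    """Greedy adaptive query selection (assumed frequency heuristic, not the paper's policy)."""
    downloaded = {}       # page id -> text of every page fetched so far
    issued = []           # keywords already sent to the query interface
    doc_freq = Counter()  # number of downloaded pages containing each term
    term = seed_term.lower()
    while term is not None and len(issued) < max_queries:
        issued.append(term)
        for page_id in site.query(term):
            if page_id not in downloaded:
                text = site.fetch(page_id)
                downloaded[page_id] = text
                doc_freq.update(tokenize(text))
        # Adaptive step: rank unissued terms by how many downloaded pages contain them,
        # breaking ties alphabetically so the run is deterministic.
        candidates = [(-count, t) for t, count in doc_freq.items() if t not in issued]
        term = min(candidates)[1] if candidates else None
    return issued, downloaded


pages = {
    1: "deep web crawler query",
    2: "query interface form",
    3: "form submission crawler",
    4: "hidden web database",
}
issued, downloaded = adaptive_crawl(SimulatedSite(pages), "crawler")
print(issued)            # ['crawler', 'deep', 'form', 'query', 'interface', 'submission', 'web', 'database', 'hidden']
print(sorted(downloaded))  # [1, 2, 3, 4]
```

In this toy run every page is eventually reached, but several queries (`deep`, `query`, `interface`) return only pages that were already downloaded. A stronger policy would also estimate how many of a keyword's results are new before issuing it, which is the query-selection question the abstract raises.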

Keywords: deep web; deep web crawler; query selection; adaptive algorithm

Authors: XIANG Peisu, TIAN Ke, HUANG Qinzhen

Conference type: International conference

Conference name: The 2007 International Conference on Information Computing and Automation

Conference location: Chengdu

Conference language: English

Pages: 1259-1262

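Online publication date: 2007-12-19 (the date the record was first posted on the Wanfang platform; it does not indicate the paper's publication date)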
