Conference topic

An Algorithm of Deep Web Crawler’s Crawling

As an ever-increasing amount of information on the web today is available through search interfaces, users have to key in a set of keywords in order to access the pages from certain web sites, which are often referred to as the hidden web or the deep web. Since there are no static links to the hidden web pages, search engines cannot discover and index such pages. However, according to recent studies, the content provided by many hidden web sites is often of very high quality and can be extremely valuable to many users. We study how to build an effective hidden web crawler that can autonomously discover and download pages from the hidden web. Since the only “entry point” to a hidden web site is a query interface, the main challenge for a hidden web crawler is how to automatically generate meaningful queries to issue to the site. We provide a theoretical framework to investigate the query generation problem for the hidden web and propose effective policies for generating queries automatically. Experiments show that these policies are effective.

deep web; deep web crawler; query selection; adaptive algorithm

XIANG Peisu TIAN Ke HUANG Qinzhen

International conference

The International Conference on Information Computing and Automation (2007)

Chengdu

English

1259-1262

2007-12-19 (date first made available online on the Wanfang platform; not the paper's publication date)