HAWK: A Focused Crawler with Content and Link Analysis
Keeping search engine indices current by exhaustive crawling is rapidly becoming impossible as the web grows. Focused crawlers, which aim to search only the subset of the web related to a specific topic, offer a potential solution, but they have problems of their own. The major one is how to retrieve the maximal set of relevant, high-quality pages. To address it, we design a focused crawler, called HAWK, that uses the content of web pages to improve page relevance and also uses link structure to improve coverage of a specific topic.
search engine; focused crawler; content; link structure
Xiaoyun Chen Xin Zhang
School of Information Science & Engineering, Lanzhou University, PRC 730000
International conference
AiR08, EM2108, SOAIC08, SIOKM08, BIMA08, DKEEE08 (2008 IEEE International Conference on e-Business Engineering)
Xi'an
English
677-680
2008-10-22 (date first made available online on the Wanfang platform; not necessarily the paper's publication date)
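
The abstract's idea of combining page content with link structure can be illustrated with a small sketch. This is not HAWK's actual algorithm, which this record does not describe. It is a generic best-first focused crawler over a toy in-memory web, and every name in it (`focused_crawl`, `alpha`, `threshold`, the cosine term-frequency relevance measure, the averaging of parent scores) is an assumption made for illustration.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(pages, links, seeds, topic, alpha=0.7, threshold=0.3, limit=10):
    """Best-first crawl of a toy web (hypothetical interface).

    pages: url -> page text; links: url -> list of out-link urls.
    Returns the urls judged relevant, in the order they were fetched.
    """
    topic_vec = Counter(topic.lower().split())
    frontier = [(-1.0, s) for s in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    parent_scores = {}  # url -> combined scores of fetched pages linking to it
    visited, relevant = set(), []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue  # skip stale heap entries and unknown urls
        visited.add(url)
        # Content signal: similarity between page text and topic terms.
        content = cosine(Counter(pages[url].lower().split()), topic_vec)
        # Link signal: mean score of fetched in-linking pages; seeds have
        # none, so they fall back to their own content score.
        inbound = parent_scores.get(url, [])
        link = sum(inbound) / len(inbound) if inbound else content
        score = alpha * content + (1 - alpha) * link
        if score >= threshold:
            relevant.append(url)
        # Out-links inherit this page's score as their frontier priority.
        for out in links.get(url, []):
            if out not in visited:
                parent_scores.setdefault(out, []).append(score)
                heapq.heappush(frontier, (-score, out))
    return relevant
```

In this sketch `alpha` weights content against link evidence: a higher value favors pages whose own text matches the topic, while a lower value lets pages linked from high-scoring neighbors be fetched and accepted even when their text matches weakly, which is one plausible way to trade relevance for coverage.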