Efficient Focused Crawling Strategy Using Combination of Link Structure and Content Similarity
At present, focused crawlers usually crawl pages using either the link structure or the page content, but each approach has flaws. We therefore designed an efficient crawling strategy that combines link structure with content similarity. We extract the topic feature vector automatically and judge the topic similarity of a page using a combination of link structure and page content. We also forecast URL similarity using the link structure of topic pages. Experiments showed that this strategy effectively increases the precision of fetching topic pages.
Qu Cheng, Wang Beizhan, Wei Pianpian
Software School, Xiamen University, Xiamen 361005, Fujian, China
International conference
Xiamen
English
1045-1048
2008-12-12 (date first posted online on the Wanfang platform; does not represent the paper's publication date)
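
The abstract describes scoring a page by combining content similarity with link structure, and forecasting the similarity of unvisited URLs from the topic pages that link to them. The record gives no formulas, so the following is only a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the function names `cosine`, `page_score`, and `url_priority`, the weights `alpha` and `beta`, the use of cosine similarity over term-frequency vectors, and the use of parent-page scores as the link signal.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_score(topic, page_terms, parent_scores, alpha=0.5):
    """Blend content similarity with a link-based signal.

    Assumption: the link signal is the mean score of already-scored
    pages that link to this page; alpha is a hypothetical weight.
    """
    content = cosine(topic, page_terms)
    link = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    return alpha * content + (1 - alpha) * link


def url_priority(topic, anchor_terms, parent_score, beta=0.5):
    """Forecast the similarity of an unvisited URL.

    Assumption: the forecast mixes anchor-text similarity with the
    score of the page that contains the link; beta is hypothetical.
    """
    return beta * cosine(topic, anchor_terms) + (1 - beta) * parent_score


if __name__ == "__main__":
    topic = Counter("focused crawler topic crawler".split())
    page = Counter("a focused crawler fetches topic pages".split())
    anchor = Counter("crawler tutorial".split())
    score = page_score(topic, page, parent_scores=[0.8, 0.4])
    print(round(score, 3), round(url_priority(topic, anchor, score), 3))
```

In a crawler, values such as those returned by `url_priority` would typically order a priority queue of unvisited URLs, so that links forecast to be on topic are fetched first.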