Conference Topic

Efficient Focused Crawling Strategy Using Combination of Link Structure and Content Similarity

At present, focused crawlers usually crawl pages using either the link structure or the page content, and each approach has its flaws. We therefore designed an efficient crawling strategy that combines link structure with content similarity. We extract the topic feature vector automatically and judge the topic similarity of a page using a combination of link structure and page content. We also predict the similarity of URLs using the link structure of topic pages. Experiments showed that this strategy effectively increases the precision of fetching topic pages.
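The abstract does not give the scoring formulas, so the following is only a minimal sketch of how a content score and a link-based score might be blended. The function names, the bag-of-words term vector, the cosine measure, and the weights `alpha` and `beta` are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def page_relevance(page_text, parent_scores, topic, alpha=0.7):
    """Blend content similarity with a link-structure signal.

    The link signal here is the mean relevance of pages linking to this one;
    alpha is a hypothetical mixing weight.
    """
    content = cosine(term_vector(page_text), topic)
    link = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    return alpha * content + (1 - alpha) * link


def url_priority(anchor_text, parent_score, topic, beta=0.5):
    """Estimate the relevance of an unvisited URL from its anchor text and
    the relevance of the topic page that links to it (beta is hypothetical)."""
    anchor = cosine(term_vector(anchor_text), topic)
    return beta * anchor + (1 - beta) * parent_score


topic = term_vector("focused crawler topic")
on_topic = page_relevance("focused crawler topic pages", [1.0], topic)
off_topic = page_relevance("cooking recipes", [0.0], topic)
```

In a crawler built this way, `url_priority` would order the frontier queue, and `page_relevance` would decide whether a fetched page counts as a topic page.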

Qu Cheng, Wang Beizhan, Wei Pianpian

Software School, Xiamen University, Xiamen 361005, Fujian, China

International conference

2008 IEEE International Symposium on IT in Medicine and Education (ITME 2008)

Xiamen

English

1045-1048

2008-12-12 (date first posted online on the Wanfang platform; this is not the paper's publication date)