An Effective Relevance Prediction Algorithm Based on Hierarchical Taxonomy for Focused Crawling
How to formally describe a user's topic of interest and effectively predict the relevance of unvisited pages to that topic is a key issue in the design of focused crawlers. However, almost all previously known focused crawlers perform Relevance Prediction based on the Flat Information (RPFI) of the topic only, i.e., without regard to the context between keywords or topics. In this paper, we first introduce an algorithm that maps a topic described by a keyword set or a natural-language document to topics described in a hierarchical topic taxonomy. We then propose a novel approach that performs Relevance Prediction based on the Hierarchical Context Information (RPHCI) of the taxonomy. Experiments show that a focused crawler based on RPHCI achieves significantly higher efficiency than those based on RPFI.
Focused Crawling; Relevance Prediction; Hierarchical Topic Taxonomy; Topic Description
Zhumin Chen, Jun Ma, Xiaohui Han, Dongmei Zhang
School of Computer Science & Technology, Shandong University, Jinan 250061, China
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
613-619
2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)