Modeling Anchor Text and Classifying Queries to Enhance Web Document Retrieval

Abstract:

Several types of queries are widely used on the World Wide Web, and the expected retrieval method can vary depending on the query type. We propose a method for classifying queries into informational and navigational types. Because terms in navigational queries often appear in anchor text for links to other pages, we analyze the distribution of query terms in anchor texts on the Web for query classification purposes. While content-based retrieval is effective for informational queries, anchor-based retrieval is effective for navigational queries. Our retrieval system combines the results obtained with the content-based and anchor-based retrieval methods, in which the weight for each retrieval result is determined automatically depending on the result of the query classification. We also propose a method for improving anchor-based retrieval. Our retrieval method, which computes the probability that a document is retrieved in response to the given query, identifies synonyms of query terms in the anchor texts on the Web and uses these synonyms for smoothing purposes in the probability estimation. Using the NTCIR test collections, we show experimentally the effectiveness of the individual methods and of the entire Web retrieval system.
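The abstract does not state the formula used to combine the two retrieval results. As a minimal sketch only, assuming a simple linear interpolation in which a classifier output in [0, 1] serves as the weight on the anchor-based score, the idea could look like the following. The names `combine_scores`, `rank`, and `nav_weight` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def combine_scores(
    content: Dict[str, float],
    anchor: Dict[str, float],
    nav_weight: float,
) -> Dict[str, float]:
    """Linearly interpolate content-based and anchor-based scores.

    nav_weight is an assumed stand-in for the query classifier's output:
    values near 1.0 mean "navigational" (trust anchor text more),
    values near 0.0 mean "informational" (trust page content more).
    """
    if not 0.0 <= nav_weight <= 1.0:
        raise ValueError("nav_weight must lie in [0, 1]")
    docs = set(content) | set(anchor)
    # A document missing from one result list contributes 0.0 for that list.
    return {
        d: (1.0 - nav_weight) * content.get(d, 0.0)
        + nav_weight * anchor.get(d, 0.0)
        for d in docs
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Return document ids by descending score, ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With this sketch, a weight of 0.0 reduces to pure content-based ranking and a weight of 1.0 to pure anchor-based ranking; how the actual system derives the weight from the classification result is described in the paper itself, not here.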

Keywords: Web retrieval; anchor text; query classification

Author: Atsushi Fujii

Affiliation: Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba, 305-8550, Japan

Conference type: International conference

Conference name: The 17th International World Wide Web Conference (WWW08)

Conference location: Beijing

Conference language: English

Online publication date: 2008-04-21 (date first posted on the Wanfang platform; not the paper's publication date)
