会议专题

Modeling Anchor Text and Classifying Queries to Enhance Web Document Retrieval

Several types of queries are widely used on the World Wide Web and the expected retrieval method can vary depending on the query type. We propose a method for classifying queries into informational and navigational types. Because terms in navigational queries often appear in anchor text for links to other pages, we analyze the distribution of query terms in anchor texts on the Web for query classification purposes. While content-based retrieval is effective for informational queries, anchor-based retrieval is effective for navigational queries. Our retrieval system combines the results obtained with the content-based and anchor-based retrieval methods, in which the weight for each retrieval result is determined automatically depending on the result of the query classi.cation. We also propose a method for improving anchor-based retrieval. Our retrieval method, which computes the probability that a document is retrieved in response to the given query, identifles synonyms of query terms in the anchor texts on the Web and uses these synonyms for smoothing purposes in the probability estimation. We use the NTCIR test collections and show the effectiveness of individual methods and the entire Web retrieval system experimentally.

Web retrieval anchor text query classification

Atsushi Fujii

Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba, 305-8550, Japan

国际会议

第十七届国际万维网大会(the 17th International World Wide Web Conference)(WWW08)

北京

英文

2008-04-21(万方平台首次上网日期,不代表论文的发表时间)