Conference Topic

Improve Web Search Diversification with Intent Subtopic Mining

  A number of studies of search user behavior show that queries with unclear intents are commonly submitted to search engines. Result diversification is usually adopted to deal with such queries: the search engine trades off some relevance for diversity to improve the user experience. In this work, we aim to improve the performance of search result diversification by generating a list of intent subtopics through the fusion of multiple resources. Our approach rests on the idea that collecting a broad set of intent subtopics requires an equally broad range of resources from which to extract them. The resources adopted include external resources (Wikipedia, Google Keywords Generator, Google Insights, search engine query suggestion and completion), anchor texts, page snippets, and more. We selected resources to cover both the information seeker aspect (what a user is searching for) and the information provider aspect (the websites). We also propose an efficient Bayesian optimization approach to maximize resource selection performance, and a new technique to cluster subtopics based on top-result snippet information and the Jaccard similarity coefficient. Experiments based on the TREC 2012 Web Track and the NTCIR-10 Intent task show that our framework can greatly improve diversity while keeping good precision. The system developed with the proposed techniques also achieved the best English subtopic mining performance in the NTCIR-10 Intent task.
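
  As an illustration of the snippet-based clustering mentioned in the abstract, the following is a minimal sketch only. The abstract does not give the authors' procedure, so the whitespace tokenization, the greedy single-pass grouping, the 0.3 threshold, and the names `jaccard`, `tokenize`, and `cluster_subtopics` are all assumptions made for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def tokenize(text: str) -> set:
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())


def cluster_subtopics(snippets: dict, threshold: float = 0.3) -> list:
    """Greedily group subtopics whose top-result snippets overlap.

    `snippets` maps each candidate subtopic to the list of snippets
    returned for it. The threshold is an assumed value, not one taken
    from the paper.
    """
    clusters = []  # each cluster: {"tokens": set, "members": [subtopic, ...]}
    for subtopic, texts in snippets.items():
        tokens = set()
        for text in texts:
            tokens |= tokenize(text)
        for cluster in clusters:
            if jaccard(tokens, cluster["tokens"]) >= threshold:
                cluster["members"].append(subtopic)
                cluster["tokens"] |= tokens  # grow the cluster vocabulary
                break
        else:
            # no existing cluster was similar enough: start a new one
            clusters.append({"tokens": set(tokens), "members": [subtopic]})
    return [cluster["members"] for cluster in clusters]


example = {
    "jaguar car price": ["jaguar car price list dealer"],
    "jaguar car dealer": ["jaguar car dealer price list"],
    "jaguar animal habitat": ["jaguar big cat habitat rainforest"],
}
groups = cluster_subtopics(example)
```

  In this toy input the two car-related subtopics share all of their snippet tokens and fall into one group, while the animal-related subtopic shares only one token with them and starts its own group.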

Aymeric Damien, Min Zhang, Yiqun Liu, Shaoping Ma

State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

International conference

Second CCF Conference, NLPCC 2013 (2nd Conference on Natural Language Processing and Chinese Computing)

Chongqing

English

322-333

2013-11-15 (date first posted online on the Wanfang platform; not the paper's publication date)