会议专题

Toward a Robust Data Fusion for Document Retrieval

This paper describes an investigation of signal boosting techniques for post-search data fusion, where the quality of the retrieval results involved in fusion may be low or diverse. The effectiveness of data fusion techniques in such situation depends on the ability of the fusion techniques to be able to boost the signals from relevant documents and reduce the effect of noise that often comes from low quality retrieval results. Our studies on Malach spoken document collection and HARD collection have demonstrated that CombMNZ, the most widely used data fusion method, does not have such ability. We, therefore, developed two versions of signal boosting mechanisms on top of CombMNZ, which result in two new fusion methods called WCombMNZ and WCombMWW. To examine the effectiveness of the two new methods, we conducted experiments on Malach and HARD document collections. Our results show that the new methods can significantly outperform CombMNZ in combining retrieval results that are low and diverse. When the tasks are to combine retrieval results that are in similar quality, which have been the scenarios that CombMNZ are applied often, the two new methods still can obtain often better, sometimes significantly, fusion results.

Data fusion Spoken document retrieval CombMNZ Malach TREC HARD

Daqing He Dan Wu

School of Information Sciences University of Pittsburgh Pittsburgh,PA,USA School of Information Management Wuhan Univeristy Wuhan,HuiBei,China

国际会议

The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering(IEEE NLP-KE 2008)(2008IEEE自然语言处理与知识工程国际会议)

北京

英文

2008-10-19(万方平台首次上网日期,不代表论文的发表时间)