Entropy-based Clustering for Improving Document Re-ranking

Document re-ranking sits between initial retrieval and query expansion in an information retrieval system. In this paper, we propose an entropy-based clustering approach to document re-ranking. The within-cluster entropy determines whether two clusters should be merged, and the between-cluster entropy determines how many clusters are reasonable. A suitable cluster is then selected from the clustering result to construct pseudo-labeled documents, and document re-ranking is carried out as in our previous method. We focus on the clustering strategy for the documents returned by initial retrieval. Experiments on NTCIR-5 data show that the approach improves on the performance of initial retrieval and helps improve the quality of document re-ranking.
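The abstract does not give the entropy formulas, so the following is only a minimal sketch of how a within-cluster entropy could drive a merge decision. It assumes Shannon entropy over the pooled term counts of a cluster; the function names, the token-list document representation, and the `max_increase` threshold are all hypothetical and are not taken from the paper. The between-cluster criterion for choosing the number of clusters is not illustrated.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a term-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def within_cluster_entropy(cluster):
    """Entropy of the pooled term counts of a cluster.

    `cluster` is a list of documents, each a list of tokens.
    This definition is an assumption made for illustration.
    """
    pooled = Counter()
    for doc in cluster:
        pooled.update(doc)
    return entropy(pooled)

def should_merge(a, b, max_increase=0.5):
    """Hypothetical rule: merge two clusters when pooling them raises
    the entropy by at most `max_increase` bits over the larger of the two."""
    merged = within_cluster_entropy(a + b)
    baseline = max(within_cluster_entropy(a), within_cluster_entropy(b))
    return merged - baseline <= max_increase
```

Under these assumptions, two clusters that share a vocabulary pool into a distribution with little extra entropy and are merged, while clusters with disjoint vocabularies are kept apart.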
Information Retrieval; Document re-ranking; Clustering; within-cluster entropy; between-cluster entropy
Chong Teng; Yanxiang He; Donghong Ji; Cheng Zhou; Yixuan Geng; Shu Chen
Computer School, Wuhan University, Wuhan, China; School of Mathematics and Statistics, Wuhan University, Wuhan, China; International School of Software, Wuhan University, Wuhan, China
International conference
Shanghai
English
2477-2481
2009-11-20 (date first made available online on the Wanfang platform; not the paper's publication date)