Entropy-based Clustering for Improving Document Re-ranking

Abstract:

Document re-ranking sits between initial retrieval and query expansion in an information retrieval system. In this paper, we propose an entropy-based clustering approach for document re-ranking. The value of the within-cluster entropy determines whether two classes should be merged, and the value of the between-cluster entropy determines how many clusters are reasonable. The next step is to select a suitable cluster from the clustering result to construct pseudo-labeled documents, and then to conduct document re-ranking as in our previous method. We focus on the clustering strategy for documents returned by initial retrieval. Experiments with NTCIR-5 data show that the approach can improve on the performance of initial retrieval and helps improve the quality of document re-ranking.
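The abstract does not define its two entropy measures, so the following is only an illustrative sketch of entropy-guided merging, not the authors' algorithm. It assumes each document is a list of tokens, takes a cluster's entropy to be the Shannon entropy of its pooled term counts, and merges the pair of clusters whose merge raises the size-weighted entropy the least. The names `entropy`, `merge_cost`, `cluster`, and the threshold `max_cost` are hypothetical; the fixed threshold stands in for the between-cluster criterion, which is not modeled here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a term-count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def merge_cost(a, b):
    """Increase in size-weighted entropy if clusters a and b were merged (assumed criterion)."""
    na, nb = sum(a.values()), sum(b.values())
    return (na + nb) * entropy(a + b) - na * entropy(a) - nb * entropy(b)


def cluster(docs, max_cost):
    """Greedy agglomerative clustering; stops when the cheapest merge costs more than max_cost."""
    # Each cluster is (list of document indices, pooled term counts).
    clusters = [([i], Counter(tokens)) for i, tokens in enumerate(docs)]
    while len(clusters) > 1:
        cost, i, j = min(
            (merge_cost(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if cost > max_cost:  # hypothetical stopping rule, not taken from the paper
            break
        ids_j, counts_j = clusters.pop(j)  # j > i, so index i stays valid
        ids_i, counts_i = clusters[i]
        clusters[i] = (ids_i + ids_j, counts_i + counts_j)
    return [sorted(ids) for ids, _ in clusters]


docs = [
    ["apple", "banana"],
    ["apple", "banana"],
    ["car", "engine"],
    ["car", "engine"],
]
print(cluster(docs, max_cost=1.0))
```

Documents with identical term distributions merge at zero cost, while merging documents with disjoint vocabularies is expensive, so a small `max_cost` keeps topically distinct groups apart.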

Keywords: Information Retrieval; Document re-ranking; Clustering; within-cluster entropy; between-cluster entropy

Authors: Chong Teng, Yanxiang He, Donghong Ji, Cheng Zhou, Yixuan Geng, Shu Chen

Affiliations: Computer School, Wuhan University, Wuhan, China; School of Mathematics and Statistics, Wuhan University, Wuhan, China; International School of Software, Wuhan University, Wuhan, China

Conference type: International conference

Conference name: 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems

Conference location: Shanghai

Conference language: English

Pages: 2477-2481

Online publication date: 2009-11-20 (date first made available on the Wanfang platform; not the paper's publication date)
