Mixture of Topic Model for Multi-document Summarization
Based on the LDA (Latent Dirichlet Allocation) topic model, a generative model for multi-document summarization, named Titled-LDA, is proposed that simultaneously models the content of documents and their titles. This generative model represents each document as a mixture of topics, and extends this approach to title modeling by allowing the mixture weights for topics to be determined by the document's title. In the mixing stage, the algorithm learns the weight in an adaptive, asymmetric way based on two kinds of information entropy. In this way, the final model incorporates the title information and the content information appropriately, which improves summarization performance. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
multi-document summarization; LDA; topic model
Liu Na, Li Ming-xia, Lu Ying, Tang Xiao-jun, Wang Hai-wen, Xiao Peng
School of Information Science & Engineering, Dalian Polytechnic University, Dalian, 116034
International conference
Changsha
English
5168-5172
2014-05-31 (date first posted online on the Wanfang platform; not the paper's publication date)