Conference topic

Mixture of Topic Model for Multi-document Summarization

Based on the LDA (Latent Dirichlet Allocation) topic model, a generative model for multi-document summarization, namely Titled-LDA, is proposed that simultaneously models the content of documents and their titles. This generative model represents each document as a mixture of topics, and extends this approach to title modeling by allowing the mixture weights for topics to be determined by the titles of the documents. In the mixing stage, the algorithm learns the weights in an adaptive, asymmetric way based on two kinds of information entropy. In this way, the final model incorporates the title information and the content information appropriately, which improves summarization performance. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.

multi-document summarization; LDA; topic model

Liu Na, Li Ming-xia, Lu Ying, Tang Xiao-jun, Wang Hai-wen, Xiao Peng

School of Information Science & Engineering, Dalian Polytechnic University, Dalian, 116034

International conference

The 26th Chinese Control and Decision Conference (2014 CCDC)

Changsha

English

5168-5172

2014-05-31 (date first posted online on the Wanfang platform; not the publication date of the paper)