Conference topic

Subtopic-based Multi-documents Summarization

Multi-document summarization is an important research area of NLP. Most multi-document summarization methods either treat the document collection as single-topic or treat every sentence as single-topic, and lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole document collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the document collection's main subtopics and chooses the sentences with short distances as the final summary of the document collection. In experiments on the DUC 2007 dataset, we found that when a topic's document collection is trained with some other topics' document collections as background knowledge, our approach achieves fairly good ROUGE scores compared with other peer systems.
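The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sentence subtopic vectors have already been inferred by some topic model, approximates the collection's main subtopics by the mean of the sentence vectors, and uses cosine distance as the semantic distance. The function names (`centroid`, `cosine_distance`, `select_sentences`) are hypothetical, and the hierarchical subtopic structure of SubTMS is not reproduced here.

```python
import math


def centroid(vectors):
    """Mean vector; an assumed stand-in for the collection's main subtopics."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed choice of semantic distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def select_sentences(sentences, vectors, k):
    """Pick the k sentences closest to the main subtopic vector.

    `vectors[i]` is the subtopic vector of `sentences[i]`.
    The chosen sentences are returned in their original order.
    """
    main = centroid(vectors)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine_distance(vectors[i], main))
    return [sentences[i] for i in sorted(ranked[:k])]


# Toy example with made-up two-subtopic vectors.
sentences = ["a", "b", "c"]
vectors = [[1.0, 0.0], [0.5, 0.5], [0.4, 0.6]]
summary = select_sentences(sentences, vectors, 2)
```

In practice the vectors would come from the trained topic model, and the summary would be cut off by a length budget rather than a fixed sentence count.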

multi-documents summarization; topic model; subtopic; sentence representation

Shu Gong, Youli Qu, Shengfeng Tian

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China

International conference

The Third International Joint Conference on Computational Science and Optimization (CSO 2010)

Huangshan

English

382-386

2010-05-28 (date first posted online on the Wanfang platform; not the paper's publication date)