A NOVEL CHINESE MULTI-DOCUMENT SUMMARIZATION USING CLUSTERING BASED SENTENCE EXTRACTION

Abstract:

This paper proposes a strategy for Chinese multi-document summarization based on clustering and sentence extraction. It represents the linguistic units of Chinese documents with term vectors, which to some extent give higher representation quality than the traditional word-based vector space model. For clustering, we propose two heuristics that automatically detect the proper number of clusters: the first makes full use of the summary length fixed by the user; the second is a stability method, which has been applied to other unsupervised learning problems. We also discuss a global searching method for selecting sentences from the clusters. To evaluate the summarization strategy, we adopt an extrinsic evaluation method based on a classification task. Experimental results on a news document set show that the new strategy can significantly improve the performance of Chinese multi-document summarization.
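The abstract gives no algorithmic details, so the following is only a minimal Python sketch of the general clustering-then-extraction idea, under loudly stated assumptions. Sentences are assumed to be already reduced to lists of terms (the paper's term extraction is not shown). The cluster count follows a hypothetical reading of the first heuristic: as many clusters as average-length sentences fit into the user's summary length. Clustering is plain k-means on cosine similarity with farthest-point seeding, and the sentence nearest each centroid is kept. The stability method is omitted, and a simple per-cluster greedy pick stands in for the paper's global searching method. All function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(term_lists):
    """One sparse term-frequency vector per sentence.

    Assumption: each sentence is already a list of terms; the paper's
    term-extraction step is not reproduced here.
    """
    return [Counter(terms) for terms in term_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def choose_k(sentence_lengths, summary_length):
    """Hypothetical reading of the summary-length heuristic: one cluster per
    average-length sentence that fits in the summary (same length unit,
    e.g. characters for Chinese text), clamped to [1, number of sentences]."""
    avg = sum(sentence_lengths) / len(sentence_lengths)
    return max(1, min(len(sentence_lengths), int(summary_length // avg)))


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def init_seeds(vectors, k):
    """Farthest-point seeding: start from sentence 0, then repeatedly add the
    sentence least similar to its most similar existing seed."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    return [dict(vectors[s]) for s in seeds]


def kmeans(vectors, k, max_iter=20):
    """Plain k-means with cosine similarity; stops once assignments settle."""
    centroids = init_seeds(vectors, k)
    assignment = None
    for _ in range(max_iter):
        new = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new == assignment:
            break
        assignment = new
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = centroid(members)
    return assignment, centroids


def summarize(sentences, term_lists, summary_length):
    """Greedy stand-in for sentence selection: keep the sentence closest to
    each cluster centroid, then restore original document order."""
    vectors = vectorize(term_lists)
    k = choose_k([len(s) for s in sentences], summary_length)
    assignment, centroids = kmeans(vectors, k)
    picked = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            picked.append(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(picked)]
```

For instance, four equal-length sentences covering two topics with a budget of two sentences' worth of characters yield two clusters and one representative sentence per topic. A per-cluster greedy choice is the simplest option; a global search, as the abstract describes, would instead evaluate the selected sentences jointly, but its actual criteria are not given here.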

Keywords: Chinese multi-document summarization; term vector space; stability method; global searching method; extrinsic evaluation

Authors: DE-XI LIU, YAN-XIANG HE, DONG-HONG JI, HUA YANG

Affiliations: School of Physics, Xiangfan University, Xiangfan 441053, P.R. China; School of Computer, Wuhan University, Wuhan 430079, P.R. China; Center for Study of Language and Information, Wuhan University, Wuhan 430079, P.R. China; Institute for

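Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (IEEE Fifth Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 2592-2597

Online publication date: 2006-08-13 (the date the paper was first posted on the Wanfang platform, not its publication date)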

