Conference Papers

A NOVEL CHINESE MULTI-DOCUMENT SUMMARIZATION USING CLUSTERING BASED SENTENCE EXTRACTION

This paper proposes a strategy for Chinese multi-document summarization based on clustering and sentence extraction. It represents the linguistic units of Chinese documents with term vectors, which give somewhat higher representation quality than the traditional word-based vector space model. For clustering, we propose two heuristics that automatically detect the proper number of clusters: the first makes full use of the summary length set by the user; the second is a stability method, which has previously been applied to other unsupervised learning problems. We also discuss a global searching method for selecting sentences from the clusters. To evaluate the summarization strategy, we adopt an extrinsic evaluation method based on a classification task. Experimental results on a news document set show that the new strategy significantly improves the performance of Chinese multi-document summarization.
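The abstract outlines a pipeline of term vectors, then clustering with a length-derived number of clusters, then one extracted sentence per cluster. Below is a minimal sketch of that pipeline under several assumptions that the abstract does not confirm. It assumes sentences are already segmented into terms. It reads the first heuristic as "k = summary length divided by average sentence length". It uses cosine k-means as the clustering step. In place of the paper's global searching method, it simply picks the sentence closest to each cluster centroid. All function names (`choose_k`, `kmeans`, `summarize`) are hypothetical.

```python
import math
from collections import Counter


def to_vector(terms):
    """Term-frequency vector for one pre-segmented sentence (assumed input format)."""
    return Counter(terms)


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def choose_k(sentence_lengths, summary_length):
    """Assumed form of heuristic 1: as many clusters as average-length sentences fit in the summary."""
    avg = sum(sentence_lengths) / len(sentence_lengths)
    return max(1, min(len(sentence_lengths), int(summary_length // avg)))


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Cosine k-means with deterministic farthest-point seeding (a stand-in clustering step)."""
    centers = [dict(vectors[0])]
    while len(centers) < k:
        # Next seed: the vector least similar to its closest existing center.
        nxt = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centers))
        centers.append(dict(vectors[nxt]))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its most similar center.
        labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # Keep the old center if a cluster goes empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def summarize(sentences, summary_length):
    """Return indices (in original order) of one representative sentence per cluster."""
    vectors = [to_vector(s) for s in sentences]
    k = choose_k([len(s) for s in sentences], summary_length)
    labels, centers = kmeans(vectors, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            # Simplification: centroid-nearest sentence, not the paper's global search.
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centers[j])))
    return sorted(chosen)
```

For example, with two sentences about an earthquake rescue and two about the stock market, each three terms long, and a summary budget of six terms, `choose_k` yields two clusters. `summarize` then returns one sentence index from each topic. A stability-based choice of k, the paper's second heuristic, would replace `choose_k` by clustering repeated subsamples for several candidate k and keeping the k whose clusterings agree most. That variant is not shown here.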

Chinese multi-document summarization; term vector space; stability method; global searching method; extrinsic evaluation

DE-XI LIU, YAN-XIANG HE, DONG-HONG JI, HUA YANG

School of Physics, Xiangfan University, Xiangfan 441053, P.R. China; School of Computer, Wuhan University, Wuhan 430079, P.R. China; Center for Study of Language and Information, Wuhan University, Wuhan 430079, P.R. China; Institute for

International Conference

2006 International Conference on Machine Learning and Cybernetics (IEEE Fifth Forum on Machine Learning and Cybernetics)

Dalian

English

2592-2597

2006-08-13 (date first posted online on the Wanfang platform; not the paper's publication date)