A Study on Cross-Language Text Summarization Using Supervised Methods
In this work, we use Hidden Markov Models (HMM), Conditional Random Fields (CRF), Gaussian Mixture Models (GMM), and Mathematical Methods of Statistics (MMS) for Chinese and Japanese text summarization. The purpose of this work is to study the applicability of the three trainable models to cross-language text summarization. For model training, we use several features such as sentence position, sentence centrality, and the number of named entities. For model testing, Chinese and Japanese online news articles extracted from web pages are used as test data. We evaluate each model by measuring precision at compression rates of 10%, 20%, and 30%, with MMS serving as the baseline method. The results show that HMM, CRF, and GMM achieve notable improvements over MMS on both Chinese and Japanese text summarization using the same training features. In particular, the GMM model achieves the best performance in all tests.
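To make the GMM-based setup concrete, below is a minimal sketch (not the authors' code) of supervised extractive scoring with a Gaussian mixture per class and evaluation by precision at a compression rate. The feature layout (sentence position, centrality, named-entity count), the two-class likelihood-ratio scoring, and all toy data are assumptions for illustration only.

```python
# A minimal sketch of GMM-based extractive summarization scoring,
# assuming hand-crafted sentence features and binary "in-summary" labels.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_scorer(features, labels, n_components=1):
    """Fit one GMM per class; score = log-likelihood ratio (summary vs. non-summary)."""
    pos = GaussianMixture(n_components=n_components, random_state=0).fit(features[labels == 1])
    neg = GaussianMixture(n_components=n_components, random_state=0).fit(features[labels == 0])
    return pos, neg

def summarize(features, pos, neg, compression=0.10):
    """Select the top `compression` fraction of sentences by likelihood ratio."""
    scores = pos.score_samples(features) - neg.score_samples(features)
    k = max(1, int(round(compression * len(features))))
    return set(int(i) for i in np.argsort(scores)[::-1][:k])

def precision(selected, gold):
    """Fraction of selected sentences that also appear in the reference extract."""
    return len(selected & gold) / len(selected)

# Toy usage with hypothetical features: [position, centrality, named-entity count]
X_train = np.array([[0.0, 0.8, 3], [0.1, 0.6, 2], [0.5, 0.2, 0],
                    [0.7, 0.1, 0], [0.9, 0.3, 1], [0.2, 0.7, 2]], dtype=float)
y_train = np.array([1, 1, 0, 0, 0, 1])
pos, neg = train_gmm_scorer(X_train, y_train)

X_test = np.array([[0.0, 0.9, 4], [0.4, 0.3, 1], [0.8, 0.1, 0],
                   [0.2, 0.6, 2], [0.6, 0.2, 0], [0.9, 0.4, 1],
                   [0.1, 0.7, 3], [0.5, 0.5, 1], [0.3, 0.2, 0], [0.7, 0.3, 0]], dtype=float)
gold = {0, 3, 6}  # hypothetical reference extract (sentence indices)

for rate in (0.10, 0.20, 0.30):
    sel = summarize(X_test, pos, neg, rate)
    print(f"compression {rate:.0%}: precision = {precision(sel, gold):.2f}")
```

An analogous evaluation loop would apply to the HMM and CRF scorers; only the sentence-scoring model changes while the compression-rate selection and precision measure stay the same.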
Text Summarization; NLP; Machine Learning
Lei Yu, Fuji Ren
Graduate School of Advanced Science Technology Education, The University of Tokushima, 2-1 Minamijosa
International conference
Dalian
English
1-7
2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)