Conference Paper

A Study on Cross-Language Text Summarization Using Supervised Methods

In this work, we use Hidden Markov Models (HMM), Conditional Random Fields (CRF), Gaussian Mixture Models (GMM), and Mathematical Methods of Statistics (MMS) for Chinese and Japanese text summarization. The purpose of this work is to study the applicability of the three trainable models (HMM, CRF, and GMM) to cross-language text summarization. For model training, we use several features such as sentence position, sentence centrality, and the number of named entities. For model testing, Chinese online news and Japanese news extracted from web pages are used as test data. We evaluate each model by measuring precision at compression rates of 10%, 20%, and 30%, with MMS serving as the baseline. The results show that HMM, CRF, and GMM achieve remarkable improvements over MMS on both Chinese and Japanese text summarization using the same training features. In particular, the GMM model achieves the best performance in all tests.
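The abstract gives no implementation details. As a rough illustration only, the sketch below (in Python with scikit-learn, which is an assumption, not something stated in the paper) shows how a supervised extractive summarizer might score sentences with Gaussian mixture models over position and centrality features and select a summary at a fixed compression rate; the feature set, model configuration, and function names are hypothetical.

# Minimal sketch (not the authors' implementation): supervised extractive
# summarization with class-conditional Gaussian mixture models, using
# sentence-position and sentence-centrality features like those named above.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sentence_features(sentences):
    """Per-sentence features: position (earlier is higher) and centrality."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    n = len(sentences)
    position = np.array([1.0 - i / n for i in range(n)])
    centrality = sim.mean(axis=1)          # average similarity to all other sentences
    return np.column_stack([position, centrality])

def summarize(sentences, feats_train, labels_train, rate=0.3):
    """Fit one GMM per class (summary / non-summary), rank by likelihood ratio,
    and keep the top sentences according to the compression rate."""
    feats = sentence_features(sentences)
    gmm_pos = GaussianMixture(n_components=2, random_state=0).fit(feats_train[labels_train == 1])
    gmm_neg = GaussianMixture(n_components=2, random_state=0).fit(feats_train[labels_train == 0])
    scores = gmm_pos.score_samples(feats) - gmm_neg.score_samples(feats)  # log-likelihood ratio
    k = max(1, int(round(rate * len(sentences))))   # e.g. rate=0.3 for 30% compression
    keep = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in sorted(keep)]

Modeling the summary and non-summary sentences with separate mixtures and ranking by the log-likelihood ratio is one common way to turn a generative model such as a GMM into a sentence selector; the two-component mixtures here are an arbitrary choice for the sketch.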

Text Summarization; NLP; Machine Learning

Lei Yu, Fuji Ren

Graduate School of Advanced Science Technology Education, The University of Tokushima, 2-1 Minamijosa

International Conference

International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian

English

1-7

2009-09-24 (date the record first went online on the Wanfang platform; does not represent the paper's publication date)