A Study on Cross-Language Text Summarization Using Supervised Methods
In this work, we use Hidden Markov Models (HMM), Conditional Random Fields (CRF), Gaussian Mixture Models (GMM), and Mathematical Methods of Statistics (MMS) for Chinese and Japanese text summarization. The purpose of this work is to study the applicability of the three trainable models to cross-language text summarization. For model training, we use several features such as sentence position, sentence centrality, and the number of named entities. For model testing, Chinese and Japanese online news articles extracted from web pages are used as test data. We evaluate each model by measuring precision at compression rates of 10%, 20%, and 30%, with MMS serving as the baseline method. The results show that HMM, CRF, and GMM achieve notable improvements over MMS on both Chinese and Japanese text summarization using the same training features. In particular, the GMM model achieves the best performance in all tests.
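To make the GMM-based setup concrete, below is a minimal sketch (not the authors' code) of supervised extractive scoring with a Gaussian mixture per class and evaluation by precision at a compression rate. The feature layout (sentence position, centrality, named-entity count), the two-class likelihood-ratio scoring, and all toy data are assumptions for illustration only.

```python
# A minimal sketch of GMM-based extractive summarization scoring,
# assuming hand-crafted sentence features and binary "in-summary" labels.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_scorer(features, labels, n_components=1):
    """Fit one GMM per class; score = log-likelihood ratio (summary vs. non-summary)."""
    pos = GaussianMixture(n_components=n_components, random_state=0).fit(features[labels == 1])
    neg = GaussianMixture(n_components=n_components, random_state=0).fit(features[labels == 0])
    return pos, neg

def summarize(features, pos, neg, compression=0.10):
    """Select the top `compression` fraction of sentences by likelihood ratio."""
    scores = pos.score_samples(features) - neg.score_samples(features)
    k = max(1, int(round(compression * len(features))))
    return set(int(i) for i in np.argsort(scores)[::-1][:k])

def precision(selected, gold):
    """Fraction of selected sentences that also appear in the reference extract."""
    return len(selected & gold) / len(selected)

# Toy usage with hypothetical features: [position, centrality, named-entity count]
X_train = np.array([[0.0, 0.8, 3], [0.1, 0.6, 2], [0.5, 0.2, 0],
                    [0.7, 0.1, 0], [0.9, 0.3, 1], [0.2, 0.7, 2]], dtype=float)
y_train = np.array([1, 1, 0, 0, 0, 1])
pos, neg = train_gmm_scorer(X_train, y_train)

X_test = np.array([[0.0, 0.9, 4], [0.4, 0.3, 1], [0.8, 0.1, 0],
                   [0.2, 0.6, 2], [0.6, 0.2, 0], [0.9, 0.4, 1],
                   [0.1, 0.7, 3], [0.5, 0.5, 1], [0.3, 0.2, 0], [0.7, 0.3, 0]], dtype=float)
gold = {0, 3, 6}  # hypothetical reference extract (sentence indices)

for rate in (0.10, 0.20, 0.30):
    sel = summarize(X_test, pos, neg, rate)
    print(f"compression {rate:.0%}: precision = {precision(sel, gold):.2f}")
```

An analogous evaluation loop would apply to the HMM and CRF scorers; only the sentence-scoring model changes while the compression-rate selection and precision measure stay the same.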
Text Summarization; NLP; Machine Learning
Lei Yu, Fuji Ren
Graduate School of Advanced Science Technology Education, The University of Tokushima, 2-1 Minamijosa
International conference
Dalian
English
1-7
2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)