A MORPHEME-BASED LEXICAL CHUNKING SYSTEM FOR CHINESE
Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper we formalize the two processes as a single chunking task over a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
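To illustrate the idea of joint chunking, the sketch below (not the authors' implementation; all tags, morphemes, and probabilities are illustrative assumptions) labels each morpheme with a combined tag of a chunk boundary (B = begins a word, I = inside a word) and a part-of-speech, so that one Viterbi decoding pass performs segmentation and tagging at the same time:

```python
# Minimal sketch of morpheme-based lexical chunking with an HMM.
# Each tag is "<boundary>-<POS>", e.g. "B-N" = first morpheme of a noun.
# Transition/emission probabilities are toy values for illustration only.
import math

TAGS = ["B-N", "I-N", "B-V", "I-V"]

trans = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "B-N"): 0.6, ("<s>", "B-V"): 0.4,
    ("B-N", "I-N"): 0.5, ("B-N", "B-V"): 0.3, ("B-N", "B-N"): 0.2,
    ("I-N", "B-V"): 0.5, ("I-N", "B-N"): 0.3, ("I-N", "I-N"): 0.2,
    ("B-V", "B-N"): 0.5, ("B-V", "I-V"): 0.3, ("B-V", "B-V"): 0.2,
    ("I-V", "B-N"): 0.6, ("I-V", "B-V"): 0.4,
}
emit = {  # P(morpheme | tag)
    ("B-N", "中"): 0.7, ("I-N", "国"): 0.8, ("B-V", "是"): 0.9,
}

def logp(table, key, floor=1e-6):
    """Log-probability with a small floor for unseen events."""
    return math.log(table.get(key, floor))

def viterbi(morphemes):
    """Jointly segment and POS-tag a morpheme sequence in one pass."""
    best = {t: (logp(trans, ("<s>", t)) + logp(emit, (t, morphemes[0])), [t])
            for t in TAGS}
    for m in morphemes[1:]:
        new = {}
        for t in TAGS:
            new[t] = max(
                (s + logp(trans, (p, t)) + logp(emit, (t, m)), path + [t])
                for p, (s, path) in best.items()
            )
        best = new
    return max(best.values())[1]

def chunk(morphemes, tags):
    """Group B/I-tagged morphemes into (word, POS) pairs."""
    words = []
    for m, t in zip(morphemes, tags):
        bound, pos = t.split("-")
        if bound == "B" or not words:
            words.append([m, pos])
        else:
            words[-1][0] += m  # attach I-tagged morpheme to current word
    return [tuple(w) for w in words]
```

Decoding the morphemes 中, 国, 是 yields the tag sequence B-N, I-N, B-V, which the chunker converts to the segmented, tagged words (中国, N) and (是, V); in the full system the probabilities would be estimated from a tagged corpus and lexicalized on word-internal features.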
Keywords: Chinese lexical analysis; lexical chunking; word segmentation; part-of-speech tagging
GUO-HONG FU, CHUN-YU KIT, JONATHAN J. WEBSTER
School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
Type: International conference
Conference: 2008 International Conference on Machine Learning and Cybernetics
Venue: Kunming
Language: English
Pages: 2455-2460
Online date: 2008-07-12 (date first posted on the Wanfang platform; not necessarily the publication date)