Conference paper

A MORPHEME-BASED LEXICAL CHUNKING SYSTEM FOR CHINESE

Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper we formalize the two processes as a single chunking task over a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
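To make the chunking formulation concrete, here is a minimal sketch of HMM tagging over morphemes followed by chunk assembly. This is a plain first-order HMM with invented probabilities, not the paper's lexicalized HMM; the tag names, morphemes, and all numbers are illustrative assumptions. Each tag fuses a word-boundary label with a POS, e.g. "B-n" = morpheme begins a noun, "I-n" = morpheme continues one, so decoding the tag sequence yields segmentation and POS tags at once.

```python
# Toy sketch: morpheme-level HMM chunking (all probabilities invented).
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable tag sequence (log-space Viterbi)."""
    # V[i][s] = (best log-prob of any path ending in tag s at position i, backpointer)
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s].get(obs[0], 1e-6)), None)
          for s in states}]
    for i in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[i - 1][p][0] + math.log(trans_p[p][s])
                 + math.log(emit_p[s].get(obs[i], 1e-6)), p)
                for p in states)
            V[i][s] = (prob, prev)
    # Backtrace from the best final tag.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(V[i][path[-1]][1])
    return path[::-1]

def chunk(morphemes, tags):
    """Merge B-/I- tagged morphemes into (word, pos) pairs."""
    words = []
    for m, t in zip(morphemes, tags):
        if t.startswith("B-") or not words:
            words.append([m, t.split("-", 1)[1]])
        else:
            words[-1][0] += m  # continuation morpheme joins the open word
    return [tuple(w) for w in words]

# Toy model: analyze the morpheme sequence "zhong guo" as one noun.
states = ["B-n", "I-n", "B-v"]
start_p = {"B-n": 0.8, "I-n": 0.1, "B-v": 0.1}
trans_p = {
    "B-n": {"B-n": 0.2, "I-n": 0.6, "B-v": 0.2},
    "I-n": {"B-n": 0.3, "I-n": 0.4, "B-v": 0.3},
    "B-v": {"B-n": 0.5, "I-n": 0.1, "B-v": 0.4},
}
emit_p = {
    "B-n": {"zhong": 0.6, "guo": 0.2},
    "I-n": {"zhong": 0.3, "guo": 0.7},
    "B-v": {"ai": 0.9},
}

tags = viterbi(["zhong", "guo"], states, start_p, trans_p, emit_p)
print(tags)                           # ['B-n', 'I-n']
print(chunk(["zhong", "guo"], tags))  # [('zhongguo', 'n')]
```

The paper's lexicalized variant additionally conditions transition and emission probabilities on previously recognized words, which the dictionaries above do not capture.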

Chinese lexical analysis; Lexical chunking; Word segmentation; Part-of-speech tagging

GUO-HONG FU, CHUN-YU KIT, JONATHAN J. WEBSTER

School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

2455-2460

2008-07-12 (date the paper first appeared on the Wanfang platform; not necessarily its publication date)