A MORPHEME-BASED LEXICAL CHUNKING SYSTEM FOR CHINESE
Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper we formalize the two processes as a single chunking task over a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
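To illustrate the idea of joint chunking, the sketch below (not the authors' implementation; all tags, morphemes, and probabilities are illustrative assumptions) labels each morpheme with a combined tag of a chunk boundary (B = begins a word, I = inside a word) and a part-of-speech, so that one Viterbi decoding pass performs segmentation and tagging at the same time:

```python
# Minimal sketch of morpheme-based lexical chunking with an HMM.
# Each tag is "<boundary>-<POS>", e.g. "B-N" = first morpheme of a noun.
# Transition/emission probabilities are toy values for illustration only.
import math

TAGS = ["B-N", "I-N", "B-V", "I-V"]

trans = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "B-N"): 0.6, ("<s>", "B-V"): 0.4,
    ("B-N", "I-N"): 0.5, ("B-N", "B-V"): 0.3, ("B-N", "B-N"): 0.2,
    ("I-N", "B-V"): 0.5, ("I-N", "B-N"): 0.3, ("I-N", "I-N"): 0.2,
    ("B-V", "B-N"): 0.5, ("B-V", "I-V"): 0.3, ("B-V", "B-V"): 0.2,
    ("I-V", "B-N"): 0.6, ("I-V", "B-V"): 0.4,
}
emit = {  # P(morpheme | tag)
    ("B-N", "中"): 0.7, ("I-N", "国"): 0.8, ("B-V", "是"): 0.9,
}

def logp(table, key, floor=1e-6):
    """Log-probability with a small floor for unseen events."""
    return math.log(table.get(key, floor))

def viterbi(morphemes):
    """Jointly segment and POS-tag a morpheme sequence in one pass."""
    best = {t: (logp(trans, ("<s>", t)) + logp(emit, (t, morphemes[0])), [t])
            for t in TAGS}
    for m in morphemes[1:]:
        new = {}
        for t in TAGS:
            new[t] = max(
                (s + logp(trans, (p, t)) + logp(emit, (t, m)), path + [t])
                for p, (s, path) in best.items()
            )
        best = new
    return max(best.values())[1]

def chunk(morphemes, tags):
    """Group B/I-tagged morphemes into (word, POS) pairs."""
    words = []
    for m, t in zip(morphemes, tags):
        bound, pos = t.split("-")
        if bound == "B" or not words:
            words.append([m, pos])
        else:
            words[-1][0] += m  # attach I-tagged morpheme to current word
    return [tuple(w) for w in words]
```

Decoding the morphemes 中, 国, 是 yields the tag sequence B-N, I-N, B-V, which the chunker converts to the segmented, tagged words (中国, N) and (是, V); in the full system the probabilities would be estimated from a tagged corpus and lexicalized on word-internal features.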
Keywords: Chinese lexical analysis; lexical chunking; word segmentation; part-of-speech tagging
GUO-HONG FU, CHUN-YU KIT, JONATHAN J. WEBSTER
School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
Type: International conference
Conference: 2008 International Conference on Machine Learning and Cybernetics
Venue: Kunming
Language: English
Pages: 2455-2460
Online date: 2008-07-12 (date first posted on the Wanfang platform; not necessarily the publication date)