Conference Paper

Integrate Statistical Model and Lexical Knowledge for Chinese Multiword Chunking

Multiword chunking (MWC) is a shallow parsing technique that recognizes the external constituent tag and the internal relation tag of each chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations, and we conduct several feature-engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.

Multiword chunking; Relation tagging scheme; Outside lexical knowledge base; Partial parsing

Qiang Zhou; Hang Yu

Centre for Speech and Language Technologies, Division of Tech. Innovation and Dev., Tsinghua National Lab

International Conference

The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008)

Beijing

Language: English

2008-10-19 (date first posted online on the Wanfang platform; not the paper's publication date)