Integrate Statistical Model and Lexical Knowledge for Chinese Multiword Chunking
Multiword chunking (MWC) is a shallow parsing technique that recognizes the external constituent tag and the internal relation tag of each chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and conduct several feature-engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of these techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.
Multiword chunking; Relation tagging scheme; Outside lexical knowledge base; Partial parsing
Qiang Zhou, Hang Yu
Centre for Speech and Language Technologies, Division of Tech. Innovation and Dev., Tsinghua National Lab
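International conference
Beijing
English
2008-10-19 (date first posted online on the Wanfang platform; does not represent the paper's publication date)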