Iterative Integration of Unsupervised Features for Chinese Dependency Parsing

Abstract:

Chinese dependency parsing lacks large manually annotated dependency treebanks. Unsupervised methods that exploit large-scale unannotated data have been proposed, but they inevitably introduce considerable noise from automatic annotation. To address this problem, this paper proposes an approach that iteratively integrates unsupervised features when training a Chinese dependency parsing model. Because more errors occur when parsing longer sentences, the raw data are divided by sentence length and the model is trained iteratively: the model trained on shorter sentences is used in the next iteration to analyze longer sentences. The paper adopts a character-based dependency model for joint word segmentation, POS tagging, and dependency parsing in Chinese. The advantage of the joint model is that each task can benefit from the intermediate results of the other tasks during processing. Higher accuracy of the three tasks on shorter sentences can bring about higher accuracy of the whole model. The approach is evaluated on the Penn Chinese Treebank and two raw corpora. The experimental results show that the F1-scores of the three tasks improved at each iteration, and that the F1-score of dependency parsing increased by 0.33% compared with the conventional method.
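The iteration scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the length thresholds, how sentence length is measured, or how the unsupervised features are extracted, so `boundaries`, the character-count length measure, and the caller-supplied `train` and `parse` functions are all hypothetical placeholders.

```python
from typing import Any, Callable, List, Sequence


def bucket_by_length(sentences: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Split raw sentences into buckets of increasing length.

    `boundaries` are sorted inclusive upper bounds (assumed values, counted in
    characters); sentences longer than the last bound go to a final bucket.
    """
    buckets: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for sentence in sentences:
        # Bucket index = number of bounds the sentence length exceeds.
        index = sum(1 for bound in boundaries if len(sentence) > bound)
        buckets[index].append(sentence)
    return buckets


def iterative_training(
    labeled: Sequence[Any],
    raw: Sequence[str],
    boundaries: Sequence[int],
    train: Callable[[Sequence[Any], Sequence[Any]], Any],
    parse: Callable[[Any, str], Any],
) -> Any:
    """Train on the treebank, then add auto-parsed data from short to long.

    `train(labeled, auto_parsed)` and `parse(model, sentence)` are hypothetical
    hooks standing in for the joint model's training and decoding steps.
    """
    model = train(labeled, [])  # baseline: annotated treebank only
    auto_parsed: List[Any] = []
    for bucket in bucket_by_length(raw, boundaries):
        if not bucket:
            continue  # nothing new to learn from at this length range
        # The model from the previous iteration analyzes the longer sentences.
        auto_parsed.extend(parse(model, sentence) for sentence in bucket)
        # Retrain with features drawn from all automatically parsed data so far.
        model = train(labeled, auto_parsed)
    return model
```

Ordering the buckets from short to long reflects the abstract's observation that shorter sentences are parsed more accurately, so each iteration's automatically parsed data comes from the most reliable model available at that point.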

Keywords: Chinese dependency parsing; iteration; unsupervised learning; joint model

Authors: Te Luo, Yujie Zhang, Jinan Xu, Yufeng Chen

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University

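Conference type: International conference

Conference: The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL2016)

Conference location: Kunming

Conference language: English

Pages: 1-10

Online publication date: 2016-12-02 (date first made available on the Wanfang platform; not the paper's publication date)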
