LM Enhanced BiRNN-CRF for Joint Chinese Word Segmentation and POS Tagging
Word segmentation and part-of-speech tagging are two preliminary but fundamental components of Chinese natural language processing. With the upsurge of deep learning, end-to-end models are built without handcrafted features. In this work, we model Chinese word segmentation and part-of-speech tagging jointly on the basis of the state-of-the-art BiRNN-CRF architecture. LSTM is adopted as the basic recurrent unit. Apart from utilizing pre-trained character embeddings and trigram features, we incorporate a neural language model and conduct multi-task training. Highway layers are applied to tackle the discordance issue of naive co-training. Experimental results on the CTB5, CTB7, and PPD datasets show the effectiveness of the proposed method.
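To illustrate what modeling the two tasks jointly can look like at the label level, here is a minimal sketch in which each character receives a single tag combining a word-boundary marker with a POS tag. The record does not state the paper's actual label set, so the `B`/`M`/`E`/`S` boundary markers, the `boundary-POS` tag format, and the helper name `decode_joint_tags` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def decode_joint_tags(chars: str, tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Turn per-character joint tags such as 'B-NR' into (word, POS) pairs.

    Assumed (hypothetical) tag format: '<boundary>-<POS>', where boundary is
    B (begin), M (middle), E (end) or S (single-character word).
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[Tuple[str, Optional[str]]] = []
    buf = ""                        # characters of the word being built
    buf_pos: Optional[str] = None   # POS tag of the word being built

    for ch, tag in zip(chars, tags):
        boundary, pos = tag.split("-", 1)
        if boundary in ("B", "S"):
            # A new word starts here; flush any unfinished word first.
            if buf:
                words.append((buf, buf_pos))
            buf, buf_pos = ch, pos
        else:
            # M or E continues the current word (or starts one if the
            # predicted tag sequence is malformed).
            if not buf:
                buf_pos = pos
            buf += ch
        if boundary in ("E", "S"):
            # The word ends on this character.
            words.append((buf, buf_pos))
            buf, buf_pos = "", None

    if buf:
        # Flush a trailing word that never received an E tag.
        words.append((buf, buf_pos))
    return words


if __name__ == "__main__":
    print(decode_joint_tags("我爱北京", ["S-PN", "S-VV", "B-NR", "E-NR"]))
```

With tags of this kind, a single sequence labeler (such as a BiRNN-CRF) yields both the segmentation and the POS tags from one decoded label sequence, which is the sense in which the two tasks are solved jointly in this sketch.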
Chinese word segmentation; POS tagging; LSTM; Language model
Jianhu Zhang, Gongshen Liu, Jie Zhou, Cheng Zhou, Huanrong Sun
School of Electric Information and Electronic Engineering, Shanghai Jiaotong University, Shanghai, China; SJTU-Shanghai Songheng Content Analysis Joint Lab, Shanghai, China
International conference
2018 International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)
Hohhot
English
105-116
2018-08-26 (date first posted online on the Wanfang platform; not the paper's publication date)