Automatic Long Sentence Segmentation for Neural Machine Translation
Neural machine translation (NMT) is an emerging machine translation paradigm that translates texts with an encoder-decoder neural architecture. Very recent studies find that translation quality drops significantly when NMT translates long sentences. In this paper, we propose a novel method to deal with this issue by segmenting long sentences into several clauses. We introduce a split and reordering model to collectively detect the optimal sequence of segmentation points for a long source sentence. Each segmented clause is translated by the NMT system independently into a target clause. The translated target clauses are then concatenated without reordering to form the final translation for the long sentence. On NIST Chinese-English translation tasks, our segmentation method achieves a substantial improvement of 2.94 BLEU points over the NMT baseline on translating long sentences with more than 30 words, and 5.43 BLEU points on sentences of over 40 words.
Shaohui Kuang, Deyi Xiong
Soochow University, Suzhou, China
International conference
The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL 2016)
Kunming
English
1-12
2016-12-02 (date first posted online on the Wanfang platform; does not represent the paper's publication date)
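
The pipeline summarized in the abstract (segment a long source sentence into clauses, translate each clause independently, then concatenate the target clauses in their original order) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `split_at`, `punctuation_split_points`, `translate_long_sentence`, and `min_length` are hypothetical. The punctuation-based splitter is only a stand-in for the paper's split and reordering model, which the abstract does not specify, and the default threshold of 30 tokens is an assumption borrowed from the evaluation setting. `translate_clause` stands for any NMT system that maps source tokens to target tokens.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def split_at(tokens: Sequence[str], points: Sequence[int]) -> List[Tokens]:
    """Cut tokens into clauses; each point is the index where a new clause starts."""
    inner = sorted({p for p in points if 0 < p < len(tokens)})
    bounds = [0, *inner, len(tokens)]
    return [list(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]


def punctuation_split_points(
    tokens: Sequence[str], marks: Sequence[str] = (",", ";", ":")
) -> List[int]:
    """Placeholder splitter (assumption): start a new clause after each mark.

    The paper instead uses a learned split and reordering model.
    """
    return [
        i + 1
        for i, tok in enumerate(tokens)
        if tok in marks and i + 1 < len(tokens)
    ]


def translate_long_sentence(
    tokens: Sequence[str],
    translate_clause: Callable[[Tokens], Tokens],
    find_split_points: Callable[[Sequence[str]], List[int]] = punctuation_split_points,
    min_length: int = 30,  # assumed threshold, not taken from the paper's method
) -> Tokens:
    """Translate short sentences whole; segment longer ones first."""
    if len(tokens) <= min_length:
        return translate_clause(list(tokens))

    clauses = split_at(tokens, find_split_points(tokens))

    # Translate each clause independently, then concatenate the target
    # clauses in source order, i.e. without reordering.
    target: Tokens = []
    for clause in clauses:
        target.extend(translate_clause(clause))
    return target
```

Passing the splitter as a parameter keeps the sketch independent of how segmentation points are chosen, so a learned model could replace the punctuation heuristic without changing the rest of the pipeline.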