A Chinese text corrector based on seq2seq model

Abstract:

In this paper, we build a Chinese text corrector that can precisely correct spelling mistakes in Chinese texts. Our work is inspired by the recently proposed seq2seq model, which treats text correction as a sequence learning problem. First, we propose a biased-decoding method to improve the bilingual evaluation understudy (BLEU) score of our model. Second, we adopt a more reasonable out-of-vocabulary (OOV) token scheme, which enhances the robustness of our correction mechanism. Moreover, to test the performance of the proposed model thoroughly, we establish a corpus of 600,000 sentences drawn from news data of Sogou Labs. Experiments on this corpus show that our corrector model achieves better correction results.

Keywords: natural language processing; Chinese text corrector; seq2seq model; biased-decoding

Authors: Sunyan Gu, Fei Lang

Affiliations: College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China; College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommun

Conference type: International conference

Conference name: 9th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery)

Conference location: Nanjing

Conference language: English

Pages: 322-325

Online publication date: 2017-10-12 (date first posted on the Wanfang platform; not the paper's publication date)
