Language Model for Mongolian Polyphone Proofreading

Abstract:

Mongolian text proofreading is particularly difficult because of the language's polyphonic alphabet, morphological ambiguity, and agglutinative nature. Coding errors are currently pervasive in electronic Mongolian corpora, which makes statistical and retrieval research on Mongolian difficult to carry out. Conventional approaches have been proposed to solve this problem, but they are limited because they do not consider the proofreading of polyphones. In this paper, we address the problem by constructing a large-scale resource and applying an n-gram language model based approach. For ease of understanding, the architecture of the entire proofreading system is also introduced, since polyphone proofreading is an important component of it. Experimental results show that our method performs well: polyphone correction accuracy improves by 62% in relative terms, and overall system accuracy improves by 16.1% in relative terms.
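The abstract names an n-gram language model as the core of polyphone correction but gives no details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bigram model with add-one smoothing, and the tokens (`ctx_left`, `cand_a`, and so on) are hypothetical placeholders rather than real Mongolian word forms. Given the candidate readings of an ambiguous word, it keeps the one that makes the surrounding sentence most probable.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence (assumed smoothing)."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def choose_candidate(left, candidates, right, unigrams, bigrams):
    """Return the candidate that maximizes the sentence log-probability in context."""
    return max(
        candidates,
        key=lambda cand: log_prob(left + [cand] + right, unigrams, bigrams),
    )


# Hypothetical toy corpus: placeholder tokens stand in for correctly encoded words.
corpus = [
    ["ctx_left", "cand_a", "ctx_right"],
    ["ctx_left", "cand_a", "ctx_right"],
    ["other", "cand_b", "end"],
]
unigrams, bigrams = train_bigram(corpus)
best = choose_candidate(["ctx_left"], ["cand_a", "cand_b"], ["ctx_right"], unigrams, bigrams)
print(best)
```

A practical system would train on a much larger corpus and would likely use a higher-order model with stronger smoothing; the sketch only illustrates how context can disambiguate candidates that share one written form.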

Keywords: Mongolian Polyphone Automatic Proofreading System Morphological Ambiguity

Authors: Min Lu, Feilong Bao, Guanglai Gao

Affiliation: College of Computer Science, Inner Mongolia University, Hohhot 010021, China

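Conference type: Domestic conference

Conference name: The 16th China National Conference on Computational Linguistics and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data

Conference location: Nanjing

Conference language: English

Pages: 1-11

Online publication date: 2017-10-13 (date first posted on the Wanfang platform; does not represent the paper's publication date)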
