Language Model for Mongolian Polyphone Proofreading
Mongolian text proofreading is a particularly difficult task because of the language's polyphonic alphabet, morphological ambiguity, and agglutinative morphology. Coding errors are currently pervasive in electronic Mongolian corpora, which makes statistical and retrieval research on Mongolian difficult to carry out. Some conventional approaches have been proposed to solve this problem, but they are limited because they do not consider the proofreading of polyphones. In this paper, we address this problem by constructing a large-scale resource and applying an n-gram language model based approach. For ease of understanding, the entire proofreading system architecture is also introduced, since polyphone proofreading is an important component of it. Experimental results show that our method performs well: polyphone correction accuracy improves by 62% in relative terms, and overall system accuracy improves by 16.1% in relative terms.
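As a rough illustration of how an n-gram language model can choose among the readings of a polyphonic word form, the sketch below scores each candidate sequence with a bigram model and keeps the most probable one. The abstract does not state the model order, the smoothing method, or how candidates are generated, so the bigram order, the add-one smoothing, the function names, and the placeholder tokens (`A`, `A1`, `A2`) are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed bigram log-probability log P(word | prev) (assumed smoothing)."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def best_reading(surface_tokens, candidates, unigrams, bigrams):
    """Viterbi search over the candidate readings of each surface token."""
    # beams maps the last chosen reading to (score, path so far)
    beams = {"<s>": (0.0, [])}
    for token in surface_tokens:
        # Tokens without an entry in `candidates` are treated as unambiguous.
        options = candidates.get(token, [token])
        new_beams = {}
        for option in options:
            score, path = max(
                ((s + log_prob(prev, option, unigrams, bigrams), p)
                 for prev, (s, p) in beams.items()),
                key=lambda item: item[0],
            )
            new_beams[option] = (score, path + [option])
        beams = new_beams
    # Close the sentence with the end marker and pick the best full path.
    _, path = max(
        ((s + log_prob(prev, "</s>", unigrams, bigrams), p)
         for prev, (s, p) in beams.items()),
        key=lambda item: item[0],
    )
    return path


# Toy corpus with placeholder tokens; "A1" and "A2" stand for two correct
# spellings that share one ambiguous surface form "A" (hypothetical data).
corpus = [["the", "A1", "x"], ["the", "A1", "x"], ["a", "A2", "y"]]
candidates = {"A": ["A1", "A2"]}
unigrams, bigrams = train_bigram(corpus)

result_1 = best_reading(["the", "A", "x"], candidates, unigrams, bigrams)
result_2 = best_reading(["a", "A", "y"], candidates, unigrams, bigrams)
```

With this toy corpus the context decides the reading: `A` resolves to `A1` between `the` and `x`, and to `A2` between `a` and `y`. A real system would need a much larger training corpus and a lexicon that maps each polyphonic form to its valid candidates.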
Mongolian Polyphone Automatic Proofreading System Morphological Ambiguity
Min Lu, Feilong Bao, Guanglai Gao
College of Computer Science, Inner Mongolia University, Hohhot 010021, China
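Domestic conference
The 16th China National Conference on Computational Linguistics and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data
Nanjing
English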
1-11
2017-10-13 (date first made available online on the Wanfang platform; not the paper's publication date)