Coping with Problems of Unicoded Traditional Mongolian

Abstract:

The Unicode encoding of Traditional Mongolian has a serious problem: several pairs of vowels that share the same glyphs but differ in pronunciation are assigned different code points. We demonstrate the severity of the problem with examples from our Mongolian corpus and propose two ways to alleviate it: first, a publicly available Mongolian input method that helps users choose the correct encoding; second, a normalization method that addresses the data sparseness caused by the proliferation of homographs. Experiments on search engines and statistical machine translation show that our methods are effective.

Keywords: Traditional Mongolian Script; Homographs; Input Method; Normalization

Authors: Boli Wang, Xiaodong Shi, Yidong Chen

Affiliations: Department of Cognitive Science, Xiamen University, Xiamen, China; Collaborative Innovation Center for P

Conference type: Domestic conference

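Conference name: The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD-2016)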

Conference location: Yantai

Conference language: English

Pages: 1-8

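Online publication date: 2016-10-14 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)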
