Coping with Problems of Unicoded Traditional Mongolian
The Unicode encoding of Traditional Mongolian has a serious problem: several pairs of vowels that share the same glyph but differ in pronunciation are assigned different code points. We show the severity of the problem with examples from our Mongolian corpus and propose two ways to alleviate it: first, a publicly available Mongolian input method that helps users choose the correct encoding, and second, a normalization method that addresses the data sparseness caused by the proliferation of homographs. Experiments on search engines and statistical machine translation show that our methods are effective.
Traditional Mongolian Script; Homographs; Input Method; Normalization
Boli Wang; Xiaodong Shi; Yidong Chen
Department of Cognitive Science, Xiamen University, Xiamen, China; Department of Cognitive Science, Xiamen University, Xiamen, China; Collaborative Innovation Center for P
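Domestic conference
The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD-2016)
Yantai
English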
1-8
2016-10-14 (date first posted online on the Wanfang platform; not the paper's publication date)