Conference Papers

Modeling Characteristics of Agglutinative Languages with Multi-Class Language Model for ASR System

In this paper, we discuss a new language model that takes the characteristics of agglutinative languages into account. We use Mongolian (written in the Cyrillic script, as used in Mongolia) as the example language for building the model. We developed a Multi-class N-gram language model based on similar-word clustering that focuses on the variable suffixes of Mongolian words. With the proposed language model, a recognition system built on the ATRASR engine improves performance by 6.85% compared with a conventional word N-gram model. We also confirmed that the new model is convenient for the rapid development of ASR systems for resource-deficient languages, especially agglutinative languages such as Mongolian.
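The general idea behind a class-based N-gram can be illustrated with a minimal sketch. This is not the authors' Multi-class model or the ATRASR implementation: it assumes a plain class bigram, P(w | w_prev) ≈ P(c | c_prev) · P(w | c), and it assumes that words are grouped by a hand-written suffix list. The function names, the suffix list, and the toy tokens are all hypothetical and are not real Mongolian data.

```python
from collections import Counter

def suffix_class(word, suffixes):
    """Assumed clustering rule: class = longest matching suffix, else 'STEM'."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return "SUF:" + suf
    return "STEM"

def train_class_bigram(sentences, suffixes):
    """Collect the counts needed for P(c | c_prev) and P(w | c)."""
    class_bigrams = Counter()   # (previous class, current class) pairs
    class_history = Counter()   # how often a class appears as the history
    word_counts = Counter()     # word unigram counts
    class_counts = Counter()    # class unigram counts
    for sent in sentences:
        classes = [suffix_class(w, suffixes) for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_counts[c] += 1
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_history[prev] += 1
    return class_bigrams, class_history, word_counts, class_counts

def class_bigram_prob(prev_word, word, model, suffixes):
    """Unsmoothed estimate P(c | c_prev) * P(w | c); 0.0 if counts are missing."""
    class_bigrams, class_history, word_counts, class_counts = model
    c_prev = suffix_class(prev_word, suffixes)
    c_cur = suffix_class(word, suffixes)
    if class_history[c_prev] == 0 or class_counts[c_cur] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c_cur)] / class_history[c_prev]
    p_word = word_counts[word] / class_counts[c_cur]
    return p_class * p_word

# Toy, made-up tokens: "ka" plays the role of a variable suffix.
suffixes = ["ka", "no"]
sentences = [["tomka", "liz"], ["simka", "liz"], ["tomka", "bor"]]
model = train_class_bigram(sentences, suffixes)

# The word pair ("simka", "bor") never occurs in the toy corpus, but both
# words share classes with observed pairs, so the estimate is non-zero.
p_unseen = class_bigram_prob("simka", "bor", model, suffixes)
```

In this toy corpus the word pair ("simka", "bor") is never observed, so an unsmoothed word bigram would assign it zero probability, while the class-based estimate stays positive because the suffix class of "simka" has been seen before a stem-class word. This kind of count sharing is what makes class-based models attractive when training text is scarce.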

I. Dawa, Y. Sagisaka, S. Nakamura

International conference

2009 Oriental COCOSDA International Conference on Speech Database and Assessments (2009 国际语音交互标准数据评估技术大会)

Beijing

English

104-109

2009-08-10 (date first made available online on the Wanfang platform; this is not the paper's publication date)