Lexicon Optimization for Automatic Speech Recognition Based on Discriminative Learning
In agglutinative languages such as Japanese and Uyghur, the choice of lexical unit is not obvious and is one of the important issues in designing a language model for automatic speech recognition (ASR). In this paper, we propose a discriminative learning method that selects word entries expected to reduce the word error rate (WER). We define an evaluation function for each word as a weighted set of features, and take as the optimization measure the difference between the WERs obtained with the two units (morpheme and word). The feature weights are then learned with a perceptron algorithm, and word entries with higher evaluation scores are selected. The discriminative method is successfully applied to an Uyghur large-vocabulary continuous speech recognition system, resulting in a significant reduction of WER without a drastic increase in vocabulary size.
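The selection procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the +1/-1 targets (assumed here to come from the sign of the WER difference between the morpheme and word units), the learning rate, and the selection threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def score(weights: Dict[str, float], feats: Features) -> float:
    """Linear evaluation function: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_perceptron(
    samples: List[Tuple[Features, int]], epochs: int = 10, lr: float = 1.0
) -> Dict[str, float]:
    """Learn feature weights with the classic perceptron update.

    Each sample pairs a word's features with a target of +1 or -1.
    Assumption: +1 means the word unit gave fewer errors than the
    morpheme unit for that word, -1 means it did not.
    """
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        mistakes = 0
        for feats, target in samples:
            predicted = 1 if score(weights, feats) > 0 else -1
            if predicted != target:
                mistakes += 1
                # Move the weights toward the correct side of the boundary.
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + lr * target * value
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights


def select_entries(
    candidates: Dict[str, Features], weights: Dict[str, float], threshold: float = 0.0
) -> List[str]:
    """Keep candidate words whose evaluation score exceeds the threshold."""
    return [w for w, feats in candidates.items() if score(weights, feats) > threshold]


# Hypothetical toy data: three candidate words with made-up features.
candidates = {
    "word_a": {"bias": 1.0, "freq": 0.9, "len": 0.2},
    "word_b": {"bias": 1.0, "freq": 0.1, "len": 0.8},
    "word_c": {"bias": 1.0, "freq": 0.8, "len": 0.3},
}
targets = {"word_a": 1, "word_b": -1, "word_c": 1}

weights = train_perceptron([(candidates[w], targets[w]) for w in candidates])
print(select_entries(candidates, weights))
```

On this toy data the sketch keeps `word_a` and `word_c` and rejects `word_b`. In a real system the threshold would presumably be tuned to trade WER reduction against vocabulary growth.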
Mijit Ablimit, Tatsuya Kawahara, Askar Hamdulla
School of Informatics, Kyoto University, Kyoto, Japan; Institute of Information Engineering, Xinjiang University, Urumqi
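International conference
2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)
Xi'an
English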
1-4
2011-10-18 (date first posted online on the Wanfang platform; not necessarily the paper's publication date)