Discriminative Language Model With Part-of-speech for Mandarin Large Vocabulary Continuous Speech Recognition System

Abstract:

A statistical language model, trained on large text corpora, is an integral component of many speech and natural language processing systems, such as speech recognition and machine translation. It is a probabilistic model that describes the distribution patterns of natural language. Over the last few decades, the N-gram language model (LM) has been the most popular technique because it is simple and effective. However, N-gram LMs are trained under the maximum likelihood criterion, which does not directly optimize recognition accuracy and therefore leads to suboptimal output in speech recognition systems. In this paper, a discriminatively trained language model (DLM), which directly focuses on minimizing the speech recognition word error rate (WER), is employed to improve the performance of a speech recognition system. In particular, part-of-speech (POS) features are used to train the DLM in addition to n-gram features. Experimental results show that the DLM with n-gram features gives a 1% absolute reduction in WER. Combining n-gram features with POS features, the DLM obtains a further 0.4% absolute reduction in WER.
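The abstract does not state which training algorithm the DLM uses. A common way to train a DLM is a perceptron over recognizer n-best lists: each hypothesis is scored by its baseline recognizer score plus learned feature weights, and the weights are pushed toward the lowest-WER (oracle) hypothesis whenever the model prefers a different one. The Python sketch below assumes that setup. The hypothesis tuple format, the `W:`/`T:` feature prefixes, the bigram order, the fixed baseline weight `alpha`, and the toy data are all hypothetical, and POS tags are assumed to come from an external tagger. It is an illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical n-best entry: (words, pos_tags, baseline_score), where baseline_score
# is assumed to be the recognizer's combined acoustic + n-gram LM log score.

def word_errors(hyp, ref):
    """Levenshtein distance (substitutions + insertions + deletions) between token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, 1):
            cur[j] = min(prev[j] + 1,             # extra hypothesis word (insertion)
                         cur[j - 1] + 1,          # missing reference word (deletion)
                         prev[j - 1] + (h != r))  # match or substitution
        prev = cur
    return prev[-1]

def ngram_counts(tokens, prefix, n):
    """Count 1..n-grams of a sentence-padded token sequence, keyed with a feature-type prefix."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    counts = Counter()
    for k in range(1, n + 1):
        for i in range(len(padded) - k + 1):
            counts[prefix + " ".join(padded[i:i + k])] += 1
    return counts

def features(words, tags, n=2):
    """Word n-gram features plus POS n-gram features (bigram order is an assumption)."""
    return ngram_counts(words, "W:", n) + ngram_counts(tags, "T:", n)

def score(hyp, weights, alpha=1.0):
    """Baseline score scaled by alpha plus the learned feature weights."""
    words, tags, base = hyp
    return alpha * base + sum(weights.get(f, 0.0) * c for f, c in features(words, tags).items())

def rerank(nbest, weights, alpha=1.0):
    """Return the highest-scoring hypothesis under the discriminative model."""
    return max(nbest, key=lambda h: score(h, weights, alpha))

def train_perceptron(data, epochs=5, alpha=1.0):
    """Plain structured perceptron over n-best lists; data is a list of (nbest, reference)."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in data:
            oracle = min(nbest, key=lambda h: word_errors(h[0], ref))  # lowest-error hypothesis
            best = rerank(nbest, weights, alpha)
            if best is not oracle:
                # Move weights toward the oracle's features and away from the current winner's.
                for f, c in features(oracle[0], oracle[1]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in features(best[0], best[1]).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights

# Toy usage: the baseline prefers a misrecognized hypothesis; training fixes the ranking.
ref = ["the", "cat", "sat"]
nbest = [
    (["the", "cat", "sad"], ["DT", "NN", "JJ"], -10.0),   # preferred by the baseline
    (["the", "cat", "sat"], ["DT", "NN", "VBD"], -10.5),  # correct transcription
]
w = train_perceptron([(nbest, ref)], epochs=3)
print(rerank(nbest, w)[0])  # ['the', 'cat', 'sat']
```

Because shared features cancel in the update, only the word and POS n-grams that distinguish the two hypotheses receive nonzero weight. In practice an averaged perceptron is often used to reduce overfitting; the plain update is kept here for brevity.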

Keywords: speech recognition; language model; DLM; POS

Authors: Yujing Si, Zhen Zhang, Qingqing Zhang, Jielin Pan, Yonghong Yan

Affiliation: The Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, P.R. China

Conference type: International conference

Conference name: 2013 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE2013)

Conference location: Hangzhou

Conference language: English

Pages: 970-973

Online publication date: 2013-03-22 (date first posted on the Wanfang platform; not the paper's publication date)
