Multi-view LSTM Language Model with Word-synchronized Auxiliary Feature for LVCSR
Recently, the long short-term memory language model (LSTM LM) has received tremendous interest from both the language and speech communities, owing to its superiority in modeling long-term dependencies. Moreover, integrating auxiliary information, such as context features, into the LSTM LM has been shown to improve perplexity (PPL). However, improperly fed auxiliary information does not yield consistent gains in word error rate (WER) on a large vocabulary continuous speech recognition (LVCSR) task. To solve this problem, this paper proposes a multi-view LSTM LM architecture that incorporates a tagging model. First, an online unidirectional LSTM-RNN is built as a tagging model, which generates word-synchronized auxiliary features. Then, the auxiliary features from the tagging model are combined with the word sequence to train a multi-view unidirectional LSTM LM. Different training modes for the tagging model and the language model are explored and compared. The new architecture is evaluated on the PTB, Fisher English, and SMS Chinese data sets, and the results show that not only is LM PPL improved, but the improvements also transfer well to WER reduction in an ASR rescoring task.
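A minimal sketch of the described two-model pipeline may help make the architecture concrete. The paper does not publish code, so everything below is an illustrative assumption under the abstract's description: a unidirectional LSTM tagger emits a word-synchronized auxiliary feature (here, tag posteriors) at each step with no look-ahead, and the LM consumes that feature concatenated with the word embedding. All class names, dimensions, and the frozen-tagger training mode are hypothetical choices, not the authors' exact configuration.

import torch
import torch.nn as nn

class TaggingLSTM(nn.Module):
    # Online unidirectional LSTM tagger: for each input word it emits a
    # word-synchronized auxiliary feature (tag posteriors), with no look-ahead.
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, words):                        # words: (B, T)
        h, _ = self.lstm(self.embed(words))          # (B, T, hidden_dim)
        return torch.softmax(self.proj(h), dim=-1)   # (B, T, num_tags)

class MultiViewLSTMLM(nn.Module):
    # LM view: the word embedding is concatenated with the tagger's
    # word-synchronized auxiliary feature at every time step before the LSTM.
    def __init__(self, vocab_size, emb_dim, aux_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + aux_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, aux):                   # aux: (B, T, aux_dim)
        x = torch.cat([self.embed(words), aux], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # next-word logits (B, T, V)

# One possible training mode: tagger pre-trained and frozen, LM trained on top.
tagger = TaggingLSTM(vocab_size=10000, emb_dim=128, hidden_dim=256, num_tags=45)
lm = MultiViewLSTMLM(vocab_size=10000, emb_dim=128, aux_dim=45, hidden_dim=256)
words = torch.randint(0, 10000, (4, 20))             # dummy batch of word ids
with torch.no_grad():
    aux = tagger(words)                              # word-synchronized features
logits = lm(words, aux)
# Predict word t+1 from words (and features) up to t, so the model stays causal
# and can be used online for lattice/N-best rescoring in ASR.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 10000), words[:, 1:].reshape(-1))

Because both views are unidirectional and synchronized per word, rescoring remains strictly left-to-right; training the tagger jointly with the LM, rather than freezing it as above, would be one of the alternative training modes the abstract says are compared.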
LSTM language model, speech recognition, multi-view, auxiliary feature, tagging model
Yue Wu, Tianxing He, Zhehuai Chen, Yanmin Qian, Kai Yu
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, SpeechLab, Department of Computer Science and Engineering, Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
Domestic conference
The 16th China National Conference on Computational Linguistics and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data
Nanjing
English
1-13
2017-10-13 (date the paper first appeared on the Wanfang platform; not the paper's publication date)