Combining Information from Multi-Stream Features Using Deep Neural Network in Speech Recognition

Abstract:

This paper addresses the integration of multi-stream features within the hybrid artificial neural network (ANN)-hidden Markov model (HMM) framework. We investigate the use of log filter bank and MFCC features in multi-stream combination for phoneme recognition. An intermediate integration method is proposed to fuse the information from the different feature sets. Using a deep learning algorithm to train the deep neural network (DNN), we explore different stream combination methods. Recognition experiments with a DNN-HMM system on the TIMIT speech data show that the proposed approach not only outperforms the single best stream, with a 6.1% relative reduction in phone error rate (PER), but also outperforms the other fusion strategies.

Keywords: multi-stream combination; deep learning; phoneme recognition; DNN-HMM; intermediate integration

Authors: Pan Zhou; Lirong Dai; Qingfeng Liu; Hui Jiang

Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology o; Anhui USTC iFLYTEK Corporation, Limited, Hefei, China; Department of Computer Science and Engineering, York University, Toronto, Canada

Conference type: International conference

Conference name: 2012 IEEE 11th International Conference on Signal Processing

Conference location: Beijing

Conference language: English

Pages: 557-561

Online publication date: 2012-10-21 (date first made available online on the Wanfang platform; not necessarily the paper's publication date)
