Conference Topic

Combining Information from Multi-Stream Features Using Deep Neural Network in Speech Recognition

This paper addresses the integration of multi-stream features within the hybrid artificial neural network (ANN)-hidden Markov model (HMM) framework. We investigate the use of log filter bank and MFCC features in multi-stream combination for phoneme recognition. An intermediate integration method is proposed to fuse the information from the different feature sets. Using a deep learning algorithm to train the deep neural network (DNN), we explore different stream combination methods. Recognition experiments with a DNN-HMM system on the TIMIT speech data show that the proposed approach not only outperforms the best single stream, with a 6.1% relative reduction in phone error rate (PER), but also outperforms the other fusion strategies.

multi-stream combination; deep learning; phoneme recognition; DNN-HMM; intermediate integration

Pan Zhou; Lirong Dai; Qingfeng Liu; Hui Jiang

Department of Electronic Engineering and Information Science, University of Science and Technology of China; Anhui USTC iFLYTEK Corporation, Limited, Hefei, China; Department of Computer Science and Engineering, York University, Toronto, Canada

International conference

2012 IEEE 11th International Conference on Signal Processing

Beijing

English

557-561

2012-10-21 (date first posted online on the Wanfang platform; not the paper's publication date)