Conference Topic

Comparison and Combination of Multilayer Perceptrons and Deep Belief Networks in Hybrid Automatic Speech Recognition Systems

Many ways of augmenting or combining hidden Markov models (HMMs) with other models to build hybrid architectures have been proposed to improve speech recognition performance. The hybrid HMM/ANN (hidden Markov model / artificial neural network) architecture is one of the most successful of these approaches. In this hybrid model, ANNs, often multilayer perceptrons (MLPs), are used as HMM-state posterior estimators. Recently, Deep Belief Networks (DBNs) were introduced as a powerful new machine learning technique. Broadly, DBNs are MLPs with many hidden layers; however, whereas the weights of MLPs are often initialized randomly, DBNs use a greedy layer-by-layer pretraining algorithm to initialize the network weights. This pretraining step has led to successful applications of DBNs in handwriting recognition, 3-D object recognition, dimensionality reduction, and automatic speech recognition (ASR). To evaluate the effectiveness of the pretraining step that distinguishes DBNs from MLPs for ASR tasks, we conduct a comparative evaluation of the two systems on phone recognition using the TIMIT database. We investigate and analyze the effectiveness, advantages, and computational cost of each method. We also show that the information generated by DBNs and MLPs is complementary: a consistent improvement is observed when the two systems are combined. In addition, we investigate the ability of the hybrid HMM/DBN system when only a limited amount of labeled training data is available.

Van Hai Do, Xiong Xiao, Eng Siong Chng

School of Computer Engineering, Nanyang Technological University, Singapore; Temasek Laboratories@NTU; Temasek Laboratories@NTU, Nanyang Technological University, Singapore

International conference

2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Xi'an

English

1-6

2011-10-18 (date first made available online on the Wanfang platform; not the paper's publication date)