SINGLE STREAM DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND PHONE SEGMENTATION
HMM-based models achieve promising performance in speech recognition. For audio-visual bimodal speech recognition and phone segmentation, however, HMM-based models such as the multi-stream HMM (MSHMM) and the product HMM (PHMM) cannot adequately describe the asynchrony between the audio and visual streams. Because of its flexible structure, a Dynamic Bayesian Network (DBN) is expected to model this asynchrony well. As a first step, this paper describes a single-stream DBN model for continuous speech recognition and phone time segmentation. Word recognition rates and phone segmentation accuracies on 600 test sentences are compared with those of an HMM-based speech recognition system implemented with the HMM Toolkit (HTK). Experimental results show that in clean and high-SNR conditions the DBN-based and HMM-based models achieve similar performance, whereas in low-SNR conditions the DBN performs better than the HMM. Phone segmentation is also performed using GMTK. These results provide a foundation for using DBN-based models in audio-visual bimodal speech recognition and asynchronous phone segmentation.
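The abstract compares word recognition rates between the DBN and the HTK baseline without defining the metric. A common convention, used by HTK's scoring, aligns the recognized word string with the reference and counts substitutions (S), deletions (D) and insertions (I), giving correctness = (N − D − S)/N and accuracy = (N − D − S − I)/N for N reference words. The sketch below assumes an unweighted minimum-edit-distance alignment (HTK's own scorer may weight error types differently), and the function names `align_counts` and `word_scores` are hypothetical; this is not necessarily how the paper computed its figures.

```python
from typing import List, Optional, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from an unweighted
    minimum-edit-distance alignment of hypothesis words against reference words."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, S, D, I) for aligning ref[:i] with hyp[:j]
    cost: List[List[Optional[Tuple[int, int, int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            # Match costs nothing; a mismatch is a substitution
            diag = (e, s, d, ins) if ref[i - 1] == hyp[j - 1] else (e + 1, s + 1, d, ins)
            e, s, d, ins = cost[i - 1][j]
            delete = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insert = (e + 1, s, d, ins + 1)
            # Tuples compare by total errors first, so min picks a cheapest path
            cost[i][j] = min(diag, delete, insert)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def word_scores(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Return (percent correct, percent accuracy) in the HTK style."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    correct = 100.0 * (n - d - s) / n
    accuracy = 100.0 * (n - d - s - i) / n  # also penalizes insertions
    return correct, accuracy


if __name__ == "__main__":
    ref = ["a", "b", "c", "d"]
    hyp = ["a", "x", "c", "d", "e"]  # one substitution (b -> x), one insertion (e)
    print(align_counts(ref, hyp))  # (1, 0, 1)
    print(word_scores(ref, hyp))   # (75.0, 50.0)
```

Correctness ignores insertions, so a recognizer that inserts many spurious words can look good on that figure alone; accuracy is the stricter of the two.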
Speech Recognition; HMM; Dynamic Bayesian Network; GMTK; Phone Segmentation
Guoyun Lv, Dongmei Jiang, Pengjuan Guo, Ali Sun, Rongchun Zhao, H. Sahli, W. Verhelst
Audio Visual Signal Processing Laboratory (AVSP): Northwestern Polytechnical University, School of Co; Vrije Universiteit Brussel, Dept. ETRO, Brussels, Belgium
International conference
Hangzhou
English
pp. 277-280
2006-10-12 (date first posted online on the Wanfang platform; not the paper's publication date)