Conference Topic

SINGLE STREAM DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND PHONE SEGMENTATION

HMM based models achieve promising performance in speech recognition. However, for audiovisual bimodal speech recognition and phone segmentation, HMM based models such as MSHMM and PHMM fall short in describing the asynchrony between the audio and visual streams. A Dynamic Bayesian Network (DBN) is expected to model this asynchrony well because of its flexible structure. As a first step, this paper describes a single stream DBN model for continuous speech recognition and phone time segmentation. Word recognition rates and phone segmentation accuracies for 600 test sentences are compared with those from an HMM based speech recognition system implemented with the HMM toolkit HTK. Experimental results show that in clean and high-SNR environments the DBN and HMM based models achieve similar performance, but in low-SNR environments the DBN performs better than the HMM. In addition, phone segmentation is achieved with GMTK. These results provide a foundation for using DBN based models in audiovisual bimodal speech recognition and asynchronous phone segmentation.

Speech Recognition; HMM; Dynamic Bayesian Network; GMTK; Phone Segmentation

Guoyun Lv, Dongmei Jiang, Pengjuan Guo, Ali Sun, Rongchun Zhao, H. Sahli, W. Verhelst

Audio Visual Signal Processing Laboratory (AVSP): Northwestern Polytechnical University, School of Co Vrije Universiteit Brussel, Dept. ETRO, Brussels, Belgium

International conference

2006 International Symposium on Distributed Computing and Applications to Business, Engineering and Science

Hangzhou

English

277-280

2006-10-12 (date first posted online on the Wanfang platform; not the paper's publication date)