SINGLE STREAM DBN MODEL FOR CONTINUOUS SPEECH RECOGNITION AND PHONE SEGMENTATION

Abstract:

HMM-based models achieve promising performance in speech recognition. However, for audiovisual bimodal speech recognition and phone segmentation, HMM-based models such as MSHMM and PHMM fall short in describing the asynchrony between the audio and visual streams. The Dynamic Bayesian Network (DBN) is expected to model this asynchrony well because of its flexible structure. As a first step, this paper describes a single-stream DBN model for continuous speech recognition and phone time segmentation. Word recognition rates and phone segmentation accuracies for 600 test sentences are compared with those from an HMM-based speech recognition system implemented with the HMM toolkit HTK. Experimental results show that in clean and high-SNR environments the DBN and HMM-based models achieve similar performance, while in low-SNR environments the DBN outperforms the HMM. In addition, phone segmentation is obtained using GMTK. These results provide a foundation for using DBN-based models in audiovisual bimodal speech recognition and asynchronous phone segmentation.

Keywords: Speech Recognition; HMM; Dynamic Bayesian Network; GMTK; Phone Segmentation

Authors: Guoyun Lv, Dongmei Jiang, Pengjuan Guo, Ali Sun, Rongchun Zhao, H. Sahli, W. Verhelst

Affiliations: Audio Visual Signal Processing Laboratory (AVSP): Northwestern Polytechnical University, School of Co Vrije Universiteit Brussel, Dept. ETRO, Brussels, Belgium

Conference type: International conference

Conference name: 2006 International Symposium on Distributed Computing and Applications to Business, Engineering and Science

Conference location: Hangzhou

Conference language: English

Pages: 277-280

Online publication date: 2006-10-12 (date first posted on the Wanfang platform; not the paper's publication date)
