Fast Speaker Normalization and Adaptation based on BIC for Meeting Speech Recognition

Abstract:

This paper presents a unified method for speech segmentation, speaker normalization of spectral features, and speaker adaptation of the acoustic model for efficient meeting speech recognition. In the proposed method, input speech is segmented based on BIC (Bayesian Information Criterion), and each segment is then compared, again using BIC, against the statistics of each speaker in the training corpus of the acoustic model. Fast VTLN (Vocal Tract Length Normalization) and MLLR (Maximum Likelihood Linear Regression) adaptation are realized by reusing the pre-estimated warping factors and MLLR transformation matrices, respectively, of the best-matched speakers. Experimental evaluations on Parliamentary speech transcription demonstrated that the proposed method achieves ASR accuracy comparable to standard ML estimation for both VTLN and MLLR adaptation, with a significant reduction in processing time.
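
The abstract does not give the exact BIC formulation used, so the following is only a minimal sketch of the general idea, assuming the common single-Gaussian ΔBIC with full covariance. The function names (`delta_bic`, `best_matched_speaker`), the `penalty_weight` parameter, and the use of raw feature frames instead of stored per-speaker statistics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def logdet_cov(frames):
    """Log-determinant of the ML covariance of a (n_frames, dim) array."""
    dim = frames.shape[1]
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(dim)  # small floor for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two feature blocks, each modeled by one Gaussian.

    Positive values favor two separate models (different speakers or a
    change point); negative values favor one shared model.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    dim = x.shape[1]
    merged = np.vstack([x, y])
    gain = 0.5 * (n * logdet_cov(merged)
                  - n1 * logdet_cov(x)
                  - n2 * logdet_cov(y))
    # Extra parameters of the two-model hypothesis: one mean, one covariance.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_matched_speaker(segment, training_speakers):
    """Return the training speaker whose data merges best with the segment.

    training_speakers is a hypothetical dict: speaker id -> feature frames.
    """
    return min(training_speakers,
               key=lambda spk: delta_bic(segment, training_speakers[spk]))
```

Under these assumptions, the identifier returned by `best_matched_speaker` would serve as a lookup key for that speaker's pre-estimated VTLN warping factor and MLLR transforms, so no per-segment ML estimation is needed at recognition time.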

Authors: Masato Mimura, Tatsuya Kawahara

Affiliation: Kyoto University, Academic Center for Computing and Media Studies, Sakyo-ku, Kyoto 606-8501, Japan

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-4

Online publication date: 2011-10-18 (date first posted on the Wanfang platform; not the paper's publication date)
