Conference topic

Fast Speaker Normalization and Adaptation based on BIC for Meeting Speech Recognition

This paper presents a unified method for speech segmentation, speaker normalization of spectral features, and speaker adaptation of the acoustic model for efficient meeting speech recognition. In the proposed method, input speech is segmented based on BIC (Bayesian Information Criterion), and each segment is then compared, again using BIC, against the statistics of each speaker in the training corpus of the acoustic model. Fast VTLN (Vocal Tract Length Normalization) and MLLR (Maximum Likelihood Linear Regression) adaptation are realized using the pre-estimated warping factors and MLLR transformation matrices, respectively, of the best-matched speakers. Experimental evaluations on parliamentary speech transcription demonstrate that the proposed method achieves ASR accuracy comparable to standard ML estimation for both VTLN and MLLR adaptation, with a significant reduction in processing time.

Masato Mimura, Tatsuya Kawahara

Kyoto University, Academic Center for Computing and Media Studies, Sakyo-ku, Kyoto 606-8501, Japan

International conference

2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Xi'an

English

1-4

2011-10-18 (date first made available online on the Wanfang platform; not the paper's publication date)