Frequency Warping for Speaker Adaption of Text-to-speech Synthesis

Abstract:

Vocal tract length normalization (VTLN) is generally used in speech recognition to remove individual speaker characteristics. In this paper, we apply VTLN to speaker adaptation for speech synthesis. We propose a new frequency warping approach that reduces the spectral distance between the source and target speakers. The frequency warping function is based on a bilinear function, and the warping factor is generated dynamically, frame by frame. The warped spectra of the source speaker are then converted to line spectral pairs (LSPs) to train hidden Markov models (HMMs). The HMMs are further adapted with the target speaker's data by maximum likelihood linear regression (MLLR). The experimental results show that our frequency warping approach brings the warped spectra of the source speaker closer to those of the target speaker, and that the resulting adapted HMMs perform better than HMMs trained on unwarped spectra in terms of voice naturalness and speaker similarity.
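To make the warping step concrete, here is a minimal Python sketch, assuming NumPy, magnitude spectra sampled uniformly on [0, π], and source/target frames that are already time-aligned (for example by DTW on parallel utterances, which is an assumption, not something stated above). It uses the standard first-order all-pass (bilinear) warping φ_α(ω) = ω + 2 arctan(α sin ω / (1 − α cos ω)). The per-frame choice of α by grid search over a log-spectral distance is a hypothetical stand-in for the paper's dynamic warping factor, not the authors' published criterion, and the function names (`bilinear_warp`, `warp_spectrum`, `best_alpha`, `warp_frames`) are illustrative.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Bilinear (first-order all-pass) warping of normalized angular
    frequency omega in [0, pi]. For |alpha| < 1 the mapping is strictly
    increasing and maps [0, pi] onto itself."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


def warp_spectrum(mag, alpha):
    """Return S'(w) = S(bilinear_warp(w, alpha)) for a magnitude spectrum
    sampled uniformly on [0, pi], using linear interpolation."""
    omega = np.linspace(0.0, np.pi, len(mag))
    return np.interp(bilinear_warp(omega, alpha), omega, mag)


def log_spectral_distance(a, b, eps=1e-10):
    """Squared log-spectral distance; eps guards against log(0)."""
    return float(np.sum((np.log(a + eps) - np.log(b + eps)) ** 2))


def best_alpha(src_mag, tgt_mag, alphas):
    """Hypothetical selection criterion: pick the alpha from a candidate
    grid whose warped source frame is closest to the target frame."""
    return min(alphas, key=lambda a: log_spectral_distance(
        warp_spectrum(src_mag, a), tgt_mag))


def warp_frames(src_frames, tgt_frames, alphas=np.linspace(-0.3, 0.3, 61)):
    """Frame-by-frame warping of time-aligned spectra (frames x bins).
    Returns the warped source frames and the alpha chosen for each frame."""
    chosen = [best_alpha(s, t, alphas) for s, t in zip(src_frames, tgt_frames)]
    warped = np.stack([warp_spectrum(s, a) for s, a in zip(src_frames, chosen)])
    return warped, np.array(chosen)
```

Because φ_α fixes 0 and π and is strictly increasing for |α| < 1, the warped spectrum covers the same band as the original; a finer α grid trades computation for resolution. In the pipeline described above, the warped frames would then be converted to LSPs for HMM training.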

Keywords: frequency warping; speaker adaptation; TTS

Authors: Weixun Gao, Qiying Cao

Affiliations: School of Information Science and Technology, Donghua University, Shanghai, China; Shanghai Normal Un; College of Computer Science & Technology, Donghua University, Shanghai, China

Conference type: International conference

Conference name: 2010 The IET 3rd International Conference on Wireless, Mobile & Multimedia Networks (ICWMMN 2010)

Conference location: Beijing

Conference language: English

Pages: 307-310

Online publication date: 2010-09-26 (the date the record was first posted on the Wanfang platform, not the paper's publication date)

