Voice Conversion without Parallel Speech Corpus Based on Mixtures of Linear Transform

Abstract:

This paper presents a voice conversion algorithm based on mixtures of linear transforms (Ms-LT) that avoids the need for the parallel training data required by conventional approaches. Within a maximum likelihood framework, the EM algorithm is used to estimate the parameters of the conversion function. The chirp z-transform is then used to enhance the spectral envelope, which is averaged by the linear weighting of the transforms. The proposed voice conversion system is evaluated with both objective and subjective measures. The experimental results show that the approach effectively transforms speaker identity and achieves results comparable to those of conventional methods that rely on a parallel corpus.
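The abstract describes the conversion function only at a high level. As a hedged illustration, a mixtures-of-linear-transforms mapping is commonly written as F(x) = sum_i p(i|x) (A_i x + b_i), where p(i|x) is the posterior probability of component i of a Gaussian mixture fitted to the source features. The sketch below assumes that form with diagonal covariances; the function names and parameter layout are hypothetical, and it does not reproduce the paper's EM training on non-parallel data or its chirp z-transform enhancement. It only shows how already-trained parameters would map one feature frame.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances) -> Vector:
    """Responsibilities p(i | x) of each mixture component for the source frame x."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max (log-sum-exp trick) to avoid underflow
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x, weights, means, variances, A, b) -> Vector:
    """Hypothetical conversion F(x) = sum_i p(i|x) (A_i x + b_i)."""
    post = posteriors(x, weights, means, variances)
    dim = len(b[0])
    y = [0.0] * dim
    for p, A_i, b_i in zip(post, A, b):
        for r in range(dim):
            # Row r of the i-th linear transform, weighted by its posterior.
            y[r] += p * (sum(A_i[r][c] * x[c] for c in range(len(x))) + b_i[r])
    return y
```

Because the output is a posterior-weighted average of several linear predictions, a frame lying between components receives a blend of transforms. This is one way to see why the abstract refers to an averaged spectral envelope that needs enhancement.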

Keywords: Voice conversion; multimedia application; Ms-LT; EM algorithm

Authors: Zhi-Hua Jian, Zhen Yang

Affiliation: Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications, Nanjing, China

Conference type: International conference

Conference name: 3rd IEEE International Conference on Wireless Communications, Networking and Mobile Computing

Conference location: Shanghai

Conference language: English

Online publication date: 2007-09-21 (the date the record was first posted on the Wanfang platform; not the paper's publication date)
