Conference Topic

Voice Conversion with a Strategy for Separating Speaker Individuality Using State-Space Model

It is well known that the key to voice conversion (VC) is to transform the spectral parameters of the source speaker to match those of the target speaker, and Gaussian mixture model (GMM) based statistical transformations have been commonly studied for this purpose. However, these methods operate frame by frame, disregarding the evolution of the spectral envelope and significantly degrading the quality of the converted speech. In this paper, we propose a new voice conversion method using the state-space model (SSM), which can inherently describe the dynamics between frames. The physical meaning of the SSM for voice conversion is then examined, leading to novel SSM-based training and transformation procedures. Experiments using both objective and subjective measurements show that the proposed SSM-based method significantly outperforms the traditional GMM-based technique.
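To illustrate what "dynamics between frames" means, the following is a minimal sketch of a generic textbook linear state-space model, not the authors' formulation, which this record does not give. The function name `simulate_ssm`, the matrices `A` and `C`, and all dimensions are hypothetical choices made only for illustration. Each observed frame depends on a hidden state that is carried forward from the previous frame, whereas a frame-by-frame mapping treats every frame independently.

```python
import numpy as np

def simulate_ssm(A, C, x0, num_frames, noise_std=0.0, seed=0):
    """Generate frames y_t = C x_t + v_t with state x_{t+1} = A x_t + w_t.

    A, C, and the noise level are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    frames = []
    for _ in range(num_frames):
        # Observation: the current frame is a noisy projection of the state.
        y = C @ x + noise_std * rng.standard_normal(C.shape[0])
        frames.append(y)
        # Transition: the state evolves, linking this frame to the next one.
        x = A @ x + noise_std * rng.standard_normal(A.shape[0])
    return np.stack(frames)

# Hypothetical 2-dimensional state and 3-dimensional "spectral" observation.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
frames = simulate_ssm(A, C, x0=[1.0, 1.0], num_frames=4)
```

With the noise switched off, each row of `frames` is fully determined by the previous state through `A`, which is the inter-frame dependency that a purely frame-by-frame transformation cannot represent.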

Spectral envelope evolution; state-space model; voice conversion

Ning Xu, Zhen Yang, Haiyan Guo

Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications, Nanjing, China

International conference

2010 IEEE International Conference on Wireless Communications, Networking and Information Security (WCNIS)

Beijing

English

1-4

2010-06-25 (date first posted online on the Wanfang platform; not the paper's publication date)