Voice Conversion with a Strategy for Separating Speaker Individuality Using State-Space Model
It is well known that the key to voice conversion (VC) is to transform the spectral parameters of the source speaker to match those of the target speaker, and Gaussian mixture model (GMM) based statistical transformations have been commonly studied for this purpose. However, these methods operate frame by frame, disregarding the evolution of the spectral envelope and significantly degrading the quality of the converted speech. In this paper, we propose a new voice conversion method using the state-space model (SSM), which can inherently describe the dynamics of features between frames. The physical meaning of the SSM for voice conversion is then examined, leading to novel SSM-based training and transformation procedures. Experiments using both objective and subjective measurements show that the proposed SSM-based method significantly outperforms the traditional GMM-based technique.
Spectral envelope evolution; state-space model; voice conversion
Ning Xu, Zhen Yang, Haiyan Guo
Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications, Nanjing, China
International conference
Beijing
English
1-4
2010-06-25 (date first posted online on the Wanfang platform; not the paper's publication date)