Conference Topic

Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices

This paper proposes an improved cross-lingual speaker adaptation technique that considers the differences between language-dependent average voices in a speech-to-speech translation system. A state-mapping-based method has previously been introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this method, the transforms estimated from the input language are applied to the average voice models of the output language according to the state mapping information. However, the differences between the average voices of the input and output languages may degrade the adaptation performance. To reduce these differences, we apply a global linear transform to the output average voice models that minimizes the symmetric Kullback-Leibler divergence between the two average voice models. According to the experimental results, our approach could not obtain better results than the original state-mapping-based method. This is because the global transform affects not only speaker characteristics but also language identity in the acoustic features, which degrades the quality of the synthetic speech. It therefore becomes clear that a technique which separates speaker and language identities is required.
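To illustrate the idea described in the abstract, the following Python sketch estimates a global affine transform for the output-language average voice by minimizing the summed symmetric Kullback-Leibler divergence over mapped state pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: diagonal-covariance Gaussian states, the diagonal approximation of the transformed covariance, the variance floor, and the use of scipy.optimize are all assumptions made for illustration only.

# Illustrative sketch (assumptions noted above), not the method as implemented in the paper.
import numpy as np
from scipy.optimize import minimize

def sym_kl_diag(mu0, var0, mu1, var1):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    def kl(m_a, v_a, m_b, v_b):
        return 0.5 * np.sum(v_a / v_b + (m_b - m_a) ** 2 / v_b - 1.0 + np.log(v_b / v_a))
    return kl(mu0, var0, mu1, var1) + kl(mu1, var1, mu0, var0)

def estimate_global_transform(in_states, out_states, dim):
    """
    in_states / out_states: lists of (mean, variance) array pairs for mapped
    states of the input- and output-language average voices (assumed given
    by the state mapping).  Returns an affine transform (A, b) applied to
    the output-language average voice.
    """
    def objective(params):
        A = params[: dim * dim].reshape(dim, dim)
        b = params[dim * dim:]
        total = 0.0
        for (mu_in, var_in), (mu_out, var_out) in zip(in_states, out_states):
            mu_t = A @ mu_out + b                    # transformed output-language mean
            var_t = (A ** 2) @ var_out + 1e-8        # diagonal approximation of A Sigma A^T, floored
            total += sym_kl_diag(mu_in, var_in, mu_t, var_t)
        return total

    # Start the search from the identity transform.
    x0 = np.concatenate([np.eye(dim).ravel(), np.zeros(dim)])
    res = minimize(objective, x0, method="L-BFGS-B")
    A = res.x[: dim * dim].reshape(dim, dim)
    b = res.x[dim * dim:]
    return A, b

As the abstract notes, such a single global transform moves the output-language average voice toward the input-language one as a whole, so it cannot distinguish speaker characteristics from language identity in the acoustic features.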

HMM speech synthesis; cross-lingual speaker adaptation; average voice

Xianglin Peng, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan

International Conference

2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Beijing

English

605-608

2010-08-24 (date of first posting on the Wanfang platform; does not indicate the paper's publication date)