Conference Topic

Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices

This paper proposes an improved cross-lingual speaker adaptation technique that considers the differences between language-dependent average voices in a speech-to-speech translation system. A state-mapping-based method has previously been introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this method, the transforms estimated from the input language are applied to the average voice models of the output language according to the state mapping information. However, the differences between the average voices of the input and output languages may degrade the adaptation performance. To reduce these differences, we apply a global linear transform to the output average voice models that minimizes the symmetric Kullback-Leibler divergence between the two average voice models. According to the experimental results, our approach could not obtain better results than the original state-mapping-based method. This is because the global transform affects not only speaker characteristics but also language identity in the acoustic features, which degrades the quality of the synthetic speech. It therefore becomes clear that a technique which separates speaker and language identities is required.
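To illustrate the idea described in the abstract, the following Python sketch estimates a global affine transform for the output-language average voice by minimizing the summed symmetric Kullback-Leibler divergence over mapped state pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: diagonal-covariance Gaussian states, the diagonal approximation of the transformed covariance, the variance floor, and the use of scipy.optimize are all assumptions made for illustration only.

# Illustrative sketch (assumptions noted above), not the method as implemented in the paper.
import numpy as np
from scipy.optimize import minimize

def sym_kl_diag(mu0, var0, mu1, var1):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    def kl(m_a, v_a, m_b, v_b):
        return 0.5 * np.sum(v_a / v_b + (m_b - m_a) ** 2 / v_b - 1.0 + np.log(v_b / v_a))
    return kl(mu0, var0, mu1, var1) + kl(mu1, var1, mu0, var0)

def estimate_global_transform(in_states, out_states, dim):
    """
    in_states / out_states: lists of (mean, variance) array pairs for mapped
    states of the input- and output-language average voices (assumed given
    by the state mapping).  Returns an affine transform (A, b) applied to
    the output-language average voice.
    """
    def objective(params):
        A = params[: dim * dim].reshape(dim, dim)
        b = params[dim * dim:]
        total = 0.0
        for (mu_in, var_in), (mu_out, var_out) in zip(in_states, out_states):
            mu_t = A @ mu_out + b                    # transformed output-language mean
            var_t = (A ** 2) @ var_out + 1e-8        # diagonal approximation of A Sigma A^T, floored
            total += sym_kl_diag(mu_in, var_in, mu_t, var_t)
        return total

    # Start the search from the identity transform.
    x0 = np.concatenate([np.eye(dim).ravel(), np.zeros(dim)])
    res = minimize(objective, x0, method="L-BFGS-B")
    A = res.x[: dim * dim].reshape(dim, dim)
    b = res.x[dim * dim:]
    return A, b

As the abstract notes, such a single global transform moves the output-language average voice toward the input-language one as a whole, so it cannot distinguish speaker characteristics from language identity in the acoustic features.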

HMM speech synthesis; cross-lingual speaker adaptation; average voice

Xianglin Peng, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan

International Conference

2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Beijing

English

605-608

2010-08-24 (date of first posting on the Wanfang platform; does not indicate the paper's publication date)