Data Pre-processing in Emotional Speech Synthesis by Emotion Recognition

Abstract:

Synthesizing emotional speech by converting neutral speech makes it possible to generate emotional speech from many existing Text-to-Speech (TTS) systems. How well the generated speech portrays the target emotion depends largely on the emotion data used to train the mapping function for voice transformation. In this paper, we introduce a method to pre-process the emotion database by detecting the emotions in speech using machine learning methods. A selection criterion is proposed that yields a refined database based on the emotion recognition results. Experimental results show that the proposed data pre-processing method can effectively improve the naturalness of the synthesized speech by better portraying the targeted emotion. The quality of speech synthesized using the smaller database is comparable to that obtained using the whole database, and the computational load is reduced because less data is used to train the transformation model.
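
The abstract does not spell out the selection criterion, so the following is only a minimal sketch of one plausible form of recognition-based database refinement. It assumes that an emotion recognizer has already produced per-utterance class probabilities, and it keeps an utterance only when the recognized emotion matches the intended one with sufficient confidence. The names `Utterance`, `select_utterances`, and the `min_confidence` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    utt_id: str
    labeled_emotion: str          # emotion the recording is meant to portray
    posteriors: Dict[str, float]  # assumed recognizer output: emotion -> probability


def select_utterances(utterances: List[Utterance],
                      min_confidence: float = 0.6) -> List[Utterance]:
    """Keep utterances whose recognized emotion agrees with the label.

    Illustrative criterion only (an assumption, not the paper's rule):
    the top-scoring class must equal the labeled emotion and its
    probability must reach `min_confidence`.
    """
    selected = []
    for utt in utterances:
        # Class with the highest recognizer probability.
        predicted = max(utt.posteriors, key=utt.posteriors.get)
        confidence = utt.posteriors[predicted]
        if predicted == utt.labeled_emotion and confidence >= min_confidence:
            selected.append(utt)
    return selected


# Toy data with made-up probabilities, for illustration only.
database = [
    Utterance("u1", "happy", {"happy": 0.82, "neutral": 0.10, "sad": 0.08}),
    Utterance("u2", "happy", {"happy": 0.40, "neutral": 0.45, "sad": 0.15}),
    Utterance("u3", "sad",   {"happy": 0.05, "neutral": 0.40, "sad": 0.55}),
]
refined = select_utterances(database, min_confidence=0.6)
print([utt.utt_id for utt in refined])
```

Under this assumed criterion, raising `min_confidence` trades database size for how clearly each retained utterance expresses its target emotion, which mirrors the smaller-database trade-off the abstract describes.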

Authors: Ling CEN, Minghui DONG, Paul CHAN

Affiliation: Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-4

Online publication date: 2011-10-18 (date first made available online on the Wanfang platform; this is not the paper's publication date)
