Conference Paper

Data Pre-processing in Emotional Speech Synthesis by Emotion Recognition

Synthesizing emotional speech by converting neutral speech allows emotional speech to be generated from many existing Text-to-Speech (TTS) systems. How well the generated speech portrays the target emotion depends largely on the emotion data used to train the mapping function for voice transformation. In this paper, we introduce a method to pre-process the emotion database by detecting the emotions in speech using machine learning methods. A selection criterion is proposed that yields a refined database based on the emotion recognition results. The experimental results show that the proposed data pre-processing method can effectively improve the naturalness of the synthesized speech by better portraying the targeted emotion. The quality of speech synthesized using the smaller database is comparable to that obtained using the whole database. The computational load is also reduced, because less data is used to train the transformation model.

Ling CEN, Minghui DONG, Paul CHAN

Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632

International conference

2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Xi'an

English

1-4

2011-10-18 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)