Segmentation of Speech Signals in Template-based Speech to Singing Conversion

Abstract:

Singing voice synthesis has found numerous applications in the entertainment industry in recent years. Template-based personalized singing voice synthesis is a new method for generating high-quality singing voice, in which singing is synthesized by converting the narrated lyrics of a song. In this method, template speaking and singing voices are first recorded to model the transformation from speech to singing. To improve the accuracy of this modeling while reducing computational load, the template voices are divided into several segments, so that fine alignment and subsequent conversion can be performed separately for each segment. To generate the singing voice correctly, a new speech input must therefore be divided into corresponding segments, each containing the same stanza as its counterpart in the template voices. To achieve this, this paper proposes an automatic segmentation method. Experimental results show that segmentation with the proposed method is comparable to manual segmentation, with an accuracy of 98.24%, and that this performance remains consistent in the presence of noise.

Authors: Ling CEN, Minghui DONG, Paul CHAN

Affiliation: Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore 138632

Conference type: International conference

Conference: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an, China

Conference language: English

Pages: 1-4

Online publication date: 2011-10-18 (date first posted on the Wanfang platform; not the paper's publication date)
