Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis

Abstract:

To address the overall-micro modeling issue of the current prosody model in HMM-based speech synthesis, a hierarchical F0 modeling method has been proposed, in which different kinds of pitch patterns are characterized by different prosodic layers and a minimum generation error (MGE) training framework is used to simultaneously optimize the F0 models of all layers. This paper investigates the importance of the prosodic layers and the relationships between prosodic characteristics using this hierarchical F0 modeling method. The number of clusters in each layer is varied to balance that layer's accuracy and robustness; because of the additive structure, this also influences the other layers. The importance and relationships are reflected by comparing systems with different cluster number ratios. The experimental results and conclusions are useful for designing a hierarchical F0 modeling system.

Keywords: speech synthesis; hidden Markov model; hierarchical F0 modeling; minimum generation error training

Authors: Ming Lei, Yi-Jian Wu, Zhen-Hua Ling, Li-Rong Dai

Affiliations: iFLYTEK Speech Lab, University of Science and Technology of China, Hefei, China; Microsoft China, Beijing, China

Conference type: International conference

Conference name: 2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Conference location: Beijing

Conference language: English

Pages: 613-616

Online publication date: 2010-08-24 (date first made available online on the Wanfang platform; not the paper's publication date)
