Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
To address the overall-micro modeling issue of the current prosody model in HMM-based speech synthesis, a hierarchical F0 modeling method has been proposed, in which different kinds of pitch patterns are characterized by different prosodic layers and a minimum generation error (MGE) training framework is used to simultaneously optimize the F0 models of all layers. This paper investigates the importance of the prosodic layers and the relationships between prosodic characteristics using this hierarchical F0 modeling method. The cluster number of each layer is modified to balance that layer's accuracy and robustness; because of the additive structure, this also influences the other layers. The importance and relationships are reflected by systems with different cluster number ratios. The experimental results and conclusions are helpful for designing a hierarchical F0 modeling system.
speech synthesis; hidden Markov model; hierarchical F0 modeling; minimum generation error training
Ming Lei, Yi-Jian Wu, Zhen-Hua Ling, Li-Rong Dai
iFLYTEK Speech Lab, University of Science and Technology of China, Hefei, China; Microsoft China, Beijing, China
International conference
2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)
Beijing
English
613-616
2010-08-24 (date first posted online on the Wanfang platform; not the paper's publication date)