A New Method for F0 Tracking Errors Fix and Generation in HMM-based Mandarin Speech Synthesis using Generation Process Model

Abstract:

HMM-based text-to-speech (TTS) systems can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (VU) decisions are the two key factors in voice quality problems. These errors also increase the RMSE of phoneme duration. In HMM-based TTS, durations are typically modeled statistically using state duration probability distributions, with duration prediction for unseen contexts. The use of rich context features enables synthesis without high-level linguistic knowledge. In this paper, an F0 generation process model is used to re-estimate F0 values in regions of pitch tracking errors as well as in unvoiced regions. Prior knowledge of VU is imposed on each Mandarin phoneme and used for the VU decision. We also design two sets of syntax features to improve Mandarin phone and pause duration prediction, respectively.

Keywords: Mandarin speech synthesis; F0 generation; VU error fix; HMM-based speech synthesis; generation process model

Authors: Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu

Affiliation: Graduate School of Engineering, The University of Tokyo, Japan

Conference type: International conference

Conference name: 2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Conference location: Beijing

Conference language: English

Pages: 609-612

Online publication date: 2010-08-24 (date first posted on the Wanfang platform; not the paper's publication date)
