Conference paper

A New Method for F0 Tracking Errors Fix and Generation in HMM-based Mandarin Speech Synthesis using Generation Process Model

An HMM-based text-to-speech (TTS) system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (VU) decisions are the two key factors in voice quality problems. These errors also increase the root-mean-square error (RMSE) of phoneme duration. In HMM-based TTS, durations are typically modeled statistically, using state duration probability distributions and duration prediction for unseen contexts. The use of rich context features enables synthesis without high-level linguistic knowledge. In this paper, an F0 generation process model is used to re-estimate F0 values in regions with pitch tracking errors, as well as in unvoiced regions. Prior knowledge of the VU status of each Mandarin phoneme is imposed and used for the VU decision. In addition, we design two sets of syntax features to improve Mandarin phone duration prediction and pause duration prediction, respectively.

Mandarin speech synthesis; F0 generation; VU error fix; HMM-based speech synthesis; generation process model

Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu

Graduate School of Engineering, The University of Tokyo, Japan

International conference

2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Beijing

English

609-612

2010-08-24 (date first made available online on the Wanfang platform; not the paper's publication date)