Rapid F0 estimation for high-SNR speech
Fundamental frequency (F0) is one of the most important features of the voice and is indispensable for high-quality voice synthesis. Many F0 estimation methods have been proposed for various purposes; for example, some aim to improve estimation performance for speech in noisy environments. However, speech used for high-quality synthesis usually has a high SNR, and real-time voice synthesis requires an online F0 estimation method. A rapid and reliable F0 estimation method for high-SNR speech is therefore crucial for real-time, high-quality voice synthesis. We have proposed an F0 estimation method to fulfill these requirements. The method consists of three steps. The first step extracts the fundamental component by using many low-pass filters. The second step calculates a new criterion named fundamentalness. The final step selects the optimum F0 by using fundamentalness. In our previous study, fundamentalness was defined as a variance based on four intervals: the negative-going and positive-going zero-crossing intervals and the intervals between successive peaks and between successive dips. The variance of these four intervals is zero if the filtered signal consists only of the fundamental component. In this paper, we propose a new criterion that improves the temporal resolution of the analysis. Because one period of a sine wave can be divided into four equal intervals by four zero-crossing points, the variance of these intervals is also zero, as with the previous criterion. Evaluation experiments confirmed that the proposed method can estimate F0 reliably online.
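The interval idea in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes that the "four zero-crossing points" are the zero crossings of the filtered signal together with those of its first difference (that is, its peaks and dips), and it uses a normalized variance of the resulting quarter-period intervals as a stand-in for fundamentalness. The function names `crossing_times` and `quarter_interval_spread` are hypothetical.

```python
import numpy as np


def crossing_times(x, fs):
    """Return the times (in seconds) at which x changes sign.

    Each crossing is refined by linear interpolation between the two
    samples that straddle it.
    """
    x = np.asarray(x, dtype=float)
    negative = np.signbit(x)
    idx = np.nonzero(negative[:-1] != negative[1:])[0]
    frac = x[idx] / (x[idx] - x[idx + 1])
    return (idx + frac) / fs


def quarter_interval_spread(x, fs):
    """Return (normalized variance, mean) of the quarter-period intervals.

    Assumption: the events are the zero crossings of x plus the zero
    crossings of its first difference (peaks and dips). For a pure sine
    these events are a quarter period apart, so the variance is near zero.
    """
    x = np.asarray(x, dtype=float)
    t_zero = crossing_times(x, fs)
    # np.diff(x)[n] approximates the slope half a sample after sample n,
    # so the extremum times are shifted by half a sample period.
    t_extrema = crossing_times(np.diff(x), fs) + 0.5 / fs
    events = np.sort(np.concatenate([t_zero, t_extrema]))
    intervals = np.diff(events)
    mean = intervals.mean()
    return intervals.var() / mean**2, mean


# Illustrative signals: a pure 110 Hz sine, and the same sine with a
# second harmonic added (as if the low-pass filter cutoff were too high).
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
pure = np.sin(2 * np.pi * 110 * t)
mixed = pure + 0.3 * np.sin(2 * np.pi * 220 * t)

spread_pure, mean_pure = quarter_interval_spread(pure, fs)
spread_mixed, _ = quarter_interval_spread(mixed, fs)
f0_estimate = 1.0 / (4.0 * mean_pure)  # four intervals per period
```

In this sketch the spread is close to zero for the pure sine and clearly larger once the harmonic is present, which is the property the abstract relies on for selecting the filter output that contains only the fundamental component. Because each interval spans only a quarter of a period, the measure can be updated more often than one based on full-period intervals.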
Masanori Morise, Hideki Kawahara, Takanobu Nishiura
College of Information Science and Engineering, Ritsumeikan University, 1-1-1, Nojihigashi, Kusatsu, Shig; Faculty of Systems Engineering, Wakayama University, 930, Sakaedani, Wakayama, Japan
International conference
The 10th Western Pacific Acoustics Conference (WESPAC X)
Beijing
English
1-6
2009-09-21 (date first posted online on the Wanfang platform; does not represent the paper's publication date)