Conference Topic

Experimental Study of Structure to Speech Conversion -- An implementation of Infant-like Vocal Imitation on a Machine

Most speech synthesizers have been developed as text (phoneme sequence) to speech converters, and in this framework text input is a precondition for speech production. However, no child acquires spoken language by reading a given text aloud. Children are said to acquire spoken language by imitating the utterances of their parents, yet they never imitate their parents' voices. Developmental psychology claims that they extract a holistic and speaker-invariant sound pattern embedded in a given utterance, called the word Gestalt, and realize that pattern acoustically using their own short vocal tubes. In our previous studies, we mathematically defined this holistic and speaker-invariant pattern and used it for ASR [1, 2, 3, 4]. Here, we experimentally implement its inverse process, i.e., Gestalt-to-utterance conversion, on a computer.

Nobuaki Minematsu Daisuke Saito Keikichi Hirose

The University of Tokyo

International conference

9th International Conference on Signal Processing (ICSP08)

Beijing

English

2008-10-26 (date first posted online on the Wanfang platform; not the paper's publication date)