Speech-to-Visual Speech Synthesis Using Chinese Visual Triphone

Abstract:

A visual speech synthesis approach based on Chinese visual triphones is presented. A Chinese visual triphone model is constructed from the pronunciation principles of Mandarin Chinese and the relationship between phonemes and visemes. Triphone hidden Markov models (HMMs) are then built on these visual triphones. In the training stage, joint features composed of visual and audio features are used. In the synthesis stage, a sentence HMM is constructed by concatenating triphone HMMs, and visual speech is synthesized from the features extracted from this sentence HMM. Subjective and objective evaluation scores show that the synthesized video is realistic and satisfactory.
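The abstract describes the pipeline only at a high level. The Python sketch below illustrates its stages under explicit assumptions: the phoneme-to-viseme table, the HTK-style `L-C+R` triphone label format, the representation of each HMM state by its mean vector alone, and the externally supplied per-state durations are all hypothetical and are not taken from the paper. It shows how phonemes map to visemes, how visual triphone labels carry left and right context, how joint audio-visual features are formed frame by frame, and how triphone HMMs are chained into a sentence HMM from which visual features are read out.

```python
from dataclasses import dataclass, field

# Hypothetical phoneme-to-viseme table (illustrative only, NOT the paper's mapping).
PHONEME_TO_VISEME = {
    "b": "V_bilabial", "p": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental",
    "a": "V_open", "o": "V_round", "u": "V_round", "i": "V_spread",
    "sil": "sil",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme classes via the hypothetical table."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def visual_triphones(visemes):
    """Label each viseme with its neighbours, HTK-style 'left-centre+right'."""
    labels = []
    for k, centre in enumerate(visemes):
        label = centre
        if k > 0:
            label = f"{visemes[k - 1]}-{label}"
        if k + 1 < len(visemes):
            label = f"{label}+{visemes[k + 1]}"
        labels.append(label)
    return labels


def joint_features(audio_frames, visual_frames):
    """Concatenate frame-aligned audio and visual vectors into joint vectors."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must be frame-aligned")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


@dataclass
class TriphoneHMM:
    label: str
    means: list  # assumed: one joint (audio + visual) mean per emitting state


@dataclass
class SentenceHMM:
    state_labels: list = field(default_factory=list)
    means: list = field(default_factory=list)


def concatenate(models):
    """Chain left-to-right triphone HMMs end to end into one sentence HMM."""
    sentence = SentenceHMM()
    for model in models:
        for s, mu in enumerate(model.means):
            sentence.state_labels.append(f"{model.label}[{s}]")
            sentence.means.append(mu)
    return sentence


def visual_trajectory(sentence, durations, visual_dim):
    """Hold the visual part of each state mean for its duration in frames.

    Simplification: durations are assumed to come from elsewhere (e.g. an
    alignment of the input audio); the paper's extraction step may differ.
    """
    if visual_dim <= 0:
        raise ValueError("visual_dim must be positive")
    if len(durations) != len(sentence.means):
        raise ValueError("need one duration per state")
    frames = []
    for mu, d in zip(sentence.means, durations):
        for _ in range(d):
            frames.append(list(mu[-visual_dim:]))  # visual stream is the tail
    return frames
```

For example, `visual_triphones(to_visemes(["sil", "b", "a", "sil"]))` yields context-dependent labels such as `sil-V_bilabial+V_open`, so the same viseme receives different models in different contexts. Grouping phonemes into visemes before forming triphones keeps the number of distinct models smaller than with purely acoustic triphones, since visually identical phonemes share a class.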

Keywords: visual speech synthesis; Chinese visual triphone; hidden Markov model (HMM); joint features

Authors: Hui ZHAO, Yamin SHEN, Chaojing TANG

Affiliation: College of Electronic Science and Engineering, National University of Defense Technology, Changsha, China

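Conference type: International conference

Conference name: The 2nd IEEE International Conference on Advanced Computer Control (ICACC 2010)

Conference location: Shenyang

Conference language: English

Pages: 241-245

Online publication date: 2010-03-27 (date first posted on the Wanfang platform; not the paper's publication date)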

