Photo-Realistic Mouth Animation Based on an Asynchronous Articulatory DBN Model for Continuous Speech

Abstract:

This paper proposes a continuous-speech-driven, photo-realistic visual speech synthesis approach based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). To train the AF_AVDBN model, perceptual linear prediction (PLP) features and YUV features are extracted as the acoustic and visual features, respectively. Given an input speech signal and the trained AF_AVDBN parameters, an EM-based algorithm is derived to learn the optimal YUV features. These features, together with compensated high-frequency components, are then used to synthesize the mouth animation corresponding to the input speech. In the experiments, mouth animations are synthesized for 80 connected-digit utterances. Both qualitative and quantitative evaluations show that the proposed method synthesizes more natural, clearer and more accurate mouth animations than the state-asynchronous DBN model (S_A_DBN).
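The abstract states only that YUV features serve as the visual features; it does not say how the mouth region is located or whether the YUV values are further reduced (for example by PCA). As a rough illustration under those assumptions, the sketch below converts an RGB mouth region to YUV with the standard ITU-R BT.601 coefficients and flattens it into one feature vector per frame. The function names rgb_to_yuv and mouth_yuv_feature are hypothetical and are not taken from the paper.

```python
import numpy as np

# BT.601 RGB -> YUV transform; each row gives the weights for Y, U and V.
BT601_RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def rgb_to_yuv(image_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image (floats in [0, 1]) to an H x W x 3 YUV image."""
    # Matrix-multiply each pixel's RGB triple by the transform (broadcast over H and W).
    return image_rgb @ BT601_RGB_TO_YUV.T


def mouth_yuv_feature(mouth_roi_rgb: np.ndarray) -> np.ndarray:
    """Hypothetical per-frame visual feature: the mouth ROI in YUV, flattened to 1-D."""
    return rgb_to_yuv(mouth_roi_rgb).reshape(-1)


if __name__ == "__main__":
    # A pure white pixel maps to Y = 1 with (near) zero chrominance.
    print(rgb_to_yuv(np.ones((1, 1, 3))))
    # A 4 x 5 ROI yields a 60-dimensional feature vector (4 * 5 * 3).
    print(mouth_yuv_feature(np.zeros((4, 5, 3))).shape)
```

In a real system, one such vector per video frame would form the visual observation stream paired with the per-frame PLP acoustic features.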

Authors: He Zhang, Dongmei Jiang, Peng Wu, Hichem Sahli

Affiliations: VUB-NPU Joint Research Group on Audio Visual Signal Processing (AVSP), Northwestern Polytechnic Unive; VUB-NPU Joint Research Group on Audio Visual Signal Processing (AVSP), Vrije Universiteit Brussel (VU

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-4

Online publication date: 2011-10-18 (the date the record was first posted on the Wanfang platform, not the paper's publication date)

