Conference Topic

Sliding Window-based Speech-to-Lips Conversion with Low Delay

The goal of a good speech-to-lips conversion system is to synthesize high-quality, realistic lip movement that is time-synchronized with the input speech. Previously, maximum likelihood estimation (MLE) of the visual trajectory with a Gaussian Mixture Model (GMM) has been proposed and successfully tested for speech-to-lips conversion. It works as a sentence-level batch process that converts acoustic speech signals into a visual lip movement trajectory. In this paper, we propose a moving-window-based, low-delay speech-to-lips conversion method for real-time communication applications. The new approach is an approximation of the MLE-GMM conversion but can render lip movement on the fly with low latency. Experimental results on the LIPS2009 dataset show that the proposed real-time method can achieve a latency of less than 100 ms while maintaining quality comparable to that of the batch method.

Wei Han, Lijuan Wang, Frank Soong, Bo Yuan

Shanghai Jiao Tong University, Shanghai; Microsoft Research Asia, Beijing; Microsoft Research Asia, Beijing; Shanghai Jiao Tong University, Shanghai

International conference

2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Xi'an

English

1-4

2011-10-18 (date first posted online on the Wanfang platform; not the paper's publication date)