Voice Activity Detection Based on a Sequential Gaussian Mixture Model

Abstract:

Voice activity detection (VAD) is a basic component of noise reduction algorithms. In this paper, we propose a voice activity detector based on a sequential Gaussian mixture model (SGMM) in the log-spectral domain. The model comprises two Gaussian components, which describe the speech and nonspeech log-power distributions, respectively. The initial distributions are first estimated by the EM algorithm and then updated sequentially in an on-line manner. From the SGMM, a self-regulatory threshold for speech/nonspeech discrimination is derived for each subband. The proposed VAD does not rely on the assumption that the first several frames of an utterance are nonspeech, an assumption that is widely used in VADs. Moreover, the speech presence probability in the time-frequency domain is obtained as a byproduct of this VAD. We tested the detector on speech from the TIMIT database and noise from the NOISEX-92 database, and the evaluations showed promising performance.
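The pipeline outlined in the abstract (batch EM initialisation, sequential updating, and a threshold derived from the model) can be illustrated for a single subband with a minimal Python sketch. This is not the authors' formulation: the abstract gives no update equations, so the exponential-forgetting update, the bisection search for the equal-posterior point, the synthetic log-power values, and every function and parameter name below are assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def speech_posterior(x, model):
    """Posterior probability that log-power x came from the speech component."""
    p_noise = model["w"][0] * gaussian_pdf(x, model["mean"][0], model["var"][0])
    p_speech = model["w"][1] * gaussian_pdf(x, model["mean"][1], model["var"][1])
    return p_speech / (p_noise + p_speech + 1e-300)


def fit_initial_gmm(values, iterations=50, var_floor=1e-3):
    """Batch EM for a two-component GMM; index 0 = nonspeech, 1 = speech."""
    ordered = sorted(values)
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]
    grand_mean = sum(values) / len(values)
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / len(values), var_floor)
    model = {
        "w": [0.5, 0.5],
        "mean": [sum(low) / len(low), sum(high) / len(high)],
        "var": [grand_var, grand_var],
    }
    for _ in range(iterations):
        post = [speech_posterior(v, model) for v in values]  # E-step
        for k in (0, 1):  # M-step
            resp = [p if k == 1 else 1.0 - p for p in post]
            total = sum(resp) + 1e-12
            mean = sum(r * v for r, v in zip(resp, values)) / total
            var = sum(r * (v - mean) ** 2 for r, v in zip(resp, values)) / total
            model["w"][k] = total / len(values)
            model["mean"][k] = mean
            model["var"][k] = max(var, var_floor)
    return model


def update_online(model, x, forgetting=0.98, var_floor=1e-3):
    """One recursive update with exponential forgetting (assumed rule, not from the paper)."""
    p = speech_posterior(x, model)
    for k, r in ((0, 1.0 - p), (1, p)):
        w_new = forgetting * model["w"][k] + (1.0 - forgetting) * r
        gain = (1.0 - forgetting) * r / max(w_new, 1e-12)
        delta = x - model["mean"][k]
        model["mean"][k] += gain * delta
        model["var"][k] = max((1.0 - gain) * (model["var"][k] + gain * delta ** 2), var_floor)
        model["w"][k] = w_new
    return p  # speech presence probability for this frame and subband


def decision_threshold(model, steps=60):
    """Log-power between the two means where both components are equally likely."""
    lo, hi = model["mean"][0], model["mean"][1]
    if speech_posterior(lo, model) >= 0.5 or speech_posterior(hi, model) <= 0.5:
        return 0.5 * (lo + hi)  # fallback when there is no crossing between the means
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if speech_posterior(mid, model) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def demo(seed=0):
    rng = random.Random(seed)
    # Synthetic log-powers for one subband: noise near -4, speech near 2 (made-up values).
    frames = [rng.gauss(-4.0, 0.5) for _ in range(300)]
    frames += [rng.gauss(2.0, 1.0) for _ in range(200)]
    rng.shuffle(frames)
    model = fit_initial_gmm(frames[:200])
    for x in frames[200:]:
        update_online(model, x)
    return model, decision_threshold(model)
```

Calling `demo()` returns the updated model and a threshold lying between the two component means. Because the initial fit uses a shuffled mix of frames, the sketch does not assume that the leading frames are nonspeech, and the value returned by `update_online` plays the role of the per-frame speech presence probability mentioned in the abstract.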

Authors: Dongwen Ying, Junfeng Li, Qiang Fu, Yonghong Yan, Jianwu Dang

Affiliations: The Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences; Tianjin University, School of Computer Science and Technology

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-6

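Online publication date: 2011-10-18 (the date the record first went online on the Wanfang platform; this is not the paper's publication date)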
