Regularization in a reproducing kernel Hilbert space for robust voice activity detection

Abstract:

Voice activity detection (VAD) remains difficult in noisy environments because the statistical distributions of speech and non-speech features overlap heavily. Considering that speech is a special type of acoustic signal that occupies only a small fraction of the whole acoustic space, we have proposed a speech processing method for VAD that constrains the processing space to a reproducing kernel Hilbert space (RKHS) [1]. In the RKHS, speech estimation is treated as a function approximation problem. Through regularization in the RKHS framework, a target function is learned to approximate the speech signal while the noise component is smoothed out. The framework incorporates nonlinear mapping functions into the approximation implicitly via a kernel function, so the approximating function can capture the nonlinear and high-order statistical structure of speech. Our VAD algorithm is based on the power (energy) in this regularized RKHS. We have tested its performance on the CENSREC-1-C data corpus for the VAD task [1]. In this paper, we quantify how well it discriminates between speech and non-speech, and compare it with several classical VAD algorithms. Experimental results show that the proposed processing enhances the discriminability between the distributions of speech and non-speech, and performs better on the VAD task than the classical VAD algorithms.
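The regularized function approximation described in the abstract can be illustrated with kernel ridge regression, the standard closed-form case of Tikhonov regularization in an RKHS. The sketch below is not the authors' algorithm: the Gaussian kernel, the values of `width` and `lam`, the synthetic sinusoid standing in for speech, and all function names are assumptions made only for illustration. It fits a smooth function to a noisy frame and compares the energy of the fitted function for a "speech plus noise" frame and a "noise only" frame.

```python
import numpy as np

def gaussian_kernel(a, b, width):
    """Gaussian (RBF) kernel matrix between two 1-D sets of sample times."""
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

def rkhs_smooth(t, y, width=0.05, lam=1.0):
    """Minimise sum_i (y_i - f(t_i))^2 + lam * ||f||_H^2 over the RKHS.

    By the representer theorem f(t) = sum_i alpha_i k(t_i, t), with
    alpha = (K + lam * I)^{-1} y. Returns f evaluated at the sample times.
    """
    K = gaussian_kernel(t, t, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(t)), y)
    return K @ alpha

def frame_energy(x):
    """Mean power of a frame."""
    return float(np.mean(x ** 2))

# Synthetic stand-in data (assumption): a sinusoid plays the role of speech.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 5 * t)
noisy_speech = clean + rng.normal(0.0, 0.5, size=t.shape)
noise_only = rng.normal(0.0, 0.5, size=t.shape)

smoothed_speech = rkhs_smooth(t, noisy_speech)
smoothed_noise = rkhs_smooth(t, noise_only)

e_speech = frame_energy(smoothed_speech)
e_noise = frame_energy(smoothed_noise)
print("energy after smoothing, speech frame:", e_speech)
print("energy after smoothing, noise frame: ", e_noise)
```

In this toy setting the regularizer suppresses most of the broadband noise while keeping the slowly varying component, so an energy threshold separates the two frames more easily after smoothing than before. How the kernel and regularization weight are chosen for real speech features is not stated in the abstract.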

Authors: Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Affiliations: National Institute of Information and Communications Technology, Japan; Japan Advanced Institute of Science and Technology, Japan

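Conference type: International conference

Conference name: 2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)

Conference location: Beijing

Conference language: English

Pages: 585-588

Online publication date: 2010-08-24 (date first posted on the Wanfang platform; not the paper's publication date)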
