Regularization in a reproducing kernel Hilbert space for robust voice activity detection
Voice activity detection (VAD) remains difficult in noisy environments because the statistical distributions of speech and non-speech features overlap heavily under noise. Considering that speech is a special type of acoustic signal that occupies only a small fraction of the whole acoustic space, we previously proposed a speech processing method for VAD that constrains the processing space to a reproducing kernel Hilbert space (RKHS) [1]. In the RKHS, speech estimation is formulated as a function approximation problem: through regularization in the RKHS framework, a target function is learned that approximates the speech signal while the noise component is smoothed out. Within this framework, nonlinear mapping functions are incorporated into the approximation implicitly through a kernel function, so the approximating function can capture the nonlinear and high-order statistical structure of speech. Our VAD algorithm is based on the power energy in this regularized RKHS, and we have tested its performance on the CENSREC-1-C corpus for the VAD task [1]. In this paper, we quantify how well the method discriminates between speech and non-speech and compare its performance with several classical VAD algorithms. Experimental results show that the proposed processing enhances the discriminability between the speech and non-speech distributions and yields better VAD performance than the classical algorithms.
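The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Gaussian kernel over the sample indices of each frame and uses plain kernel ridge regression as the RKHS regularizer: minimize sum_i (y_i - f(t_i))^2 + lam * ||f||_H^2, whose solution by the representer theorem is f = sum_i alpha_i k(t_i, .) with alpha = (K + lam I)^{-1} y. The "power energy" is taken here as the mean square of the smoothed frame, and the speech/non-speech decision is a hypothetical threshold above a percentile noise floor. All function and parameter names (rkhs_smooth, frame_energies, simple_vad, width, lam, frame_len, threshold_db) are illustrative.

```python
import numpy as np


def gaussian_kernel(a, b, width):
    """Gaussian (RBF) kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 width^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))


def rkhs_smooth(frame, width=2.0, lam=1.0):
    """Kernel ridge regression of one frame on its sample indices.

    Minimizes sum_i (y_i - f(t_i))^2 + lam * ||f||_H^2. By the representer
    theorem f = sum_i alpha_i k(t_i, .) with alpha = (K + lam I)^{-1} y, so the
    fitted values at the sample points are K @ alpha. Assumption: the kernel
    input is the time index, not the paper's (unstated) feature space.
    """
    t = np.arange(len(frame), dtype=float)
    K = gaussian_kernel(t, t, width)
    alpha = np.linalg.solve(K + lam * np.eye(len(frame)), frame)
    return K @ alpha


def frame_energies(signal, frame_len=160, **kernel_args):
    """Mean power of each RKHS-smoothed, non-overlapping frame (tail is dropped)."""
    n = len(signal) // frame_len
    frames = np.asarray(signal, dtype=float)[: n * frame_len].reshape(n, frame_len)
    return np.array([np.mean(rkhs_smooth(f, **kernel_args) ** 2) for f in frames])


def simple_vad(signal, frame_len=160, threshold_db=6.0, **kernel_args):
    """Flag frames whose smoothed energy exceeds an estimated noise floor.

    Hypothetical decision rule: the noise floor is the 10th percentile of the
    frame energies in dB, and a frame counts as speech if it lies more than
    threshold_db above that floor.
    """
    e_db = 10.0 * np.log10(frame_energies(signal, frame_len, **kernel_args) + 1e-12)
    noise_floor = np.percentile(e_db, 10)
    return e_db > noise_floor + threshold_db


if __name__ == "__main__":
    # Synthetic check: white noise with a 200 Hz tone in frames 20-29 (fs = 8 kHz).
    rng = np.random.default_rng(0)
    fs, frame_len = 8000, 160
    x = 0.1 * rng.standard_normal(50 * frame_len)
    idx = np.arange(20 * frame_len, 30 * frame_len)
    x[idx] += np.sin(2 * np.pi * 200 * idx / fs)
    print(np.flatnonzero(simple_vad(x, frame_len=frame_len)))
```

Solving the regularized problem frame by frame keeps each linear system small (160 x 160 for 20 ms frames at 8 kHz). A wider kernel or a larger lam smooths more aggressively, trading residual noise against distortion of the speech component.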
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
National Institute of Information and Communications Technology, Japan; Japan Advanced Institute of Science and Technology, Japan
International conference
2010 IEEE 10th International Conference on Signal Processing (ICSP 2010)
Beijing
Language: English
Pages: 585-588
2010-08-24 (date first posted online on the Wanfang platform; not the paper's publication date)