Conference Topic

Utilizing Auditory Masking in Automatic Speech Recognition

A speech recognition system based on the psychoacoustic masking property of the human auditory system is proposed. The method uses several psychoacoustic properties of human perception to define a perceptual speech excitation function (the masking threshold) and the perceptual noise. Based on the auditory masking threshold (AMT), a time-frequency noise spectral subtraction is implemented. For a human listener, noise below the masking threshold is inaudible, so the objective is to minimize only the noise spectrum above the masking threshold. Additionally, we show that, for ASR applications, further improvements in recognition performance may be obtained by augmenting the masking of the noise with spectral subtraction in the masked region as well. The strategy is to remove the masked noise from the ASR system, similar to the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we have implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of the noise that lies below the masking threshold. Both methods gave significant improvements over the baseline PLP performance, with the latter method giving better recognition results.
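The two subtraction schemes named in the abstract can be illustrated with a minimal sketch for a single spectral frame. This is not the author's implementation: the abstract does not say how the masking threshold or the perceptual noise is estimated, so both are taken here as given per-bin inputs, and the function names, the strict `<` comparison, and the `floor` parameter are assumptions made for illustration only.

```python
def subtract_all(noisy, noise, floor=0.0):
    """Scheme 1 (assumed form): subtract the estimated noise in every bin."""
    # Clamp at `floor` so a power value never goes negative.
    return [max(y - n, floor) for y, n in zip(noisy, noise)]


def subtract_masked(noisy, noise, threshold, floor=0.0):
    """Scheme 2 (assumed form): subtract only noise lying below the threshold."""
    cleaned = []
    for y, n, t in zip(noisy, noise, threshold):
        if n < t:
            # Noise is masked in this bin: remove it from the spectrum.
            cleaned.append(max(y - n, floor))
        else:
            # Noise is not masked: leave the bin unchanged in this sketch.
            cleaned.append(y)
    return cleaned


# Hypothetical per-bin power values for one frame (illustrative numbers only).
noisy_frame = [10.0, 5.0, 2.0]
noise_frame = [1.0, 4.0, 3.0]
threshold_frame = [2.0, 3.0, 5.0]

all_bins = subtract_all(noisy_frame, noise_frame)
masked_bins = subtract_masked(noisy_frame, noise_frame, threshold_frame)
```

In this sketch the second bin is left untouched by `subtract_masked` because its noise estimate exceeds the threshold, while `subtract_all` reduces it. How a real system treats unmasked bins is not specified in the abstract.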

Serajul Haque

International conference

2010 International Conference on Audio, Language and Image Processing (ICALIP 2010)

Shanghai

English

1758-1764

2010-11-23 (date first made available online on the Wanfang platform; not the paper's publication date)