Utilizing Auditory Masking in Automatic Speech Recognition
A speech recognition system based on the psychoacoustic masking property of the human auditory system is proposed. The method uses several psychoacoustic properties of human perception to define a perceptual speech excitation function (the auditory masking threshold, AMT) and the perceptual noise. Based on the AMT, a time-frequency noise spectral subtraction is implemented. For a human listener, noise below the masking threshold is inaudible, so the objective is to minimize only the noise spectrum above the masking threshold. We additionally show that, for automatic speech recognition (ASR) applications, recognition performance may be improved further by augmenting the masking of the noise with spectral subtraction in the masked region as well. The strategy is to remove the masked noise from the ASR system, mimicking the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of the noise that lies below the masking threshold. Both methods gave significant improvements over the baseline perceptual linear prediction (PLP) performance, with the latter giving better recognition results.
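The two subtraction schemes named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the spectral floor, and the per-bin power values are hypothetical, and the masking threshold is taken as a given input rather than computed from a psychoacoustic model.

```python
def subtract_total(noisy, noise, floor=0.01):
    """Scheme 1 (assumed form): subtract the full noise estimate in every bin.

    The floor (a hypothetical parameter) keeps each result at no less than
    a small fraction of the noisy power, so it never goes negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy, noise)]


def subtract_masked(noisy, noise, threshold, floor=0.01):
    """Scheme 2 (assumed form): subtract noise only where it is masked.

    A bin is treated as masked when its noise estimate lies below the
    masking threshold; all other bins are left unchanged.
    """
    cleaned = []
    for y, n, t in zip(noisy, noise, threshold):
        value = y - n if n < t else y
        cleaned.append(max(value, floor * y))
    return cleaned


# Illustrative per-bin power values for a single frame (made up).
noisy = [10.0, 4.0, 8.0]
noise = [2.0, 3.0, 1.0]
threshold = [1.0, 5.0, 2.0]

total = subtract_total(noisy, noise)
masked = subtract_masked(noisy, noise, threshold)
print(total)   # [8.0, 1.0, 7.0]
print(masked)  # [10.0, 1.0, 7.0]
```

In the first bin the noise estimate exceeds the threshold, so the second scheme leaves that bin untouched while the first scheme subtracts it.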
Serajul Haque
International conference
Shanghai
English
1758-1764
2010-11-23 (date first made available online on the Wanfang platform; not the paper's publication date)