Discrimination-Emphasized Mel-Frequency-Warping for Time-Varying Speaker Recognition
Performance degradation over time is a widely acknowledged phenomenon in speaker recognition, and it is commonly assumed that speaker models must be updated from time to time to remain representative. However, such updating is costly, user-unfriendly, and sometimes unrealistic, which hinders practical application of the technology. From a pattern recognition point of view, the time-varying issue in speaker recognition calls for features that are speaker-specific and as stable as possible across sessions recorded at different times. Therefore, after searching for and analyzing the most stable parts of the feature space, a discrimination-emphasized Mel-frequency-warping method is proposed. In implementation, each frequency band is assigned a discrimination score that takes into account both speaker and session information, and Mel-frequency warping is applied during feature extraction to emphasize bands with higher scores. Experimental results on a time-varying voiceprint database show that this method not only improves speaker recognition performance, with an EER reduction of 19.1%, but also alleviates the performance degradation caused by time variation, with a reduction of 8.9%.
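The idea of scoring frequency bands and then warping the frequency axis toward high-scoring bands can be illustrated with a minimal sketch. The abstract does not give the score formula or the warping construction, so everything here is an assumption made for illustration: the score is a simple ratio of between-speaker variance to across-session variance, and the warping places filter centers uniformly on a Mel axis whose band widths are stretched by the normalized scores. The function names `discrimination_scores` and `warped_center_frequencies` are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Common Mel-scale formula (O'Shaughnessy form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def discrimination_scores(energies):
    """Assumed score, not the paper's definition.

    energies: array of shape (speakers, sessions, bands) holding, for
    example, mean log band energies. A band scores high when it varies
    a lot between speakers and little across sessions of one speaker.
    """
    speaker_means = energies.mean(axis=1)               # (speakers, bands)
    between_speaker = speaker_means.var(axis=0)         # (bands,)
    across_session = energies.var(axis=1).mean(axis=0)  # (bands,)
    return between_speaker / (across_session + 1e-12)


def warped_center_frequencies(band_edges_hz, scores, n_filters):
    """Assumed warping: stretch each band's Mel width by its normalized
    score, then place filter centers uniformly on the stretched axis.
    Scores must be strictly positive.
    """
    edges_mel = hz_to_mel(band_edges_hz)
    weights = np.asarray(scores, dtype=float) / np.mean(scores)
    warped_widths = np.diff(edges_mel) * weights
    warped_edges = np.concatenate(([0.0], np.cumsum(warped_widths)))
    # Interior points only: the two end points are the axis limits.
    targets = np.linspace(0.0, warped_edges[-1], n_filters + 2)[1:-1]
    centers_mel = np.interp(targets, warped_edges, edges_mel)
    return mel_to_hz(centers_mel)


# Usage: four 1 kHz bands, with the third band scored as most discriminative.
band_edges = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
uniform = warped_center_frequencies(band_edges, np.ones(4), n_filters=20)
emphasized = warped_center_frequencies(
    band_edges, np.array([1.0, 1.0, 4.0, 1.0]), n_filters=20
)
```

With equal scores the centers reduce to an ordinary uniform Mel spacing; raising one band's score pulls more filter centers into that band, which is the emphasis effect the abstract describes.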
Linlin Wang Thomas Fang Zheng Chenhao Zhang Gang Wang
Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsingh
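International conference
2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)
Xi'an
English
1-4
2011-10-18 (date first posted online on the Wanfang platform; not the paper's publication date)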