Discrimination-Emphasized Mel-Frequency-Warping for Time-Varying Speaker Recognition

Abstract:

Performance degradation over time is a widely acknowledged phenomenon in speaker recognition, and it is commonly assumed that speaker models should be updated from time to time to remain representative. However, such updating is costly, user-unfriendly, and sometimes unrealistic, which hinders practical application of the technology. From a pattern recognition point of view, the time-varying issue calls for features that are speaker-specific and as stable as possible across sessions recorded at different times. Therefore, after searching for and analyzing the most stable parts of the feature space, a Discrimination-emphasized Mel-frequency-warping method is proposed. In implementation, each frequency band is assigned a discrimination score that takes into account both speaker and session information, and Mel-frequency-warping is applied during feature extraction to emphasize bands with higher scores. Experimental results on a time-varying voiceprint database show that the method not only improves speaker recognition performance, with an EER reduction of 19.1%, but also alleviates the performance degradation caused by time variation, with a reduction of 8.9%.
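The abstract gives neither the score formula nor the exact warping function, so the following is only a minimal sketch of the general idea: filter centres are placed more densely in bands with higher discrimination scores. Several things here are assumptions for illustration and not the authors' method: the function names, the piecewise-linear warping whose slope in each band is proportional to its score, and the common 2595·log10(1 + f/700) Mel formula. The scores are taken as given inputs.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Common Mel-scale formula (an assumption; the record does not state one).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def warped_filter_centers(band_edges_hz, scores, n_filters):
    """Return n_filters centre frequencies (Hz), denser in high-score bands.

    band_edges_hz: increasing band edges, length = len(scores) + 1.
    scores: one positive (hypothetical) discrimination score per band.
    """
    edges_mel = hz_to_mel(band_edges_hz)
    scores = np.asarray(scores, dtype=float)
    if edges_mel.size != scores.size + 1:
        raise ValueError("need exactly one score per band")
    if np.any(np.diff(edges_mel) <= 0):
        raise ValueError("band edges must be strictly increasing")
    if np.any(scores <= 0):
        raise ValueError("scores must be positive")

    # Warped axis: each band's share is its Mel width weighted by its score.
    mass = scores * np.diff(edges_mel)
    warped = np.concatenate(([0.0], np.cumsum(mass)))
    warped /= warped[-1]

    # Equally spaced points on the warped axis, mapped back to Mel, then Hz.
    targets = np.linspace(0.0, 1.0, n_filters + 2)[1:-1]
    centers_mel = np.interp(targets, warped, edges_mel)
    return mel_to_hz(centers_mel)
```

With equal scores in every band this reduces to ordinary equal spacing on the Mel scale; raising one band's score moves more filter centres into that band.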

Authors: Linlin Wang, Thomas Fang Zheng, Chenhao Zhang, Gang Wang

Affiliation: Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsingh

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-4

Online publication date: 2011-10-18 (date first posted on the Wanfang platform; not the paper's publication date)
