会议专题

Using Class Purity as Criterion for Speaker Clustering in Multi-Speaker Detection Tasks

Speaker clustering is an important step in multi-speaker detection tasks and its performance directly affects the speaker detection performance. It is observed that the shorter the average length of single-speaker speech segments after segmentation is, the worse performance of the following speaker recognition will be achieved, therefore a reasonable solution to better multi-speaker detection performance is to enlarge the average length of after-segmentation single-speaker speech segments, which is equivalently to cluster as many true same-speaker segments into one as possible. In other words, the average class purity of each speaker segment should be as bigger as possible. Accordingly, a speaker-clustering algorithm based on the class purity criterion is proposed, where a Reference Speaker Model (RSM) scheme is adopted to calculate the distance between speech segments, and the maximal class purity, or equivalently the minimal within-class dispersion, is taken as the criterion. Experiments on the NIST SRE 2006 database showed that, compared with the conventional Hierarchical Agglomerative Clustering (HAC) algorithm, for speech segments with average lengths of 2 seconds, 5 seconds and 8 seconds, the proposed algorithm increased the valid class speech length by 2.7%, 3.8% and 4.6%, respectively, and finally the target speaker detection recall rate was increased by 7.6%, 6.2% and 5.1%, respectively.

Gang Wang Xiaojun Wu Thomas Fang Zheng Linlin Wang Chenhao Zhang

Center for Speech and Language Technologies, Division of Technical Innovation and Development,Tsingh Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsing

国际会议

2011亚太信号与信息处理协会年度峰会(APSIPAASC 2011)

西安

英文

1-4

2011-10-18(万方平台首次上网日期,不代表论文的发表时间)