Hub Selection for Hub Based Clustering Algorithms
Hubs are the data instances that appear frequently in nearest-neighbour lists. Because the hubs of a high-dimensional dataset lie close to the centres of clusters or sub-clusters, hub based clustering algorithms select some of them as cluster centres. During hub selection, these algorithms rank data instances by their global hubness scores, which are computed from the nearest-neighbour lists alone and ignore cluster-related information such as the instances' labels and the clustering quality of the instances and of their related instances. As a result, some suitable hubs may be neglected. To solve this problem, we suggest evaluating instances by their relative hubness scores. Moreover, we propose a weighted relative hubness score computed from nearest-neighbour lists and silhouette information. We also suggest selecting the instance with the highest silhouette information when two or more instances tie for first place. Experimental results on real and synthetic datasets suggest that both the relative hubness score and the weighted relative hubness score can improve hub based clustering, and that the weighted relative hubness score often performs better.
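The difference between a global and a cluster-aware hubness score can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, the global score is taken to be the usual k-occurrence count (how many k-nearest-neighbour lists an instance appears in), and the relative score is assumed here to count only appearances in the lists of instances that share the same cluster label. The silhouette weighting is left out because its exact form is not stated in the abstract.

```python
import math

def knn_lists(points, k):
    """For each point, return the indices of its k nearest neighbours (itself excluded)."""
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by Euclidean distance; ties are broken by index for determinism.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        lists.append(others[:k])
    return lists

def global_hubness(neighbours):
    """k-occurrence count: number of k-NN lists in which each point appears."""
    counts = [0] * len(neighbours)
    for lst in neighbours:
        for j in lst:
            counts[j] += 1
    return counts

def relative_hubness(neighbours, labels):
    """Assumed definition: count only appearances in the k-NN lists of
    points that carry the same cluster label as the candidate hub."""
    counts = [0] * len(neighbours)
    for i, lst in enumerate(neighbours):
        for j in lst:
            if labels[i] == labels[j]:
                counts[j] += 1
    return counts

# Toy data: two clusters on a line, with one point of cluster 1 near cluster 0.
points = [(0, 0), (1, 0), (2, 0), (2.9, 0), (6, 0), (7, 0)]
labels = [0, 0, 0, 1, 1, 1]
neighbours = knn_lists(points, k=2)
print(global_hubness(neighbours))
print(relative_hubness(neighbours, labels))
```

In this toy setting, an instance near a cluster boundary can collect neighbour-list appearances from the other cluster, so its global score overstates how central it is to its own cluster; the label-restricted count discards those appearances.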
Clustering; High-dimensional data; Hubness; Silhouette information
Zhenfeng He
College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
International conference
Xiamen
English
488-493
2014-08-19 (date first posted online on the Wanfang platform; not the publication date of the paper)