Conference Topic

Hub Selection for Hub Based Clustering Algorithms

  Hubs are the data instances that appear frequently in nearest neighbours lists. Because the hubs of a high-dimensional dataset lie close to the centres of clusters or sub-clusters, hub based clustering algorithms select some of them as cluster centres. During hub selection, these algorithms rank data instances by their global hubness scores, which are computed from the nearest neighbours lists alone. This ignores cluster-related information such as the instances' labels and the clustering quality of the instances and of their related instances. As a result, some suitable hubs may be neglected. To solve this problem, we suggest evaluating instances by their relative hubness scores. Moreover, we propose a weighted relative hubness score computed from nearest neighbours lists and silhouette information. We also suggest selecting the instance with the highest silhouette information when two or more instances tie for first place. Experimental results on real and synthetic datasets suggest that both the relative hubness score and the weighted relative hubness score can improve hub based clustering, and that the weighted relative hubness score often performs better.
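
The hubness counting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the global score counts how often an instance appears in the k-nearest-neighbour lists of all other instances. The record does not define the relative hubness score, so `relative_hubness` below rests on an assumption made here for illustration, namely that only appearances in the lists of instances carrying the same cluster label are counted. All function names are hypothetical, and the silhouette-based weighting is not shown.

```python
from math import dist


def knn_lists(points, k):
    """Return, for each point, the indices of its k nearest neighbours (itself excluded)."""
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by Euclidean distance; ties are broken by the lower index.
        others.sort(key=lambda j: (dist(p, points[j]), j))
        lists.append(others[:k])
    return lists


def global_hubness(points, k):
    """Count how many k-NN lists each point appears in."""
    scores = [0] * len(points)
    for neighbours in knn_lists(points, k):
        for j in neighbours:
            scores[j] += 1
    return scores


def relative_hubness(points, labels, k):
    """ASSUMPTION (not from the source): count only appearances in the
    k-NN lists of points that share the same cluster label."""
    scores = [0] * len(points)
    for i, neighbours in enumerate(knn_lists(points, k)):
        for j in neighbours:
            if labels[i] == labels[j]:
                scores[j] += 1
    return scores


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
    print(global_hubness(pts, k=1))
    print(relative_hubness(pts, [0, 0, 1, 1, 1], k=1))
```

Under this assumed definition, an instance that is a strong hub globally can score lower once neighbours from other clusters stop counting, which is the kind of cluster-related information the abstract says global ranking ignores.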

Clustering; High-dimensional data; Hubness; Silhouette information

Zhenfeng He

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China

International conference

The 2014 10th International Conference on Natural Computation (ICNC 2014) and the 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2014)

Xiamen

English

488-493

2014-08-19 (date first made available online on the Wanfang platform; not the paper's publication date)