Hub Selection for Hub Based Clustering Algorithms

Abstract:

Hubs are the data instances that appear frequently in nearest-neighbour lists. As the hubs of a high-dimensional dataset are close to the centres of clusters or sub-clusters, hub based clustering algorithms select some of them as cluster centres. During hub selection, these algorithms rank data instances by their global hubness scores, computed from their nearest-neighbour lists, and ignore cluster-related information such as the instances' labels and the clustering quality of the instances and of their related instances. As a result, some suitable hubs may be neglected. To solve this problem, we suggest evaluating instances by their relative hubness scores. Moreover, we propose a weighted relative hubness score computed from nearest-neighbour lists and silhouette information. In addition, we suggest selecting the instance with the highest silhouette information when two or more instances tie for first place. Experimental results on real and synthetic datasets suggest that both the relative hubness score and the weighted relative hubness score can improve hub based clustering, and that the weighted relative hubness score often performs better.
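As a rough illustration of the global hubness score mentioned in the abstract, the sketch below counts how often each instance appears in the k-nearest-neighbour lists of the other instances (the usual k-occurrence count). It is only a minimal sketch under assumptions: the function name `k_occurrences`, the Euclidean distance, and the toy points are illustrative choices, and the relative and weighted relative scores proposed in the paper are not reproduced here because the abstract does not define them.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_occurrences(points, k):
    """Count how often each point appears in the k-NN lists of the other points.

    Hypothetical helper for illustration; this is the global hubness score,
    not the relative or weighted relative score proposed in the paper.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by its distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        # Each of the k nearest neighbours of point i gains one occurrence.
        for j in others[:k]:
            counts[j] += 1
    return counts


# Toy data (assumed): four corners of a unit square, its centre, and an outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
scores = k_occurrences(points, k=2)

# The instance with the highest global hubness score is the hub candidate.
hub = max(range(len(points)), key=scores.__getitem__)
print(hub)  # index of the centre point, 4
```

In this toy set the centre of the square is in the neighbour list of every other point, so it has the highest count, while the outlier appears in no list at all.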

Keywords: Clustering; High-dimensional data; Hubness; Silhouette information

Author: Zhenfeng He

Affiliation: College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China

Conference type: International conference

Conference name: The 2014 10th International Conference on Natural Computation (ICNC 2014) and the 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2014)

Conference location: Xiamen

Conference language: English

Pages: 488-493

Online publication date: 2014-08-19 (date first posted on the Wanfang platform; this is not the paper's publication date)
