An improved K-Means clustering algorithm

Abstract:

The K-Means clustering algorithm, proposed by MacQueen in 1967, is a partition-based cluster analysis method. It is widely used in cluster analysis because it is efficient and scalable and converges quickly on large data sets. However, it also has several deficiencies: the number of clusters K must be specified in advance, the initial cluster centers are selected arbitrarily, and the results are sensitive to noise points. To address these shortcomings of the traditional K-Means algorithm, this paper presents an improved K-Means algorithm that uses a noise-data filter. Based on the characteristics of noise data, the algorithm applies a density-based detection method and adds noise discovery and processing steps to the original algorithm. Preprocessing the data set to exclude noise points before clustering significantly improves the cohesion of the resulting clusters, effectively reduces the impact of noise on the K-Means algorithm, and makes the clustering results more accurate.
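The abstract describes the method only at a high level: density-based noise detection as a preprocessing step, followed by ordinary K-Means on the remaining points. The Python sketch here illustrates that two-stage pipeline under stated assumptions. The density test (a point is noise if fewer than `min_pts` other points lie within radius `eps`), the parameters `eps` and `min_pts`, and the helper names `filter_noise`, `kmeans` and `sse` are all hypothetical; the paper's own noise criterion is not specified in the abstract. The K-Means stage is standard Lloyd iteration with randomly chosen initial centers, and `sse` measures cluster cohesion as the sum of squared distances from each point to its assigned center.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def filter_noise(
    points: Sequence[Point], eps: float, min_pts: int
) -> Tuple[List[Point], List[Point]]:
    """Split points into (kept, noise) using a simple density test.

    Assumption: a point is treated as noise when fewer than `min_pts`
    other points lie within Euclidean distance `eps` of it. This is a
    DBSCAN-style rule chosen for illustration, not the paper's exact test.
    """
    kept: List[Point] = []
    noise: List[Point] = []
    for i, p in enumerate(points):
        # Count neighbours inside the eps-radius (O(n^2) over all points).
        neighbours = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps
        )
        (kept if neighbours >= min_pts else noise).append(p)
    return kept, noise


def kmeans(
    points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Standard Lloyd-style K-Means with k randomly chosen initial centers."""
    rng = random.Random(seed)
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center as is
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels


def sse(points: Sequence[Point], centers: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum of squared errors: lower values mean more cohesive clusters."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
            (20.0, -20.0)]  # the last point is an isolated outlier
    kept, noise = filter_noise(data, eps=0.5, min_pts=2)
    centers, labels = kmeans(kept, k=2)
    print("noise:", noise)
    print("centers:", centers, "SSE:", round(sse(kept, centers, labels), 4))
```

With this toy data the isolated point `(20.0, -20.0)` is reported as noise and excluded before clustering, so it cannot pull a center away from the two dense groups. Note that the sketch only targets the noise deficiency: K must still be supplied and the initial centers are still random. The pairwise neighbour count is O(n²), so a spatial index would likely be needed for the large data sets the abstract mentions.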

Keywords: cluster; K-Means; outlier

Authors: Juntao Wang, Xiaolong Su

Affiliation: School of Computer Science and Technology, China University of Mining & Technology, Xuzhou, China

Conference type: International conference

Conference name: 2011 2nd International Conference on Data Storage and Data Engineering (DSDE 2011)

Conference location: Xi'an

Conference language: English

Pages: 44-46

Online publication date: 2011-05-13 (the date the record was first posted on the Wanfang platform; not the paper's publication date)

