K-means Clustering Algorithm for Large-scale Chinese Commodity Information Web Based on Hadoop

Abstract:

With the growing popularity of the Internet, product information now fills a great many web pages, and extracting the information one needs from these pages often calls for clustering. The current explosive growth of data, however, means that information is stored at massive scale, so clustering faces problems such as high computational complexity and long running times, and the traditional K-Means clustering algorithm no longer meets the needs of today's big-data environments. This paper therefore combines the advantages of the Hadoop platform and the MapReduce programming model to propose a K-Means clustering algorithm for large-scale Chinese commodity information web pages based on Hadoop. The Map function calculates the distance from each sample to the cluster centers and labels the sample with its category; the Reduce function summarizes the intermediate results and calculates the new cluster centers for the next round of iteration. Experimental results show that this method improves clustering processing speed.
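The Map and Reduce roles described in the abstract can be sketched in plain Python as a single in-memory iteration. This is only an illustration under stated assumptions, not the authors' Hadoop implementation: samples are assumed to be numeric feature vectors already (the abstract does not say how the web pages are vectorized), distance is assumed to be Euclidean, each sample is assumed to go to its nearest center, and the function names `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(sample, centers):
    """Map step: emit (index of the nearest center, sample)."""
    nearest = min(range(len(centers)), key=lambda i: dist(sample, centers[i]))
    return nearest, sample


def kmeans_reduce(samples):
    """Reduce step: average all samples assigned to one cluster."""
    n = len(samples)
    return tuple(sum(coords) / n for coords in zip(*samples))


def kmeans_iteration(samples, centers):
    """One round: map every sample, group by key, then reduce each group."""
    groups = defaultdict(list)  # stands in for the MapReduce shuffle phase
    for sample in samples:
        key, value = kmeans_map(sample, centers)
        groups[key].append(value)
    # Assumption: a center that attracted no samples is kept unchanged.
    return [kmeans_reduce(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))]


samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_iteration(samples, centers)
```

In a real Hadoop job the grouping by cluster index is done by the framework between the Map and Reduce phases, and a driver would repeat the iteration with the new centers until they stop changing or a round limit is reached.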

Keywords: K-Means clustering algorithm; Hadoop platform; MapReduce; Cloud computing; Big Data

Authors: Geng Yushui, Zhang Lishuo

Affiliation: School of Information, Qilu University of Technology, Jinan 250353, China

Conference type: International conference

Conference name: The 14th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES 2015)

Conference location: Guiyang

Conference language: English

Pages: 256-259

Online publication date: 2015-08-18 (the date the record first went online on the Wanfang platform, not the paper's publication date)
