Parallel Implementation of Chi2 Algorithm in MapReduce Framework

Abstract:

The discretization of continuous attributes is an important preprocessing step in machine learning and data mining. Discretizing the continuous attributes of massive data efficiently has become an urgent problem. Hadoop, a technology that has emerged in recent years, can efficiently handle many applications involving massive data. This paper designs and implements a parallel Chi2-based discretization algorithm using the MapReduce model. To evaluate discretization efficiency, experiments were conducted on data sets of different sizes using different numbers of nodes. The experimental results show that the proposed algorithm is efficient and scales well when discretizing the continuous attributes of massive data.
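The abstract does not describe the algorithm's internals, so the following is only a minimal single-process sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical mapper that emits `((attribute_index, value, label), 1)` per record and a reducer that sums those counts (the usual way to gather per-value class frequencies in MapReduce), followed by the ChiMerge-style merge step at the core of Chi2: adjacent intervals with the lowest chi-square statistic are merged until every adjacent pair reaches a threshold. The full Chi2 algorithm additionally lowers the significance level step by step and checks an inconsistency rate; that outer loop is omitted here. The names `map_record`, `reduce_counts`, `chi_square`, and `merge_intervals` are hypothetical.

```python
from collections import defaultdict


def map_record(record, label):
    """Hypothetical mapper: emit ((attribute_index, value, label), 1) per attribute."""
    for attr_index, value in enumerate(record):
        yield (attr_index, value, label), 1


def reduce_counts(pairs):
    """Hypothetical reducer: sum the counts for each (attribute, value, label) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals


def chi_square(row_a, row_b, classes):
    """Chi-square statistic for two adjacent intervals, given per-class counts."""
    rows = [row_a, row_b]
    n = sum(sum(r.get(c, 0) for c in classes) for r in rows)
    chi2 = 0.0
    for r in rows:
        r_total = sum(r.get(c, 0) for c in classes)
        for c in classes:
            c_total = row_a.get(c, 0) + row_b.get(c, 0)
            expected = r_total * c_total / n if n else 0.0
            if expected == 0:
                expected = 0.1  # ChiMerge convention: avoid division by zero
            chi2 += (r.get(c, 0) - expected) ** 2 / expected
    return chi2


def merge_intervals(value_counts, classes, threshold):
    """Repeatedly merge the adjacent pair with the lowest chi-square until
    every adjacent pair reaches the threshold; return interval lower bounds."""
    intervals = [(v, dict(cnt)) for v, cnt in sorted(value_counts.items())]
    while len(intervals) > 1:
        scores = [chi_square(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        best = min(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            break
        low, left = intervals[best]
        _, right = intervals[best + 1]
        merged = {c: left.get(c, 0) + right.get(c, 0) for c in classes}
        intervals[best:best + 2] = [(low, merged)]
    return [low for low, _ in intervals]


# Toy data: one continuous attribute, two classes.
data = [([1.0], "a"), ([2.0], "a"), ([3.0], "a"),
        ([7.0], "b"), ([8.0], "b"), ([9.0], "b")]
pairs = [kv for record, label in data for kv in map_record(record, label)]
totals = reduce_counts(pairs)
classes = sorted({label for _, label in data})

# Regroup reducer output as attribute -> value -> {label: count}.
per_attr = defaultdict(lambda: defaultdict(dict))
for (attr, value, label), count in totals.items():
    per_attr[attr][value][label] = count

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
cut_points = {attr: merge_intervals(vals, classes, threshold=3.841)
              for attr, vals in per_attr.items()}
print(cut_points)  # prints {0: [1.0, 7.0]}
```

With two classes, each adjacent-pair test has (2 - 1)(2 - 1) = 1 degree of freedom, which is why 3.841 is used as the illustrative threshold; on a real cluster the mapper and reducer would run as Hadoop tasks rather than in-process functions.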

Keywords: Hadoop; MapReduce; Chi2 algorithm; Large-scale data; Discretization

Authors: Yong Zhang, Jingwen Yu, Jianying Wang

Affiliations: School of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China; School of Psychology, Liaoning Normal University, Dalian 116021, China

Conference type: International conference

Conference: The 9th International Conference on Pervasive Computing and Application (ICPCA 2014) (9th National Conference on Pervasive Computing and 9th National Joint Conference on Human-Computer Interaction)

Venue: Nanchang

Conference language: English

Pages: 1-10

Online publication date: 2013-09-26 (date first posted on the Wanfang platform; not the paper's publication date)
