Parallel Implementation of Chi2 Algorithm in MapReduce Framework
The discretization of continuous attributes is an important preprocessing step in machine learning and data mining. Efficiently discretizing the continuous attributes of massive data sets has become an urgent problem. Hadoop, a technique that has risen in recent years, can efficiently process many applications involving massive data. This paper designs and implements a parallel Chi2-based discretization algorithm on the MapReduce model. To evaluate discretization efficiency, experiments were conducted on data sets of different sizes with different numbers of nodes. The experimental results show that the proposed algorithm discretizes the continuous attributes of massive data with high efficiency and good scalability.
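The abstract does not describe how the algorithm is split across MapReduce phases. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a map phase that emits ((value, class), 1) pairs for one continuous attribute and a reduce phase that sums those counts. It then applies a ChiMerge-style merge step, which is the core of Chi2, to the resulting per-value class counts. The map and reduce phases are simulated in-process rather than run on Hadoop. All function names are hypothetical. The full Chi2 algorithm also lowers the significance level iteratively and checks an inconsistency rate, and this sketch omits both.

```python
from collections import defaultdict


def map_phase(records, attr_index):
    """Hypothetical mapper: emit ((value, label), 1) for one continuous attribute.
    Assumes the class label is the last field of each record."""
    for row in records:
        yield (row[attr_index], row[-1]), 1


def reduce_phase(pairs):
    """Hypothetical reducer: sum the counts for each (value, label) key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts


def build_intervals(counts, classes):
    """Turn (value, label) counts into sorted initial intervals, one per distinct value."""
    index = {c: i for i, c in enumerate(classes)}
    table = defaultdict(lambda: [0] * len(classes))
    for (value, label), n in counts.items():
        table[value][index[label]] += n
    return [(v, table[v]) for v in sorted(table)]


def chi2(a, b):
    """Chi-square statistic for the class-count rows of two adjacent intervals."""
    col_sums = [x + y for x, y in zip(a, b)]
    total = sum(col_sums)
    stat = 0.0
    for row in (a, b):
        r = sum(row)
        for j, observed in enumerate(row):
            expected = r * col_sums[j] / total
            if expected == 0:
                expected = 0.1  # ChiMerge convention to avoid division by zero
            stat += (observed - expected) ** 2 / expected
    return stat


def merge_intervals(intervals, threshold):
    """Repeatedly merge the adjacent pair with the lowest chi-square value
    while that value is below the threshold; return the interval lower bounds."""
    intervals = [(lo, list(c)) for lo, c in intervals]
    while len(intervals) > 1:
        scores = [chi2(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break
        merged = [x + y for x, y in zip(intervals[i][1], intervals[i + 1][1])]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return [lo for lo, _ in intervals]


records = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (10.0, "b"), (11.0, "b"), (12.0, "b")]
counts = reduce_phase(map_phase(records, attr_index=0))
intervals = build_intervals(counts, classes=["a", "b"])
cuts = merge_intervals(intervals, threshold=3.841)  # chi-square critical value, df=1, alpha=0.05
print(cuts)  # [1.0, 10.0]
```

In a real Hadoop job, the counting would run in parallel across nodes. The merge step operates on the much smaller per-value count table, so the sketch keeps it sequential. This is a design assumption for illustration only.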
Hadoop; MapReduce; Chi2 algorithm; Large-scale data; Discretization
Yong Zhang, Jingwen Yu, Jianying Wang
School of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
School of Psychology, Liaoning Normal University, Dalian 116021, China
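Conference type: International conference
Location: Nanchang
Language: English
Pages: 1-10
Online date: 2013-09-26 (date first posted on the Wanfang platform; not the paper's publication date)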