Balanced Parallel FP-Growth with MapReduce
Frequent itemset mining (FIM) plays an essential role in mining associations, correlations, and many other important data mining tasks. Unfortunately, as datasets grow larger day by day, most FIM algorithms in the literature become ineffective because of either excessive resource requirements or high communication cost. In this paper, we propose BPFP, a balanced parallel FP-Growth algorithm based on the PFP algorithm [1], which parallelizes FP-Growth using the MapReduce approach. BPFP adds a load-balancing feature to PFP, which improves parallelization and thereby performance. In our empirical study, BPFP outperformed PFP, which uses a simple grouping strategy.
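The abstract does not say how BPFP balances load, so the following is only an illustrative sketch of one common way to balance item groups: a greedy heuristic that places the heaviest item into the currently lightest group. The function name `balanced_groups` and the per-item load estimates are assumptions made for this example and are not taken from the paper.

```python
import heapq

def balanced_groups(item_loads, num_groups):
    """Greedily assign items to groups so that estimated loads stay even.

    item_loads: dict mapping item -> estimated mining cost (hypothetical values).
    num_groups: number of groups, e.g. one per reducer.
    """
    # Min-heap of (current_load, group_index); the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    # Place the heaviest items first, each into the currently lightest group.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups

# Hypothetical per-item load estimates (not from the paper).
loads = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 4, "f": 1}
groups = balanced_groups(loads, 3)
group_loads = [sum(loads[i] for i in g) for g in groups]
print(groups)       # [['a', 'f'], ['b', 'e'], ['c', 'd']]
print(group_loads)  # [10, 11, 11]
```

By comparison, a simple grouping that splits the items in order into equal-sized chunks would give loads of 16, 11, and 5 for the same input, so the slowest group would finish well after the others.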
Algorithms; Distributed computing
Le Zhou, Zhiyong Zhong, Jin Chang, Junjie Li, Joshua Zhexue Huang, Shengzhong Feng
Center for High Performance Computing, Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences
International conference
2010 IEEE Youth Conference on Information, Computing and Telecommunications
Beijing
English
243-246
2010-11-28 (date first posted online on the Wanfang platform; not the paper's publication date)