Conference Topic

Balanced Parallel FP-Growth with MapReduce

Frequent itemset mining (FIM) plays an essential role in mining associations, correlations, and many other important data mining tasks. Unfortunately, as datasets grow larger day by day, most FIM algorithms in the literature become ineffective because of either excessive resource requirements or excessive communication cost. In this paper, we propose a balanced parallel FP-Growth algorithm, BPFP, based on the PFP algorithm [1], which parallelizes FP-Growth using the MapReduce approach. BPFP adds a load-balancing feature to PFP, which improves parallelization and thereby performance. In our empirical study, BPFP outperformed PFP, which uses a simple grouping strategy.
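
The abstract does not spell out how BPFP balances load, so the following is only an illustrative sketch of one common way to balance groups: a greedy "heaviest item to the lightest group" heuristic. The function name balanced_groups, the per-item load estimates, and the sample values are all hypothetical and are not taken from the paper.

```python
import heapq


def balanced_groups(item_loads, num_groups):
    """Greedily assign items to groups so that group loads stay close.

    item_loads: dict mapping item -> estimated mining load (hypothetical
                estimates; the paper's own load measure is not given here).
    num_groups: number of groups (e.g. one per parallel worker).
    Returns (groups, totals): the item lists and the load of each group.
    """
    # Min-heap of (current load, group index): the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Visit items from heaviest to lightest; ties are broken by item name.
    ordered = sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, load in ordered:
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    totals = [sum(item_loads[i] for i in grp) for grp in groups]
    return groups, totals


# Assumed sample loads, for illustration only.
sample_loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
groups, totals = balanced_groups(sample_loads, 2)
print(groups, totals)
```

Under these assumptions, each group would correspond to one share of the mining work, and keeping the totals close is what is meant by load balance in this sketch; a simple grouping that ignores load could leave one group with far more work than the others.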

Algorithms; Distributed computing

Le Zhou, Zhiyong Zhong, Jin Chang, Junjie Li, Joshua Zhexue Huang, Shengzhong Feng

Center for High Performance Computing, Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technologies, Chinese Academy of Sciences

International conference

2010 IEEE Youth Conference on Information, Computing and Telecommunications

Beijing

English

243-246

2010-11-28 (date first posted online on the Wanfang platform; this is not the paper's publication date)