Conference Paper

PERFORMANCE EVALUATION ON DATA RECONCILIATION ALGORITHM IN DISTRIBUTED SYSTEM

  This research came from a school-enterprise cooperation program that aims to improve the efficiency of data reconciliation between two large-scale data sources. This paper presents three typical algorithms: the standard Bloom filter (BF), the counting Bloom filter (CBF) and the Invertible Bloom filter (IBF). To evaluate their performance, mainly in terms of runtime and accuracy rate, a series of experiments was designed and run on both a small-scale and a large-scale distributed system. The algorithms are compared against a traditional query method, Inner Join (IJ). The results show that, under the MapReduce computing framework, Inner Join has the best performance, followed closely by BF, and that a large-scale distributed system evidently improves performance when dealing with large-scale data.
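The abstract contrasts Bloom-filter-based reconciliation with an exact Inner Join. The following is a minimal single-machine Python sketch, not the authors' MapReduce/Hadoop implementation, which this record does not describe. It assumes records are identified by integer keys and uses SHA-256 as the hash family; all names (`BloomFilter`, `InvertibleBloomFilter`, `exact_diff`, `bf_missing_from_b`) and the sizes `m` and `k` are hypothetical choices for illustration. A standard BF built over source B can only prove that a key of A is absent, so false positives make it under-report unmatched records; an IBF built over each source can be subtracted cell by cell and then "peeled" to recover the keys unique to each side. A counting Bloom filter (CBF) would replace each bit with a small counter so keys can also be deleted; it is omitted here for brevity.

```python
# Hypothetical sketch: names, sizes and SHA-256 hashing are illustrative
# assumptions, not the paper's code.
import hashlib


def _hash(key: int, salt: int) -> int:
    """Deterministic 64-bit hash of an integer key, salted per hash function."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Standard BF: m bits, k hash functions, no deletions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: int) -> list[int]:
        return [_hash(key, i) % self.m for i in range(self.k)]

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: int) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(key))


class InvertibleBloomFilter:
    """IBF: each cell holds a count, an XOR of keys and an XOR of key checksums."""

    def __init__(self, m: int, k: int):
        # k disjoint sub-tables, so every key lands in k distinct cells.
        self.k, self.sub = k, m // k
        size = k * self.sub
        self.count, self.key_x, self.chk_x = [0] * size, [0] * size, [0] * size

    def _cells(self, key: int) -> list[int]:
        return [i * self.sub + _hash(key, i) % self.sub for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        for c in self._cells(key):
            self.count[c] += delta
            self.key_x[c] ^= key
            self.chk_x[c] ^= _hash(key, -1)

    def add(self, key: int) -> None:
        self._update(key, 1)

    def subtract(self, other: "InvertibleBloomFilter") -> "InvertibleBloomFilter":
        """Cell-wise difference; keys present in both filters cancel out."""
        diff = InvertibleBloomFilter(self.k * self.sub, self.k)
        for c in range(len(self.count)):
            diff.count[c] = self.count[c] - other.count[c]
            diff.key_x[c] = self.key_x[c] ^ other.key_x[c]
            diff.chk_x[c] = self.chk_x[c] ^ other.chk_x[c]
        return diff

    def decode(self) -> tuple[set[int], set[int], bool]:
        """Peel pure cells (destructive). Returns (only_in_self, only_in_other, ok)."""
        only_self, only_other = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                key, n = self.key_x[c], self.count[c]
                if n in (1, -1) and self.chk_x[c] == _hash(key, -1):
                    (only_self if n == 1 else only_other).add(key)
                    self._update(key, -n)  # remove the recovered key from all its cells
                    progress = True
        ok = not any(self.count) and not any(self.key_x)
        return only_self, only_other, ok


def exact_diff(a: set[int], b: set[int]) -> tuple[set[int], set[int]]:
    """Reference answer: keys that find no match when A and B are joined on the key."""
    return a - b, b - a


def bf_missing_from_b(a: set[int], b: set[int], m: int = 16384, k: int = 3) -> set[int]:
    """Keys of A that B's Bloom filter says are definitely absent (may miss some)."""
    bf = BloomFilter(m, k)
    for key in b:
        bf.add(key)
    return {key for key in a if not bf.might_contain(key)}


if __name__ == "__main__":
    a, b = set(range(0, 1000)), set(range(5, 1005))
    only_a, only_b = exact_diff(a, b)
    print(sorted(bf_missing_from_b(a, b)), sorted(only_a))
    ibf_a, ibf_b = InvertibleBloomFilter(300, 3), InvertibleBloomFilter(300, 3)
    for key in a:
        ibf_a.add(key)
    for key in b:
        ibf_b.add(key)
    print(ibf_a.subtract(ibf_b).decode())
```

With these example sets the exact answer is keys 0 to 4 only in A and 1000 to 1004 only in B. The BF result is always a subset of the first set, because a Bloom filter has no false negatives. IBF decoding is probabilistic: with too few cells relative to the number of differences, `decode` may stop early and report `ok = False`.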

Data reconciliation; Large-scale; Bloom filter; Hadoop

Xin Wang, Hongming Zhu, Qin Liu, Xiaowen Yang, Jiakai Xiao

School of Software and Engineering, Tongji University, Shanghai 200092, China

International Conference

2012 2nd IEEE International Conference on Cloud Computing and Intelligence Systems (IEEE CCIS2012)

Hangzhou

English

499-503

2012-10-30 (date the record was first posted online on the Wanfang platform; not the paper's publication date)