PERFORMANCE EVALUATION ON DATA RECONCILIATION ALGORITHM IN DISTRIBUTED SYSTEM
This research originated from a school-enterprise cooperation program that aims to improve the efficiency of data reconciliation between two large-scale data sources. The paper presents three typical algorithms: the standard Bloom filter (BF), the counting Bloom filter (CBF), and the invertible Bloom filter (IBF). To evaluate their performance, mainly in terms of runtime and accuracy rate, a series of experiments was designed and run on both a small-scale and a large-scale distributed system. The algorithms are compared against one traditional query method, Inner Join (IJ). The results show that, under the MapReduce computing framework, Inner Join performs best, followed closely by BF, and that a large-scale distributed system markedly improves performance when processing large-scale data.
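As an illustration of the comparison described in the abstract, the following is a minimal single-machine sketch of reconciling two key lists with a standard Bloom filter and with an exact join. It is not the authors' MapReduce implementation. The names `BloomFilter`, `reconcile_with_bf`, and `reconcile_with_join`, the filter size, and the SHA-256 double-hashing scheme are all assumptions made for this example.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Standard Bloom filter over string keys (illustrative, not tuned)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive k bit positions from one digest via double hashing
        # (an assumption of this sketch, not taken from the paper).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def reconcile_with_bf(source_a: Iterable[str], source_b: Iterable[str],
                      num_bits: int = 1024, num_hashes: int = 3) -> List[str]:
    """Return keys of A that are definitely missing from B."""
    bf = BloomFilter(num_bits, num_hashes)
    for key in source_b:
        bf.add(key)
    # False positives can hide a truly missing key, so this list may be
    # incomplete, but it never contains a key that is present in B.
    return [key for key in source_a if key not in bf]


def reconcile_with_join(source_a: Iterable[str],
                        source_b: Iterable[str]) -> List[str]:
    """Exact baseline: keys of A with no match in B (inner-join style)."""
    a_keys = list(source_a)
    matched = set(a_keys) & set(source_b)
    return [key for key in a_keys if key not in matched]


if __name__ == "__main__":
    a = ["r1", "r2", "r3", "r4"]
    b = ["r2", "r4", "r5"]
    print("exact:", reconcile_with_join(a, b))
    print("bloom:", reconcile_with_bf(a, b))
```

The Bloom filter result is always a subset of the exact result, which is one way to read the accuracy rate that the abstract measures alongside runtime: a smaller filter saves memory but lets more mismatches go undetected.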
Data reconciliation; Large-scale; Bloom filter; Hadoop
Xin Wang, Hongming Zhu, Qin Liu, Xiaowen Yang, Jiakai Xiao
School of Software and Engineering, Tongji University, Shanghai 200092, China
International conference
Hangzhou
English
499-503
2012-10-30 (date first made available online on the Wanfang platform; not the paper's publication date)