Research on Distributed Data Skew Join Algorithm Based on VGFR Model
The join operation is a core component of complex analysis tasks over massive data, and handling the data skew that arises during joins effectively is a major challenge. Existing join algorithms based on fragment and replicate in Shared-Nothing distributed architectures have two drawbacks: they incur a large data transmission overhead when a large amount of data must be replicated, and they cannot be adjusted dynamically according to the real-time load. In this paper, we propose and implement a grouping fragment and replicate join strategy based on virtual nodes (VGFR) that aims to be robust with respect to the sizes of both join sides. The mapping between virtual nodes and actual nodes is determined dynamically according to the system's real-time load, which not only handles data skew effectively but also provides fine-grained load balancing. Experimental results demonstrate that this optimization technique effectively improves join performance and is adaptive.
query optimization; distributed system; join query; data skew; load balancing
Ding Xiang-wu, Hu Rui
College of Computer Science and Technology, Donghua University, Shanghai, China
International conference
Chongqing
English
883-887
2016-03-20 (date first posted online on the Wanfang platform; this is not the paper's publication date)