Conference Special Topic

Efficient Star Join for Column-oriented Data Store in the MapReduce Environment

MapReduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia in recent years. Unlike parallel DBMSs, MapReduce makes it easier for non-experts to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. Because MapReduce processing is scan-oriented, it inevitably accesses many unnecessary data tuples, so its performance for relational operators, especially table joins, can be improved dramatically. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data stores, which uses a three-level content-aware index (i.e., the HdBmp index). Armed with this index, most of the unnecessary tuples in join processing can be filtered out, resulting in an immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability, and effectiveness of the proposed join method.

star join; column store; HdBmp index; HdBmp join

Haitong Zhu, Minqi Zhou, Fan Xia, Aoying Zhou

Institute of Massive Computing, East China Normal University, Shanghai 200062, China

International conference

The 8th National Conference on Web Information Systems and Applications

Chongqing

English

13-18

2011-10-21 (date first posted online on the Wanfang platform; not the paper's publication date)