Conference Topic

SLDP: A Novel Data Placement Strategy for Large-Scale Heterogeneous Hadoop Cluster

  Hadoop, a popular open-source implementation of MapReduce, is widely used for large-scale data-intensive applications such as data mining, web indexing, and scientific computing. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous, and the Hadoop Distributed File System (HDFS) distributes data across nodes based on disk space availability. This data placement strategy is very efficient in homogeneous environments, where nodes are identical in both computing power and disk capacity. In practice, however, the homogeneity assumption does not always hold. In heterogeneous environments, Hadoop's scheduler suffers severe performance degradation and energy dissipation when it relies on the default HDFS data placement strategy. In this paper, we propose a novel snakelike data placement mechanism (SLDP) for large-scale heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes in each VST circuitously according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to reduce disk space consumption and provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving, and that it significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
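  The two placement steps named in the abstract (tiering the nodes, then placing blocks in a snakelike order by hotness) can be illustrated with a rough sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the per-node capability score, the equal-size tier split, the back-and-forth traversal, and the names `split_into_tiers` and `snake_place` are hypothetical and are not taken from the paper or from any Hadoop API.

```python
from typing import Dict, List


def split_into_tiers(node_scores: Dict[str, float], num_tiers: int) -> List[List[str]]:
    """Group nodes into tiers of similar capability.

    Assumption: each node has a single numeric capability score and
    tiers are equal-size slices of the ranked node list.
    """
    if num_tiers <= 0:
        raise ValueError("num_tiers must be positive")
    ranked = sorted(node_scores, key=lambda n: node_scores[n], reverse=True)
    if not ranked:
        return []
    size = -(-len(ranked) // num_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def snake_place(blocks_by_hotness: List[str], tier_nodes: List[str]) -> Dict[str, List[str]]:
    """Assign blocks (hottest first) to the nodes of one tier.

    Assumption: "snakelike" means sweeping the node list forward, then
    backward, and so on, so hot blocks are spread across all nodes.
    """
    placement: Dict[str, List[str]] = {node: [] for node in tier_nodes}
    n = len(tier_nodes)
    for i, block in enumerate(blocks_by_hotness):
        sweep, pos = divmod(i, n)
        # Even sweeps run left to right, odd sweeps right to left.
        idx = pos if sweep % 2 == 0 else n - 1 - pos
        placement[tier_nodes[idx]].append(block)
    return placement


tiers = split_into_tiers({"n1": 4.0, "n2": 3.0, "n3": 2.0, "n4": 1.0}, num_tiers=2)
layout = snake_place(["b0", "b1", "b2", "b3", "b4", "b5"], ["n1", "n2", "n3"])
```

  In this sketch, each node in a tier ends up holding a mix of hotter and colder blocks instead of one node collecting all of the hottest ones; whether SLDP balances load in this particular way is not stated in the abstract.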

Runqun Xiong, Junzhou Luo, Fang Dong

School of Computer Science and Engineering, Southeast University, Nanjing, P.R. China

International conference

2014 2nd International Conference on Advanced Cloud and Big Data (CBD 2014)

Huangshan, Anhui

English

9-17

2014-11-20 (date first posted online on the Wanfang platform; not the paper's publication date)