Conference Topic

Multiple-Job Optimization in MapReduce for Heterogeneous Workloads

The MapReduce cluster is emerging as a solution for data-intensive scalable computing. The open-source implementation, Hadoop, has already been adopted to build clusters containing thousands of nodes. Such cloud infrastructure is used to simultaneously process many different jobs that depend on different hardware resources, such as memory, CPU, disk I/O, and network I/O. If the scheduling policy does not consider the heterogeneity of the running jobs’ resource utilization types, resource contention may occur. In this paper, we analyze this multiple-job parallelization problem in MapReduce and propose the multiple-job optimization (MJO) scheduler. Our scheduler detects each job’s resource utilization type on the fly and improves hardware utilization by running different kinds of jobs in parallel. We present two scenarios, “same plan” and “same job”, to illustrate the submission traces of multiple jobs in MapReduce clusters. Our experiments show that in these scenarios the MJO scheduler can reduce the makespan by about 20%.

MapReduce; schedule; heterogeneous workloads; multiple-job optimization

Weisong Hu Chao Tian Xiaowei Liu Hongwei Qi Li Zha Huaming Liao Yuezhuo Zhang Jie Zhang

NEC Labs China, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University of Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

International conference

Sixth International Conference on Semantics, Knowledge and Grids (SKG 2010)

Ningbo

English

135-140

2010-11-01 (date first posted online on the Wanfang platform; does not represent the paper's publication date)