Conference topic

MR-Runner: A Modularized Map-Reduce Job Management Tool

  Map-Reduce is a powerful solution for processing and analyzing large-scale data. Frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the “map” and “reduce” functions, and the Map-Reduce framework can then carry out a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial effort to modify the algorithm itself. This issue can be explained in two ways: (1) Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration; (2) Map-Reduce is absolutely parallel: each vertex cannot obtain all records, so none of them can obtain the globally optimal model. In this paper, we propose a job management tool, MR-Runner, that enables the Map-Reduce framework to support iteration, called “de-parallel”. This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework like a “client”, so MR-Runner can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
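
The idea of a client-side driver that adds iteration on top of an unmodified Map-Reduce framework can be illustrated with a minimal sketch. This is not MR-Runner's actual code or API: the names MapReduceBackend, LocalBackend, run_iterative, and kmeans_1d are hypothetical, and the in-memory LocalBackend is only a stand-in for a real framework such as Hadoop. The example uses one-dimensional k-means, where each pass is an ordinary map/reduce job and the driver resubmits the job with the updated model until it stops changing.

```python
from abc import ABC, abstractmethod


class MapReduceBackend(ABC):
    """Hypothetical abstraction of the framework-facing interface."""

    @abstractmethod
    def run_job(self, records, mapper, reducer):
        """Run one batch job and return {key: reduced_value}."""


class LocalBackend(MapReduceBackend):
    """In-memory stand-in for a real cluster (assumption for this sketch)."""

    def run_job(self, records, mapper, reducer):
        groups = {}
        for record in records:
            # Map phase: each record emits (key, value) pairs.
            for key, value in mapper(record):
                groups.setdefault(key, []).append(value)
        # Reduce phase: one call per key.
        return {key: reducer(key, values) for key, values in groups.items()}


def run_iterative(backend, records, make_mapper, reducer,
                  model, update, converged, max_iters=50):
    """Client-side loop: resubmit the job until the model converges."""
    for iteration in range(1, max_iters + 1):
        output = backend.run_job(records, make_mapper(model), reducer)
        new_model = update(model, output)
        if converged(model, new_model):
            return new_model, iteration
        model = new_model
    return model, max_iters


def kmeans_1d(points, initial_centroids, backend=None):
    """1-D k-means expressed as repeated map/reduce jobs."""
    backend = backend or LocalBackend()

    def make_mapper(centroids):
        # The current model is baked into the mapper for this pass.
        def mapper(point):
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(point - centroids[i]))
            yield nearest, point
        return mapper

    def reducer(_key, values):
        return sum(values) / len(values)

    def update(centroids, output):
        # Keep the old centroid if a cluster received no points.
        return [output.get(i, c) for i, c in enumerate(centroids)]

    def converged(old, new):
        return max(abs(a - b) for a, b in zip(old, new)) < 1e-9

    return run_iterative(backend, points, make_mapper, reducer,
                         list(initial_centroids), update, converged)


if __name__ == "__main__":
    centroids, iterations = kmeans_1d(
        [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
    print(centroids, iterations)
```

Because the loop lives entirely in the driver and only calls run_job, swapping LocalBackend for a class that submits jobs to a real cluster would leave the iteration logic unchanged, which is the portability property the abstract describes.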

Map-Reduce; Job management; Iteration; Modularization

Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei

Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China

International conference

The 5th Asia-Pacific Symposium on Internetware

Changsha

English

207-210

2013-10-23 (date first made available online on the Wanfang platform; this does not represent the paper's publication date)