MR-Runner: A Modularized Map-Reduce Job Management Tool

Abstract:

Map-Reduce is a powerful solution for processing and analyzing large-scale data. Frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then carry out a variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after considerable effort to modify the algorithm itself. This issue can be explained in two ways: (1) Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration; (2) Map-Reduce is absolutely parallel, so each vertex cannot obtain all records and none of them can obtain the globally optimal model. In this paper, we propose a job management tool that enables the Map-Reduce framework to support iteration, which we call "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so MR-Runner can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
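The client-side iteration described in the abstract can be illustrated with a minimal sketch. It is not MR-Runner's actual code or API: the names `Framework`, `LocalFramework`, and `run_iterative` are hypothetical, and the in-memory `LocalFramework` merely stands in for a real cluster client such as a Hadoop job submitter. The sketch assumes that each iteration is one ordinary batch job and that the driver, running outside the framework, folds the job output into a global state (here, two 1-D k-means centroids) before submitting the next job.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Framework(ABC):
    """Hypothetical adapter: the only framework-specific part of the driver."""

    @abstractmethod
    def run_job(self, mapper, reducer, records, state):
        """Run one batch map-reduce job and return {key: reduced value}."""


class LocalFramework(Framework):
    """In-memory stand-in for a real cluster client (illustration only)."""

    def run_job(self, mapper, reducer, records, state):
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record, state):
                groups[key].append(value)
        return {key: reducer(key, values) for key, values in groups.items()}


def run_iterative(framework, mapper, reducer, records, state,
                  update, converged, max_iters=50):
    """Client-side loop: submit a job, merge its output into the global
    state, and repeat until the state stops changing."""
    for iteration in range(1, max_iters + 1):
        output = framework.run_job(mapper, reducer, records, state)
        new_state = update(state, output)
        if converged(state, new_state):
            return new_state, iteration
        state = new_state
    return state, max_iters


# Toy job: 1-D k-means with two centroids (assumed example data).
def assign_to_nearest(point, centroids):
    nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    yield nearest, point


def mean(_key, values):
    return sum(values) / len(values)


def merge_centroids(old, output):
    # Keep the old centroid when no point was assigned to it.
    return [output.get(i, c) for i, c in enumerate(old)]


def unchanged(old, new):
    return max(abs(a - b) for a, b in zip(old, new)) < 1e-9


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, iterations = run_iterative(
    LocalFramework(), assign_to_nearest, mean, points,
    [0.0, 5.0], merge_centroids, unchanged)
print(final_centroids, iterations)
```

Because the loop only calls `run_job`, porting the driver to another framework would, under these assumptions, mean writing another `Framework` subclass while the loop itself stays unchanged.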

Keywords: Map-Reduce; Job management; Iteration; Modularization

Authors: Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei

Affiliation: Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China

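Conference type: International conference

Conference name: The 5th Asia-Pacific Symposium on Internetware

Conference location: Changsha

Conference language: English

Pages: 207-210

Online publication date: 2013-10-23 (the date the record first appeared on the Wanfang platform; it does not represent the paper's publication date)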
