MR-Runner: A Modularized Map-Reduce Job Management Tool
Map-Reduce is a powerful solution for processing and analyzing large-scale data. Frameworks such as Hadoop and Spark can handle terabytes of data and more. Users need only implement the “map” and “reduce” functions, and the Map-Reduce framework can then carry out a variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial modification of the algorithm itself. This issue can be explained in two ways: 1. Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration. 2. Map-Reduce is fully parallel: no vertex can obtain all records, so none of them can compute the globally optimal model. In this paper, we propose a job management tool that enables the Map-Reduce framework to support iteration, called “de-parallel”. This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a “client”, so MR-Runner can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces to the Map-Reduce framework, which makes our tool portable across representative Map-Reduce frameworks.
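The idea of driving iteration from a client can be illustrated with a minimal Python sketch. It is not MR-Runner's actual code or API: the names MapReduceBackend, LocalBackend, and run_iteratively are hypothetical, the in-process backend only stands in for a real cluster, and one-dimensional k-means is chosen purely as an example of an iterative algorithm. Each iteration submits one ordinary batch job through an abstract backend interface, then a client-side step combines the partial results into a single global state before the next job.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class MapReduceBackend(ABC):
    """Hypothetical framework interface; a real adapter would submit jobs to a cluster."""

    @abstractmethod
    def run_job(self, records, map_fn, reduce_fn):
        """Run one batch job and return a dict of {key: reduced value}."""


class LocalBackend(MapReduceBackend):
    """In-process stand-in for a cluster, used only to make the sketch runnable."""

    def run_job(self, records, map_fn, reduce_fn):
        groups = defaultdict(list)
        for record in records:
            for key, value in map_fn(record):
                groups[key].append(value)
        return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_iteratively(backend, records, make_map_fn, reduce_fn, merge,
                    state, converged, max_iters=50):
    """Client-side loop: one batch job per iteration, merged into a global state."""
    for _ in range(max_iters):
        partials = backend.run_job(records, make_map_fn(state), reduce_fn)
        new_state = merge(partials)
        if converged(state, new_state):
            return new_state
        state = new_state
    return state


# Example: 1-D k-means with two centroids (assumes no cluster becomes empty).
def make_map_fn(centroids):
    def map_fn(x):
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        yield nearest, x
    return map_fn


def mean_reduce(key, values):
    return sum(values) / len(values)


def merge(partials):
    return [partials[k] for k in sorted(partials)]


def converged(old, new, tol=1e-9):
    return len(old) == len(new) and all(abs(a - b) < tol for a, b in zip(old, new))


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = run_iteratively(LocalBackend(), points, make_map_fn, mean_reduce,
                            merge, [0.0, 5.0], converged)
print(centroids)  # [2.0, 11.0]
```

Because the loop depends only on the abstract run_job method, a different framework could in principle be supported by writing another adapter class, without touching the framework itself.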
Map-Reduce; Job management; Iteration; Modularization
Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China
International conference
Changsha
English
207-210
2013-10-23 (date first made available online on the Wanfang platform; not the paper's publication date)