Conference Topic

A Novel Processing Model for SCDs in ETL

ETL (Extract-Transform-Load), which populates data warehouses (DWs) with data from various source systems, is an important part of building a data warehouse. As data volumes grow rapidly, processing such large amounts of data quickly is a major challenge for ETL. MapReduce is a programming model for large-scale, data-intensive processing. It is composed of two functions, map and reduce, which makes it straightforward to run many tasks in parallel. However, the model has disadvantages. For example, it is inefficient when the mappers produce a lot of data, because moving the intermediate data to the reducers incurs a high network cost. In this paper, we present a new method called map-only. With this method, the reduce step is performed locally, so the data does not need to be transferred to the reducers over the network. The results show that the method performs well and speeds up processing for both Type-1 and Type-2 slowly changing dimensions (SCDs). For example, when the incremental data is 5 GB, processing the Type-2 SCDs takes only 20 minutes with the map-only method, compared with 28 minutes for the same data without it.
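The idea of doing the reduce step locally can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DimRow` and `map_only_scd2` are hypothetical, and the sketch assumes that the input has already been partitioned by business key, so that every change record for a given key reaches the same mapper and no shuffle to a reducer is needed.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    key: str                  # business key
    attrs: dict               # tracked attributes
    valid_from: str           # ISO date, so string order equals date order
    valid_to: Optional[str]   # None marks the current version
    version: int


def map_only_scd2(
    partition: Iterable[Tuple[str, str, dict]],
    existing: Dict[str, DimRow],
) -> List[DimRow]:
    """Apply Type-2 SCD changes for one partition inside a single mapper.

    partition: (business_key, change_date, attrs) records.
    existing:  current dimension row per business key, if any.
    Assumption: all records for a key are in this partition.
    """
    # Local "reduce": group the change records by business key in memory.
    grouped = defaultdict(list)
    for key, change_date, attrs in partition:
        grouped[key].append((change_date, attrs))

    out: List[DimRow] = []
    for key, changes in grouped.items():
        changes.sort(key=lambda c: c[0])  # apply changes in date order
        versions = [existing[key]] if key in existing else []
        for change_date, attrs in changes:
            if versions and versions[-1].attrs == attrs:
                continue  # nothing changed, so no new version
            if versions:
                # Close the previous version at the change date.
                versions[-1] = replace(versions[-1], valid_to=change_date)
            next_version = versions[-1].version + 1 if versions else 1
            versions.append(DimRow(key, attrs, change_date, None, next_version))
        out.extend(versions)
    return out
```

A Type-1 variant under the same assumption would keep only the latest attributes per key and overwrite the existing row instead of closing it and appending a new version.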

ETL; MapReduce; map-only

Li Sun, Jiaoyan Zhang, Jiyun Li

School of Computer Science and Technology, Donghua University, Shanghai, China

International conference

2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)

Chongqing

English

133-136

2017-10-04 (date first made available online on the Wanfang platform; not the paper's publication date)