Conference Papers

A Lowest Cost RDD Caching Strategy for Spark

Spark abstracts intermediate results into RDDs held in memory and manages them with an LRU (least recently used) replacement strategy to improve performance. However, because the RDDs used by different computing tasks have different lifecycles, RDDs are reloaded in many cases, which incurs additional system overhead. In this paper, we propose a lowest-cost replacement strategy as Spark's cache replacement strategy to address this problem. Based on a weight model, the strategy preemptively evicts RDDs with small weight values from memory. During this process, it selects the replacement solution with the lowest cost, which improves the efficiency of Spark. Experimental results show that the proposed strategy improves the efficiency of the whole cluster.
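The abstract describes a two-step eviction policy but does not give the weight model or the cost function. The Python sketch below illustrates the general idea under loudly stated assumptions and is not the authors' implementation or any Spark API. It assumes that an RDD's reload cost is its number of remaining future references times its recomputation cost, and that its weight is that reload cost per MB of memory it occupies. RDDs whose weight falls below a threshold are evicted preemptively; then, among all sets of remaining RDDs that free enough memory, the set with the lowest total reload cost is chosen. `CachedRDD`, `preemptive_evict`, `lowest_cost_victims`, their fields, and the numbers in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CachedRDD:
    """Hypothetical bookkeeping record for one cached RDD."""
    rdd_id: int
    size_mb: float         # memory occupied by the cached partitions (> 0)
    recompute_cost: float  # assumed cost of rebuilding it once from lineage
    remaining_refs: int    # assumed number of future stages that still read it

    @property
    def reload_cost(self) -> float:
        # Assumption: total cost paid later if this RDD is evicted now.
        return self.remaining_refs * self.recompute_cost

    @property
    def weight(self) -> float:
        # Assumed weight model: reload cost avoided per MB kept in memory.
        return self.reload_cost / self.size_mb


def preemptive_evict(cache, threshold):
    """Step 1: evict every RDD whose weight is below the threshold."""
    kept = [r for r in cache if r.weight >= threshold]
    evicted = [r for r in cache if r.weight < threshold]
    return kept, evicted


def lowest_cost_victims(cache, needed_mb):
    """Step 2: among all subsets that free at least needed_mb, return one
    with the minimum total reload cost (exhaustive, small caches only)."""
    best, best_cost = None, float("inf")
    for k in range(len(cache) + 1):
        for subset in combinations(cache, k):
            if sum(r.size_mb for r in subset) < needed_mb:
                continue  # this subset does not free enough memory
            cost = sum(r.reload_cost for r in subset)
            if cost < best_cost:
                best, best_cost = list(subset), cost
    if best is None:
        raise MemoryError("cannot free enough memory")
    return best


# Usage with illustrative numbers.
cache = [
    CachedRDD(1, size_mb=100, recompute_cost=5.0, remaining_refs=0),
    CachedRDD(2, size_mb=200, recompute_cost=8.0, remaining_refs=2),
    CachedRDD(3, size_mb=150, recompute_cost=2.0, remaining_refs=1),
    CachedRDD(4, size_mb=120, recompute_cost=10.0, remaining_refs=3),
]
kept, dropped = preemptive_evict(cache, threshold=0.01)
victims = lowest_cost_victims(kept, needed_mb=250)
print([r.rdd_id for r in dropped])  # [1]
print([r.rdd_id for r in victims])  # [2, 3]
```

RDD 1 has no remaining references, so its weight is 0 and it is evicted in step 1. In step 2, evicting RDDs 2 and 3 frees 350 MB at a reload cost of 18, which is cheaper than any other set that frees at least 250 MB. The exhaustive search is exponential in the number of cached RDDs and is used here only because it makes "lowest cost" exact and easy to check; a practical policy would need a heuristic, such as evicting in ascending weight order.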

RDD; Spark; memory management; memory computing; cache strategy

Yuyang Wang, Tianlei Zhou

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Engineering Research Center of Mobile Internet Data Application, Chongqing 400065, China

International Conference

2019 4th International Conference on Automatic Control and Mechatronic Engineering (ACME 2019)

Chongqing

English

pp. 30-36

2019-05-30 (first online date on the Wanfang platform; not the paper's publication date)