A Lowest Cost RDD Caching Strategy for Spark
Spark abstracts intermediate results as in-memory RDDs and manages them with an LRU strategy to improve performance. However, because the RDDs of different computing tasks have different lifecycles, evicted RDDs often have to be reloaded, which incurs additional system overhead. In this paper we propose a lowest-cost replacement strategy as Spark's cache replacement strategy to address this problem. Based on a weight model, the strategy preemptively evicts RDDs with small weight values from memory. During this process, it selects the lowest-cost solution for replacing the RDDs in memory, which improves the efficiency of Spark. Experimental results show that the proposed strategy improves the efficiency of the whole cluster.
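The weight-based eviction idea summarized above can be illustrated with a minimal Python sketch. The abstract does not specify the weight model, so everything here is an assumption made for illustration: the `CachedRDD` fields, the `weight` formula (estimated recomputation cost times remaining uses, per MB), and the greedy `choose_victims` helper are hypothetical and are not the authors' method or part of Spark's API.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    size_mb: float         # memory occupied by the cached RDD
    recompute_cost: float  # assumed: estimated seconds to rebuild it if evicted
    future_uses: int       # assumed: remaining references in the job DAG


def weight(rdd: CachedRDD) -> float:
    # Hypothetical weight: expected recomputation cost avoided per MB kept in memory.
    return rdd.recompute_cost * rdd.future_uses / rdd.size_mb


def choose_victims(cache: list[CachedRDD], needed_mb: float) -> list[CachedRDD]:
    """Greedily pick the lowest-weight RDDs until at least needed_mb is freed."""
    victims = []
    freed = 0.0
    for rdd in sorted(cache, key=weight):
        if freed >= needed_mb:
            break
        victims.append(rdd)
        freed += rdd.size_mb
    return victims


if __name__ == "__main__":
    cache = [
        CachedRDD("a", size_mb=100, recompute_cost=10, future_uses=0),
        CachedRDD("b", size_mb=50, recompute_cost=20, future_uses=3),
        CachedRDD("c", size_mb=200, recompute_cost=5, future_uses=1),
    ]
    print([r.name for r in choose_victims(cache, needed_mb=150)])
```

Unlike LRU, which looks only at recency, this sketch ranks cached RDDs by how costly their loss would be, so an RDD that will never be reused is evicted first even if it was touched recently. The greedy selection is only an approximation of a lowest-cost choice.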
RDD; Spark memory management; Memory computing; Cache strategy
Yuyang Wang, Tianlei Zhou
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Engineering Research Center of Mobile Internet Data Application, Chongqing 400065, China
International conference
Chongqing
English
30-36
2019-05-30 (date first posted online on the Wanfang platform; not the paper's publication date)