Reducing Communication Overhead in the High Performance Conjugate Gradient Benchmark on Tianhe-2

Abstract:

The High Performance Conjugate Gradient (HPCG) benchmark, proposed in 2013, has drawn increasing attention from both academia and industry. Unlike the High Performance Linpack (HPL) benchmark, which has a very high computation-to-communication ratio, HPCG contains both neighboring and global communication that may severely degrade parallel performance. To reduce the overhead of neighboring communication, we overlap halo updates with halo-independent computations. To hide the cost of the global reductions in vector dot products, we use two reformulated CG algorithms, namely Gropp's asynchronous CG and the pipelined CG. Further optimizations reduce the extra overhead introduced by the reformulated CG algorithms. Experiments on the world's largest heterogeneous system, Tianhe-2, show that the optimized HPCG code scales to 256 nodes (49,920 cores) with a nearly ideal weak-scaling efficiency of over 90% and an aggregate performance of 10.51 Tflops.
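As a rough illustration of the pipelined CG idea mentioned in the abstract, the following serial sketch uses the unpreconditioned pipelined recurrences commonly attributed to Ghysels and Vanroose. It is not the authors' HPCG code: the names `laplacian_matvec`, `fused_dots`, and `pipelined_cg` are hypothetical, the 1D Laplacian is only an assumed test operator, and no MPI is involved. The point is structural: both dot products are gathered in one step (a single global reduction in a parallel code), and the matrix-vector product that follows does not depend on their results, so the two could be overlapped.

```python
def laplacian_matvec(v):
    """Apply the 1D Laplacian stencil [-1, 2, -1] (assumed SPD test operator)."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out


def fused_dots(r, w):
    """Compute (r, r) and (w, r) in one pass.

    In a distributed code this would be one combined global reduction.
    """
    gamma = 0.0
    delta = 0.0
    for ri, wi in zip(r, w):
        gamma += ri * ri
        delta += wi * ri
    return gamma, delta


def pipelined_cg(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned pipelined CG sketch; returns (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r0 = b - A*x0 with x0 = 0
    w = matvec(r)                    # w0 = A*r0
    z = [0.0] * n
    s = [0.0] * n
    p = [0.0] * n
    gamma_old = 1.0
    alpha_old = 1.0
    bnorm2 = sum(bi * bi for bi in b)

    for it in range(max_iter):
        # One fused reduction for both scalars ...
        gamma, delta = fused_dots(r, w)
        # ... while this product needs neither scalar, so a parallel code
        # could run it while the reduction is still in flight.
        q = matvec(w)

        if gamma <= tol * tol * bnorm2:
            return x, it

        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)

        # Recurrences replace the extra matrix-vector products.
        z = [qi + beta * zi for qi, zi in zip(q, z)]   # z = A*s
        s = [wi + beta * si for wi, si in zip(w, s)]   # s = A*p
        p = [ri + beta * pi for ri, pi in zip(r, p)]   # search direction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]  # w = A*r

        gamma_old, alpha_old = gamma, alpha

    return x, max_iter
```

Calling `pipelined_cg(laplacian_matvec, [1.0] * 20)` solves a small test system. The additional vectors `z`, `s`, and `w` are the price of removing the synchronization point, which is consistent with the abstract's remark that the reformulated algorithms introduce extra overhead.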

Keywords: HPCG; communication-computation overlap; pipelined CG; asynchronous CG; Tianhe-2

Authors: Fangfang Liu; Chao Yang; Yiqun Liu; Xianyi Zhang; Yutong Lu

Author affiliations:
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Compu
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy
Dept. of Computer Science & Technology, National University of Defense Technology, Changsha, Hunan 41007

Conference type: International conference

Conference name: The 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES 2014)

Conference location: Xianning, Hubei, China

Conference language: English

Pages: 13-18

Online publication date: 2014-11-24 (the date the record first appeared on the Wanfang platform, not the paper's publication date)
