Conference Paper

A Throughput-Aware Analytical Performance Model for GPU Applications

  Graphics processing units (GPUs) have become increasingly popular for general-purpose parallel processing. Their massively parallel architecture allows GPUs to execute tens of thousands of threads concurrently, solving heavily data-parallel problems efficiently. However, despite this tremendous computing power, optimizing GPU kernels for high performance remains a challenge, owing to the architectural shift from CPU to GPU and the lack of tools for programming and performance analysis. In this paper, we propose a throughput-aware analytical model to estimate the performance of GPU kernels and optimizations. We model global memory access servicing as a pipeline and redefine compute throughput and memory throughput as the rates at which memory requests arrive at and leave this pipeline. By identifying a kernel's throughput-limiting factor, we classify GPU programs into compute-bound and memory-bound categories and then predict performance for each category. In addition, our model can indicate promising optimization directions and predict the potential performance benefits. We evaluate our model on a manually written benchmark as well as a matrix-multiply kernel and show that its geometric-mean absolute error is less than 6.5%.
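The classification step summarized in the abstract can be sketched roughly as follows. This is a minimal roofline-style illustration, not the paper's actual pipeline model; the function name, inputs, and numbers are assumptions introduced here for clarity.

```python
# Rough sketch (illustrative only): classify a GPU kernel as
# compute-bound or memory-bound by comparing the time the work
# would take at each sustained throughput, then predict run time
# from the limiting side.

def classify_kernel(compute_ops, memory_bytes,
                    compute_throughput, memory_throughput):
    """compute_ops: total arithmetic operations in the kernel.
    memory_bytes: total bytes moved through global memory.
    compute_throughput: sustained ops/s of the device.
    memory_throughput: sustained bytes/s of global memory.
    Returns (category, predicted_time_seconds)."""
    compute_time = compute_ops / compute_throughput
    memory_time = memory_bytes / memory_throughput
    if compute_time >= memory_time:
        return "compute-bound", compute_time
    return "memory-bound", memory_time

# Example with made-up device numbers:
category, t = classify_kernel(
    compute_ops=2e12,         # 2 Top of arithmetic work
    memory_bytes=1e11,        # 100 GB of global-memory traffic
    compute_throughput=1e12,  # 1 Top/s sustained
    memory_throughput=1e11,   # 100 GB/s sustained
)
print(category, t)  # -> compute-bound 2.0
```

The paper's model is finer-grained than this max-of-two-times bound: it derives the two throughputs from a pipeline of memory-request servicing rather than treating them as fixed device constants.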

GPU; compute-bound; memory-bound; performance prediction; performance bottleneck

Zhidan Hu Guangming Liu Wenrui Dong

College of Computer, National University of Defense Technology, Hunan, China

International Conference

ACA, Advanced Computer Architecture (2014 National Conference on Computer Architecture)

Shenyang, China

English

98-112

2014-08-23 (date first posted on the Wanfang platform; not necessarily the paper's publication date)