Conference Paper

A Throughput-Aware Analytical Performance Model for GPU Applications

  Graphics processing units (GPUs) have become increasingly popular for general-purpose parallel processing. Their massively parallel architecture allows GPUs to execute tens of thousands of threads concurrently, solving heavily data-parallel problems efficiently. However, despite this tremendous computing power, optimizing GPU kernels for high performance remains a challenge, owing to the architectural shift from CPU to GPU and the lack of tools for programming and performance analysis. In this paper, we propose a throughput-aware analytical model to estimate the performance of GPU kernels and optimizations. We model global memory access servicing as a pipeline and redefine compute throughput and memory throughput as the rates at which memory requests arrive at and leave this pipeline. By identifying a kernel's throughput-limiting factor, we classify GPU programs into compute-bound and memory-bound categories and then predict performance for each category. In addition, our model can indicate promising optimization directions and predict the potential performance benefits. We evaluate our model on a manually written benchmark as well as a matrix-multiply kernel and show that its geometric-mean absolute error is less than 6.5%.
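The classification step summarized in the abstract can be sketched roughly as follows. This is a minimal roofline-style illustration, not the paper's actual pipeline model; the function name, inputs, and numbers are assumptions introduced here for clarity.

```python
# Rough sketch (illustrative only): classify a GPU kernel as
# compute-bound or memory-bound by comparing the time the work
# would take at each sustained throughput, then predict run time
# from the limiting side.

def classify_kernel(compute_ops, memory_bytes,
                    compute_throughput, memory_throughput):
    """compute_ops: total arithmetic operations in the kernel.
    memory_bytes: total bytes moved through global memory.
    compute_throughput: sustained ops/s of the device.
    memory_throughput: sustained bytes/s of global memory.
    Returns (category, predicted_time_seconds)."""
    compute_time = compute_ops / compute_throughput
    memory_time = memory_bytes / memory_throughput
    if compute_time >= memory_time:
        return "compute-bound", compute_time
    return "memory-bound", memory_time

# Example with made-up device numbers:
category, t = classify_kernel(
    compute_ops=2e12,         # 2 Top of arithmetic work
    memory_bytes=1e11,        # 100 GB of global-memory traffic
    compute_throughput=1e12,  # 1 Top/s sustained
    memory_throughput=1e11,   # 100 GB/s sustained
)
print(category, t)  # -> compute-bound 2.0
```

The paper's model is finer-grained than this max-of-two-times bound: it derives the two throughputs from a pipeline of memory-request servicing rather than treating them as fixed device constants.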

GPU; compute-bound; memory-bound; performance prediction; performance bottleneck

Zhidan Hu Guangming Liu Wenrui Dong

College of Computer, National University of Defense Technology, Hunan, China

International Conference

ACA, Advanced Computer Architecture (2014 National Conference on Computer Architecture)

Shenyang, China

English

98-112

2014-08-23 (date first posted on the Wanfang platform; not necessarily the paper's publication date)