A Scalable FPGA Accelerator for Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have achieved undisputed success in many practical applications, such as image classification, face detection, and speech recognition. FPGA-based CNN prediction is widely regarded as more efficient than GPU-based schemes, especially in terms of power consumption. In addition, OpenCL-based high-level synthesis tools for FPGAs are widely used because of their fast verification and implementation flows. In this paper, we propose an FPGA accelerator with a scalable architecture of deeply pipelined OpenCL kernels. The design is verified by implementing three representative large-scale CNNs, AlexNet, VGG-16, and ResNet-50, on an Altera OpenCL DE5-Net FPGA board. Our design achieves a peak performance of 141 GOPS for the convolution operation and 103 GOPS for the entire VGG-16 network performing ImageNet classification on the DE5-Net board.
FPGA; OpenCL; Convolution Neural Networks; Optimization
Ke Xu; Xiaoyun Wang; Shihang Fu; Dong Wang
Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China
International conference
The 12th Conference on Advanced Computer Architecture (ACA 2018) (2018 National Annual Academic Conference on Computer Architecture)
Yingkou, Liaoning
English
3-14
2018-08-10 (date first posted online on the Wanfang platform; does not represent the paper's publication date)