Optimization and Analysis of Parallel Back Propagation Neural Network On GPU Using CUDA
Back Propagation Network (BPN) is widely used in machine learning but is very time consuming. Its inherent parallelism has been exploited to build parallel BPNs, and some methods achieve considerable performance. Nowadays, the GPU attracts many computationally intensive applications with its abundant data-parallel cores and multi-level memory hierarchy. A GPU can achieve remarkable performance for dataset-oriented applications such as BPN under reasonable task decomposition and memory optimization, yet previous methods do not fully exploit its advantages for parallel BPN. This paper exploits the parallelism of BPN on the GPU using CUDA. In our method, parallel data structures are designed that map naturally onto on-chip shared memory and can be accessed without bank conflicts. Moreover, we express the BP algorithm as GPU-adapted matrix-vector multiplication, vector-matrix multiplication, and matrix assignment operations. Intermediate vectors are stored in shared memory to avoid frequently swapping them in and out. When running the BP algorithm on several well-known benchmarks, the experimental results show 1.29x to 64.32x speedup over a CPU version on our platform. We point out that the results are influenced by both the number of hidden neurons and the network density.
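To illustrate the kind of decomposition the abstract describes, the following is a hypothetical CUDA sketch (not the authors' code) of one BPN layer's forward pass expressed as a matrix-vector product, with the input activation vector staged in shared memory; the kernel name, parameters, and launch configuration are assumptions for illustration.

```cuda
#include <math.h>

// Sketch: y = sigmoid(W * x) for one layer, one thread per output neuron.
// The input vector x is copied into shared memory once per block; within
// the dot-product loop every thread in a warp reads the same xs[j] word,
// which the hardware serves as a broadcast, so no bank conflict occurs.
__global__ void layer_forward(const float *W,  // weights, n_out x n_in, row-major
                              const float *x,  // input activations, length n_in
                              float *y,        // output activations, length n_out
                              int n_in, int n_out)
{
    extern __shared__ float xs[];              // shared-memory copy of x

    // Cooperative, coalesced load of x into shared memory.
    for (int i = threadIdx.x; i < n_in; i += blockDim.x)
        xs[i] = x[i];
    __syncthreads();

    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_out) {
        float sum = 0.0f;
        for (int j = 0; j < n_in; ++j)
            sum += W[row * n_in + j] * xs[j];  // broadcast read of xs[j]
        y[row] = 1.0f / (1.0f + expf(-sum));   // sigmoid activation
    }
}

// Example launch (host side): the dynamic shared-memory size must cover x.
//   int threads = 128;
//   int blocks  = (n_out + threads - 1) / threads;
//   layer_forward<<<blocks, threads, n_in * sizeof(float)>>>(dW, dx, dy, n_in, n_out);
```

The backward pass would analogously be a vector-matrix product over the transposed weights; keeping the intermediate delta vector in shared memory is what the abstract refers to as avoiding frequent swapping.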
Keywords: Back Propagation; GPU; producer-consumer locality; bank conflict
Min Xu, Hong An, Wei Zhou, Junrui Zhou, Yaobin Wang
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Conference type: international conference
Venue: Taiyuan
Language: English
Pages: 351-355
2011-02-26 (date the paper was first posted on the Wanfang platform; not necessarily its publication date)