Conference Paper

Optimization and Analysis of Parallel Back Propagation Neural Network On GPU Using CUDA

Back Propagation Networks (BPN) are widely used in machine learning but are very time consuming to train. The inherent parallelism of BPN has been studied for building parallel implementations, and some methods achieve considerable performance. With abundant data-parallel cores and a multi-level memory hierarchy, the GPU now attracts many computationally intensive applications. Under reasonable task decomposition and memory optimization, the GPU can achieve remarkable performance for dataset-oriented applications such as BPN. However, previous methods have not fully exploited the advantages of the GPU for parallel BPN. This paper exploits the parallelism of BPN on the GPU using CUDA. In our method, parallel data structures are designed that map naturally onto on-chip shared memory and can be accessed without bank conflicts. Moreover, we express the BP algorithm through GPU-adapted matrix-vector multiplication, vector-matrix multiplication, and matrix assignment operations. Intermediate vectors are kept in shared memory to avoid frequently swapping them in and out. When running the BP algorithm on several well-known benchmarks, the experimental results show a 1.29x to 64.32x speedup over a CPU version on our platform. We point out that the results are influenced by both the number of hidden neurons and the network density.
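The abstract's reformulation of BP as matrix-vector, vector-matrix, and matrix assignment operations can be sketched in NumPy. This is not the paper's CUDA implementation; it is a minimal single-hidden-layer illustration under assumed names (`bp_step`, `W1`, `W2`, `lr`, sigmoid activations), showing which operation in the training step corresponds to which of those three primitives.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic activation (an assumed choice; the paper
    does not fix the activation in the abstract)."""
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(W1, W2, x, t, lr=0.1):
    """One forward + backward pass of a single-hidden-layer BPN,
    expressed only via matrix-vector products, a vector-matrix
    product, and matrix (weight) assignments. Returns updated weights."""
    # Forward pass: two matrix-vector products.
    h = sigmoid(W1 @ x)                  # hidden activations
    y = sigmoid(W2 @ h)                  # output activations

    # Backward pass: output-layer delta, then a vector-matrix
    # product propagates the error back to the hidden layer.
    dy = (y - t) * y * (1.0 - y)         # delta at the output layer
    dh = (dy @ W2) * h * (1.0 - h)       # vector-matrix product

    # Weight updates as outer products written back into the
    # weight matrices (the "matrix assignment" step).
    W2 = W2 - lr * np.outer(dy, h)
    W1 = W1 - lr * np.outer(dh, x)
    return W1, W2
```

On a GPU, each of these primitives maps to one kernel, with `h`, `y`, `dy`, and `dh` held in on-chip shared memory so the intermediate vectors are not repeatedly moved through global memory.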

Keywords: Back Propagation; GPU; producer-consumer locality; bank conflict

Min Xu, Hong An, Wei Zhou, Junrui Zhou, Yaobin Wang

Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China

International Conference

2011 3rd International Conference on Computer and Network Technology (ICCNT 2011)

Taiyuan

English

351-355

2011-02-26 (date the paper first appeared on the Wanfang platform; not necessarily the publication date)