Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Challenge in GPU computing: there is a big gap between processing capacity and memory access. Computing is fast: each core (ALU) can finish one or two operations per cycle, so 1000 cores x 1 GHz = 1 TFLOPS. But one arithmetic operation requires two reads and one write.
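The arithmetic behind this gap can be sketched as a rough back-of-the-envelope calculation. The function names below are hypothetical, and the sketch assumes one operation per core per cycle, 4-byte (single-precision) operands, and no caching or register reuse; none of these assumptions come from the abstract itself.

```python
def peak_flops(cores, clock_hz, ops_per_cycle=1):
    """Peak arithmetic throughput in operations per second."""
    return cores * clock_hz * ops_per_cycle


def required_bandwidth(flops, accesses_per_op=3, bytes_per_operand=4):
    """Bytes per second needed if every operation goes to memory.

    accesses_per_op=3 reflects two reads and one write per operation.
    bytes_per_operand=4 is an assumption (single-precision floats).
    """
    return flops * accesses_per_op * bytes_per_operand


# 1000 cores at 1 GHz, one operation per cycle (assumed).
flops = peak_flops(cores=1000, clock_hz=1_000_000_000)
bandwidth = required_bandwidth(flops)

print(f"Peak compute: {flops / 1e12:.1f} TFLOPS")
print(f"Memory traffic needed without reuse: {bandwidth / 1e12:.1f} TB/s")
```

Under these assumptions, keeping every core busy would require on the order of 12 TB/s of memory traffic, which illustrates why data reuse in registers and caches matters so much for GPU performance.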
Xiaowen Chu (褚晓文)
Department of Computer Science, Hong Kong Baptist University
Domestic conference
Beijing
English
1-42
2017-12-01 (date first posted online on the Wanfang platform; not the paper's publication date)