Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Abstract:

Challenge in GPU computing: there is a big gap between processing capacity and memory access. Computing is fast: each core (ALU) can finish one or two operations per cycle, so 1000 cores × 1 GHz = 1 TFLOPS. But one arithmetic operation requires two reads and one write.
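The arithmetic in the abstract can be checked with a short back-of-the-envelope sketch. It computes peak throughput from the figures given (1000 cores, 1 GHz, one operation per cycle) and the memory traffic implied by two reads and one write per operation. The operand size of 4 bytes (single-precision float) and the function names are illustrative assumptions, not taken from the talk.

```python
# Back-of-the-envelope check of the compute vs. memory-access gap.
# Assumption (not from the talk): each operand is a 4-byte single-precision float.

CORES = 1000            # number of ALUs, as in the abstract
CLOCK_HZ = 1e9          # 1 GHz
OPS_PER_CYCLE = 1       # lower end of "one or two operations per cycle"
BYTES_PER_OPERAND = 4   # assumed FP32 operands
ACCESSES_PER_OP = 3     # two reads and one write per arithmetic operation


def peak_flops(cores, clock_hz, ops_per_cycle):
    """Peak arithmetic throughput in operations per second."""
    return cores * clock_hz * ops_per_cycle


def required_bandwidth(flops, accesses_per_op, bytes_per_operand):
    """Memory traffic in bytes/s if every operand is read from or written to memory."""
    return flops * accesses_per_op * bytes_per_operand


flops = peak_flops(CORES, CLOCK_HZ, OPS_PER_CYCLE)
bw = required_bandwidth(flops, ACCESSES_PER_OP, BYTES_PER_OPERAND)
print(f"peak compute: {flops / 1e12:.1f} TFLOPS")
print(f"bandwidth needed without data reuse: {bw / 1e12:.1f} TB/s")
```

Under these assumptions, sustaining 1 TFLOPS with no data reuse would need about 12 TB/s of memory traffic, well beyond the off-chip memory bandwidth of GPUs of that era, which is why on-chip reuse (registers, shared memory, caches) matters so much.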

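Author: Xiaowen Chu (褚晓文)

Affiliation: Department of Computer Science, Hong Kong Baptist University

Conference type: Domestic conference

Conference: 2017 China Big Data Technology Conference

Location: Beijing

Language: English

Pages: 1-42

Online publication date: 2017-12-01 (the date the item was first posted on the Wanfang platform; it does not indicate the paper's publication date)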
