Dynamic Delay Based Cyclic Gradient Update Method for Distributed Training
Distributed training performance is constrained by two factors: the communication overhead between parameter servers and workers, and the imbalanced computing power across workers. We propose a dynamic delay based cyclic gradient update method, which allows workers to push gradients to parameter servers in a round-robin order with dynamic delays. Stale gradient information is accumulated locally in each worker. When a worker obtains the token to update gradients, the accumulated gradients are pushed to the parameter servers. Experiments show that, compared with previous synchronous and cyclic gradient update methods, the dynamic delay cyclic method converges to the same accuracy at a faster speed.
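The abstract describes the update scheme in prose; the following Python sketch illustrates the core idea under simplified assumptions. All names (ParameterServer, Worker, compute_gradient, etc.) are hypothetical and this is not the authors' implementation; in particular, the token is passed in a fixed round-robin order and the dynamic adjustment of the delay is omitted.

import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.weights = np.zeros(dim)

    def apply(self, accumulated_grad, lr=0.1):
        # Apply a worker's locally accumulated (possibly stale) gradients.
        self.weights -= lr * accumulated_grad

class Worker:
    def __init__(self, dim):
        self.local_grad = np.zeros(dim)   # accumulator for stale gradients

    def compute_gradient(self, weights):
        # Stand-in for a real mini-batch gradient computation.
        return weights + 0.01 * np.random.randn(*weights.shape)

    def step(self, weights):
        # Accumulate the new gradient locally instead of pushing it immediately.
        self.local_grad += self.compute_gradient(weights)

    def push(self, server):
        # Called only when this worker holds the token: push the accumulated
        # gradients to the parameter server, then reset the local accumulator.
        server.apply(self.local_grad)
        self.local_grad[:] = 0.0

def train(num_workers=4, dim=8, steps=100):
    server = ParameterServer(dim)
    workers = [Worker(dim) for _ in range(num_workers)]
    token = 0  # index of the worker currently allowed to push
    for _ in range(steps):
        for w in workers:
            w.step(server.weights)         # every worker keeps computing
        workers[token].push(server)        # only the token holder updates the server
        token = (token + 1) % num_workers  # pass the token in round-robin order
    return server.weights

if __name__ == "__main__":
    print(train())

The sketch shows why stale gradients are not lost: workers without the token continue computing and accumulating, and their combined contribution is applied in a single push once the token returns to them.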
Distributed training; Deep learning; Cyclic delayed method; Stochastic optimization
Wenhui Hu, Peng Wang, Qigang Wang, Zhengdong Zhou, Hui Xiang, Mei Li, Zhongchao Shi
Artificial Intelligence Lab, Lenovo Research, Beijing 100085, China
International conference
Guangzhou
English
550-559
2018-11-23 (date the paper first went online on the Wanfang platform; not the paper's publication date)