Dynamic Delay Based Cyclic Gradient Update Method for Distributed Training
Distributed training performance is constrained by two factors: the communication overhead between parameter servers and workers, and the imbalanced computing power across workers. We propose a dynamic delay based cyclic gradient update method, which allows workers to push gradients to parameter servers in a round-robin order with dynamic delays. Stale gradient information is accumulated locally in each worker. When a worker obtains the token to update gradients, the accumulated gradients are pushed to the parameter servers. Experiments show that, compared with previous synchronous and cyclic gradient update methods, the dynamic delay cyclic method converges to the same accuracy at a faster speed.
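The abstract describes the update scheme in prose; the following Python sketch illustrates the core idea under simplified assumptions. All names (ParameterServer, Worker, compute_gradient, etc.) are hypothetical and this is not the authors' implementation; in particular, the token is passed in a fixed round-robin order and the dynamic adjustment of the delay is omitted.

import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.weights = np.zeros(dim)

    def apply(self, accumulated_grad, lr=0.1):
        # Apply a worker's locally accumulated (possibly stale) gradients.
        self.weights -= lr * accumulated_grad

class Worker:
    def __init__(self, dim):
        self.local_grad = np.zeros(dim)   # accumulator for stale gradients

    def compute_gradient(self, weights):
        # Stand-in for a real mini-batch gradient computation.
        return weights + 0.01 * np.random.randn(*weights.shape)

    def step(self, weights):
        # Accumulate the new gradient locally instead of pushing it immediately.
        self.local_grad += self.compute_gradient(weights)

    def push(self, server):
        # Called only when this worker holds the token: push the accumulated
        # gradients to the parameter server, then reset the local accumulator.
        server.apply(self.local_grad)
        self.local_grad[:] = 0.0

def train(num_workers=4, dim=8, steps=100):
    server = ParameterServer(dim)
    workers = [Worker(dim) for _ in range(num_workers)]
    token = 0  # index of the worker currently allowed to push
    for _ in range(steps):
        for w in workers:
            w.step(server.weights)         # every worker keeps computing
        workers[token].push(server)        # only the token holder updates the server
        token = (token + 1) % num_workers  # pass the token in round-robin order
    return server.weights

if __name__ == "__main__":
    print(train())

The sketch shows why stale gradients are not lost: workers without the token continue computing and accumulating, and their combined contribution is applied in a single push once the token returns to them.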
Distributed training; Deep learning; Cyclic delayed method; Stochastic optimization
Wenhui Hu, Peng Wang, Qigang Wang, Zhengdong Zhou, Hui Xiang, Mei Li, Zhongchao Shi
Artificial Intelligence Lab, Lenovo Research, Beijing 100085, China
International conference
Guangzhou
English
550-559
2018-11-23 (date the paper first went online on the Wanfang platform; not the paper's publication date)