Parallel Randomized Block Coordinate Descent for Neural Probabilistic Language Model with High-Dimensional Output Targets
Training a large probabilistic neural network language model with a typical high-dimensional output layer is excessively time-consuming, which is one of the main reasons why simpler models such as n-grams remain popular despite their inferior performance. In this paper, a Chinese neural probabilistic language model is trained on the Fudan Chinese Language Corpus. As hundreds of thousands of distinct words are tokenized from the raw corpus, the model contains tens of millions of parameters. To address this challenge, the popular cluster-based parallel computing platform MPI (Message Passing Interface) is employed to implement the parallel neural network language model. Specifically, we propose a new method, termed Parallel Randomized Block Coordinate Descent (PRBCD), to train this model cost-effectively. Unlike traditional coordinate descent methods, our method can be applied to networks with multiple layers, scaling the gradients with respect to the hidden units proportionally according to the sampled parameters. We empirically show that PRBCD is stable and well suited to language models, which contain only a few layers but often have a large number of parameters and extremely high-dimensional output targets.
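The abstract describes PRBCD only at a high level. The following single-process Python fragment is a minimal, illustrative sketch of the core idea as stated there: update only a randomly sampled block of the high-dimensional output layer each step, and rescale the gradient passed back to the hidden units by the sampling fraction. The model structure, dimensions, and names (prbcd_step, block_frac, W_out, etc.) are assumptions made for illustration, not the authors' MPI-based implementation.

```python
# Illustrative sketch only: a randomized block coordinate update on the output
# layer of a toy softmax language model. All names and dimensions are assumed.
import numpy as np

rng = np.random.default_rng(0)

V, H = 10000, 128                        # vocabulary (output) size, hidden size
W_out = rng.normal(0.0, 0.01, size=(V, H))   # high-dimensional output-layer weights
b_out = np.zeros(V)

def prbcd_step(h, target, block_frac=0.05, lr=0.1):
    """One update: touch only a random block of output rows, and scale the
    gradient w.r.t. the hidden units by 1/block_frac so its magnitude is
    comparable to a full-coordinate update (the 'proportional scaling'
    mentioned in the abstract)."""
    block_size = max(1, int(block_frac * V))
    block = rng.choice(V, size=block_size, replace=False)
    if target not in block:              # keep the true next word in the block
        block[0] = target

    # Softmax restricted to the sampled block (an approximation of the full softmax).
    logits = W_out[block] @ h + b_out[block]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad_logits = p.copy()
    grad_logits[int(np.where(block == target)[0][0])] -= 1.0

    # Gradient w.r.t. hidden units, rescaled by the sampling fraction,
    # to be backpropagated to the lower layers.
    grad_h = (W_out[block].T @ grad_logits) / block_frac

    # Update only the sampled block of output-layer parameters.
    W_out[block] -= lr * np.outer(grad_logits, h)
    b_out[block] -= lr * grad_logits
    return grad_h

grad_h = prbcd_step(h=rng.normal(size=H), target=42)
print(grad_h.shape)   # (128,)
```

In a parallel setting, each MPI worker would presumably draw its own block and the resulting updates would be combined across workers; the sketch above shows only the per-step block sampling and gradient rescaling.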
Language model; Stochastic optimization; Parallel computing
Xin Liu, Junchi Yan, Xiangfeng Wang, Hongyuan Zha
East China Normal University, Shanghai, China
East China Normal University, Shanghai, China; IBM Research - China, Shanghai, China
East China Normal University, Shanghai, China; Georgia Institute of Technology, Atlanta, USA
International conference
The 7th Chinese Conference on Pattern Recognition (CCPR 2016)
Chengdu
English
334-348
2016-11-03 (date first posted on the Wanfang platform; not the paper's publication date)