Conference Topic

Convergence Accelerated By the Improvements of Stepsize and Gradient in SPSA

Simultaneous perturbation stochastic approximation (SPSA) is effective for optimizing complex systems in which the gradient of the objective function is difficult or impossible to obtain directly and only measurements of the objective function are available. SPSA estimates the gradient efficiently from these measurements alone. Many improvements have been proposed to accelerate the convergence of SPSA; a typical one replaces the first-order gradient approximation of standard SPSA with a Newton-Raphson gradient approximation. Although this second-order SPSA (2SPSA) algorithm solves the optimization problem successfully through efficient gradient approximation, its accuracy depends on the conditioning of the objective function's Hessian matrix. To eliminate the influence of the Hessian, this paper uses a nonlinear conjugate gradient method to determine the search direction of the objective function. By synthesizing different nonlinear conjugate gradient methods, it guarantees that each search direction is a descent direction. Besides the search direction, this paper also improves the stepsize calculation of SPSA: a suitable stepsize is computed from the current and previous gradient information. With a descent search direction and an appropriate stepsize, the improved SPSA converges faster than 2SPSA. The virtues of the improved SPSA are validated by applying it to reinforcement learning.
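To illustrate the baseline the paper improves on, the following is a minimal sketch of standard first-order SPSA, which estimates the gradient from only two measurements of the objective function per iteration via simultaneous random perturbation. The gain-sequence constants (`a0`, `c0`, `alpha`, `gamma`, `A`) follow Spall's commonly used decaying schedules; the function names and default values are illustrative, not taken from the paper.

```python
import random

def spsa_step(theta, loss, a, c):
    """One iteration of standard first-order SPSA.

    Estimates the gradient from two loss measurements at
    simultaneously perturbed points, then takes a descent step.
    """
    # Rademacher (+/-1) perturbation vector
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    # Two measurements suffice regardless of the dimension of theta
    diff = loss(theta_plus) - loss(theta_minus)
    grad = [diff / (2.0 * c * d) for d in delta]
    return [t - a * g for t, g in zip(theta, grad)]

def spsa(theta, loss, iters=500, a0=0.1, c0=0.1,
         alpha=0.602, gamma=0.101, A=10.0):
    """Run SPSA with standard decaying gain sequences
    a_k = a0 / (k + 1 + A)^alpha and c_k = c0 / (k + 1)^gamma."""
    for k in range(iters):
        a_k = a0 / (k + 1 + A) ** alpha
        c_k = c0 / (k + 1) ** gamma
        theta = spsa_step(theta, loss, a_k, c_k)
    return theta
```

The paper's contribution replaces the plain negative-gradient direction in `spsa_step` with a direction built from nonlinear conjugate gradient formulas (guaranteed descensive) and replaces the fixed gain schedule `a_k` with a stepsize computed from current and previous gradient estimates.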

Keywords: SPSA; motion control; Newton–Raphson; conjugate gradient

Zhang Huajun, Zhao Jin, Geng Tao

Department of Control Science and Engineering, Huazhong University of Science and Technology (HUST), Wuhan, China

International Conference

2011 China Control and Decision Conference (CCDC 2011)

Mianyang, Sichuan, China

English

1-6

2011-05-23 (date the paper first appeared on the Wanfang platform; not necessarily its publication date)