ACTIVE EXPLORATION PLANNING IN REINFORCEMENT LEARNING FOR INVERTED PENDULUM SYSTEM CONTROL

Abstract:

Reinforcement learning methods usually require that all actions be tried in all states infinitely often to guarantee convergence. Such algorithms are impractical for sophisticated systems because of their low learning efficiency. This paper analyses the problem of limit cycles that arise in reinforcement learning for inverted pendulum system control and proposes an active exploration planning policy. The algorithm makes full use of the system's characteristics, actively detects limit cycles, and plans exploration instead of exploring at random. It improves learning efficiency by selecting sub-optimal control actions and limiting exploration to the controllable areas, so that the number of trials does not grow exponentially with the size of the state space. Simulation results for the control of single and double inverted pendulums are presented to show the effectiveness of the proposed algorithm.

Keywords: reinforcement learning; inverted pendulum; exploration policy

Authors: YU ZHENG, SI-WEI LUO, ZI-ANG LV

Affiliation: School of Computer and Information Technology, Jiaotong University, Beijing, 100044, China

Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (Fifth IEEE Conference on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 2805-2809

Online publication date: 2006-08-13 (date first posted on the Wanfang platform; not the paper's publication date)
