INCREMENTAL LEAST SQUARES POLICY ITERATION IN REINFORCEMENT LEARNING FOR CONTROL
We propose a novel reinforcement learning algorithm for control problems that combines value-function approximation with linear architectures and approximate policy iteration. The algorithm improves on least-squares policy iteration (LSPI) by using the incremental least-squares temporal-difference learning algorithm (iLSTD) for the prediction (policy evaluation) step. We show that the new algorithm has lower computational complexity than LSPI while learning optimal policies of the same quality.
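The key idea above can be illustrated with a minimal sketch of an iLSTD-style policy-evaluation step, assuming linear features. This is an illustrative reconstruction, not the authors' code: it accumulates the LSTD statistics `A` and `b` per transition, then updates the weight vector `theta` only along the feature dimensions with the largest TD residual, instead of solving the full linear system as batch LSPI does.

```python
import numpy as np

def ilstd_step(A, b, theta, phi_s, phi_next, reward,
               gamma=0.95, alpha=0.01, m=1):
    """One incremental policy-evaluation update in the style of iLSTD.

    A and b accumulate the least-squares statistics of LSTD; instead of
    solving A @ theta = b exactly (as LSPI's batch evaluation does),
    theta is nudged along the m dimensions with the largest residual,
    which costs O(n) per dimension rather than O(n^3) per solve.
    All names and step sizes here are illustrative assumptions.
    """
    # Incrementally fold this transition into the LSTD statistics.
    A += np.outer(phi_s, phi_s - gamma * phi_next)
    b += reward * phi_s
    # TD residual vector mu = b - A @ theta.
    mu = b - A @ theta
    # Descend along the m dimensions with the largest residual magnitude.
    for j in np.argsort(-np.abs(mu))[:m]:
        delta = alpha * mu[j]
        theta[j] += delta
        mu -= delta * A[:, j]  # keep mu consistent with updated theta
    return A, b, theta
```

In an LSPI-style loop, this per-sample update would replace the batch least-squares solve inside each policy-evaluation phase, with policy improvement (greedy action selection over the learned Q-values) unchanged.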
Linear function approximation; policy evaluation; policy iteration; least-squares methods; incremental updating
CHUN-GUI LI, MENG WANG, SHU-HONG YANG
Department of Computer Engineering, Guangxi University of Technology, Liuzhou 545006, China
International conference
2008 International Conference on Machine Learning and Cybernetics
Kunming
English
2010-2014
2008-07-12 (date first posted on the Wanfang platform; not necessarily the paper's publication date)