Automatic Construction of Options in Reinforcement Learning
The taboo state is introduced into the environment to discover sub-goals. The agent samples trajectories from the start state to the goal state, which pass through different bottlenecks. The sampled trajectories give rise to different tasks that are submitted to the agent, and according to whether each task is accomplished, the bottlenecks among the sampled states are discovered. Appropriate bottlenecks are then selected as the sub-goals of the options to be constructed, according to the adjacency relationships among them; at the same time the agent obtains the initiation sets and the policies of the options. Grid-world tasks illustrate that, with the proposed method, the agent can automatically construct useful options online, which accelerate learning and transfer knowledge among similar learning tasks.
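To make the procedure described above concrete, the following is a minimal Python sketch, not the authors' code: a two-room grid world, sampled trajectories from start to goal, taboo-state tests to identify bottlenecks, adjacency-based selection of the sub-goal, and a simple option built around it. The grid layout, all names, and the greedy stand-in for the option's learned policy are illustrative assumptions.

```python
"""Illustrative sketch of taboo-state bottleneck discovery for option construction."""
import random
from collections import deque

# Two 3x3 rooms joined by a single doorway at (1, 3); the doorway is the
# intended bottleneck between the start (left room) and the goal (right room).
WIDTH, HEIGHT = 7, 3
WALL_COL, DOOR = 3, (1, 3)
START, GOAL = (1, 0), (1, 6)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def free(cell, taboo=frozenset()):
    r, c = cell
    if not (0 <= r < HEIGHT and 0 <= c < WIDTH):
        return False
    if c == WALL_COL and cell != DOOR:
        return False          # wall between the rooms, except the doorway
    return cell not in taboo  # taboo states are treated as blocked

def neighbors(cell, taboo=frozenset()):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in ACTIONS if free((r + dr, c + dc), taboo)]

def reachable(src, dst, taboo=frozenset()):
    """Can the task (reach dst from src) still be accomplished under the taboo set?"""
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for nxt in neighbors(cur, taboo):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def sample_trajectory(max_steps=500):
    """Random-walk trajectory from START to GOAL (stand-in for the learner's samples)."""
    cell, path = START, [START]
    for _ in range(max_steps):
        if cell == GOAL:
            return path
        cell = random.choice(neighbors(cell))
        path.append(cell)
    return None

# 1. Sample trajectories and collect the candidate states they pass through.
candidates = set()
for _ in range(20):
    traj = sample_trajectory()
    if traj:
        candidates.update(traj[1:-1])

# 2. A candidate is a bottleneck if making it taboo leaves the task unaccomplishable.
bottlenecks = [s for s in candidates if not reachable(START, GOAL, taboo={s})]
print("bottlenecks:", sorted(bottlenecks))  # expected: the doorway (1, 3) and its two neighbors

# 3. Select the sub-goal by adjacency among bottlenecks: the doorway is the
#    bottleneck adjacent to the most other bottlenecks.
def adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

subgoal = max(bottlenecks, key=lambda s: sum(adjacent(s, t) for t in bottlenecks))

# 4. Construct an option around the sub-goal: initiation set = free states on the
#    start side of the doorway, policy = move greedily toward the sub-goal.
initiation_set = {(r, c) for r in range(HEIGHT) for c in range(WIDTH)
                  if free((r, c)) and c < subgoal[1]}

def option_policy(cell):
    """Greedy step toward the sub-goal (stand-in for the option's learned policy)."""
    return min(neighbors(cell),
               key=lambda n: abs(n[0] - subgoal[0]) + abs(n[1] - subgoal[1]))

print("sub-goal:", subgoal, "| initiation set size:", len(initiation_set))
print("first option step from START:", option_policy(START))
```

In a full option, the policy would itself be learned, for example by Q-learning with a pseudo-reward for reaching the sub-goal, and the option would terminate on reaching it; the greedy step above is only a placeholder for that learned policy.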
hierarchical reinforcement learning; option; subgoal; taboo search; Q-learning
Xu Ming-Liang, Sun Jun, Xu Wen-bo
School of Information Technology, Jiangnan University, Wuxi 214122, China
International conference
Shanghai
English
47-50
2009-11-20 (date the paper was first posted on the Wanfang platform; not its publication date)