A Non-myopic Approach Based on Reinforcement Learning for Multiple Moving Targets Search
Myopic information-based approaches, which maximize the information gain of a single observation opportunity, are effective for searching for multiple moving targets in ocean surveillance by space-based sensors. A non-myopic approach based on reinforcement learning is developed to maximize information gain over the long term. Reinforcement learning adjusts the control policy toward the optimum and learns system behavior through trial-and-error interaction with a dynamic environment. System states are characterized by the expected information gain, action-value functions are estimated by the online SARSA(λ) algorithm, and the parameterized control policy is approximated by neural networks. Simulations show that, after sufficient training, the non-myopic approach can outperform the myopic approach.
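To illustrate the action-value estimation step named in the abstract, the following is a minimal sketch of tabular SARSA(λ) with accumulating eligibility traces. It is not the authors' implementation: the abstract describes states built from expected information gain and a neural-network policy, whereas this sketch assumes a small discrete state space and a toy five-cell chain. The names `sarsa_lambda`, `epsilon_greedy`, and `chain_step`, and all parameter values, are hypothetical choices made for illustration.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa_lambda(step, n_states, n_actions, start_state, episodes=200,
                 alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces.

    `step(state, action)` is a hypothetical environment function that
    returns (next_state, reward, done).
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    for _ in range(episodes):
        trace = {key: 0.0 for key in q}  # traces are reset at the start of each episode
        state = start_state
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # On-policy target: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * q[(next_state, next_action)]
            delta = target - q[(state, action)]
            trace[(state, action)] += 1.0
            # Spread the TD error over recently visited state-action pairs.
            for key in q:
                q[key] += alpha * delta * trace[key]
                trace[key] *= gamma * lam
            state, action = next_state, next_action
    return q


def chain_step(state, action):
    """Toy chain of 5 cells: action 1 moves right, action 0 moves left; reward 1 on reaching cell 4."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = sarsa_lambda(chain_step, n_states=5, n_actions=2, start_state=0)
```

In this toy setting every episode ends with the move from cell 3 into cell 4, so the estimate `q_table[(3, 1)]` approaches the terminal reward of 1. In a search problem like the one described, the reward would presumably be an information-gain measure instead, and the table would be replaced by a function approximator.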
optimal search theory; reinforcement learning; multiple moving targets; maritime surveillance satellite
Yifan Xu, Yuejin Tan, Zhenyu Lian, Renjie He
College of Information System and Management, National University of Defense Technology, Changsha, Hunan 410073, PRC
International conference
2010 IEEE International Conference on Information and Automation (ICIA 2010)
Harbin
English
1-6
2010-06-20 (date first posted online on the Wanfang platform; does not represent the paper's publication date)