Policy Iteration for Bounded-Parameter POMDPs

Abstract:

The partially observable Markov decision process (POMDP) is regarded as a basic model for decision making under uncertainty. As a generalization of the exact POMDP model, the bounded-parameter POMDP (BPOMDP) specifies only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely given or time-varying. This paper presents an optimistic optimality criterion for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an ε-optimal policy under the optimistic optimality criterion.
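
As a rough illustration of what interval bounds on probabilities imply, the sketch below picks, from all distributions that respect given lower and upper bounds, the one that maximizes an expected value. This greedy step is familiar from the bounded-parameter MDP literature; whether the paper uses exactly this construction is an assumption, and the names `optimistic_distribution` and `optimistic_expectation` are hypothetical.

```python
from typing import List, Sequence


def optimistic_distribution(
    lower: Sequence[float], upper: Sequence[float], values: Sequence[float]
) -> List[float]:
    """Return p with lower[i] <= p[i] <= upper[i] and sum(p) == 1
    that maximizes sum(p[i] * values[i]).

    Assumption: the bounds are consistent, i.e. sum(lower) <= 1 <= sum(upper).
    """
    tol = 1e-12
    if sum(lower) > 1.0 + tol or sum(upper) < 1.0 - tol:
        raise ValueError("bounds admit no valid probability distribution")

    # Start from the lower bounds, then hand the remaining probability mass
    # to outcomes in decreasing order of value.
    p = list(lower)
    slack = 1.0 - sum(p)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for i in order:
        add = min(upper[i] - p[i], slack)
        p[i] += add
        slack -= add
    return p


def optimistic_expectation(
    lower: Sequence[float], upper: Sequence[float], values: Sequence[float]
) -> float:
    """Expected value under the most favorable admissible distribution."""
    p = optimistic_distribution(lower, upper, values)
    return sum(pi * vi for pi, vi in zip(p, values))
```

With bounds `[0.1, 0.2, 0.3]` to `[0.5, 0.6, 0.7]` and values `[1.0, 5.0, 2.0]`, all spare mass goes to the second outcome, which has the highest value. A full BPOMDP solver would also have to handle bounded observation probabilities and rewards, which this sketch leaves out.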

Keywords: Decision making under uncertainty; Bounded-parameter POMDP; Policy iteration; Optimistic optimality; Finite-state controller; ε-optimal policy

Authors: Yaodong Ni; ZhiQiang Liu

Affiliations: School of Information Technology and Management, University of International Business and Economics, B; School of Creative Media, City University of Hong Kong, Hong Kong, China

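Conference type: Domestic conference

Conference name: The 6th China Conference on Intelligent Computing

Conference location: Kaifeng

Conference language: English

Pages: 11-25

Online publication date: 2012-05-25 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)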
