Policy Iteration for Bounded-Parameter POMDPs
The partially observable Markov decision process (POMDP) is a basic model for decision making under uncertainty. As a generalization of the exact POMDP model, the bounded-parameter POMDP (BPOMDP) specifies only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely given or time-varying. This paper presents an optimistic optimality criterion for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an ε-optimal policy under the optimistic optimality criterion.
Decision making under uncertainty; Bounded-parameter POMDP; Policy iteration; Optimistic optimality; Finite-state controller; ε-optimal policy
Yaodong Ni; ZhiQiang Liu
School of Information Technology and Management, University of International Business and Economics B; School of Creative Media, City University of Hong Kong, Hong Kong, China
Domestic conference
Kaifeng
English
11-25
2012-05-25 (date first posted online on the Wanfang platform; not the paper's publication date)