Conference Topics

Policy Iteration for Bounded-Parameter POMDPs

The partially observable Markov decision process (POMDP) is a basic model for decision making under uncertainty. As a generalization of the exact POMDP model, the bounded-parameter POMDP (BPOMDP) specifies only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely specified or time-varying. This paper presents an optimistic optimality criterion for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an ε-optimal policy under the optimistic optimality criterion.

Decision making under uncertainty; Bounded-parameter POMDP; Policy iteration; Optimistic optimality; Finite-state controller; ε-optimal policy

Yaodong Ni, ZhiQiang Liu

School of Information Technology and Management, University of International Business and Economics B; School of Creative Media, City University of Hong Kong, Hong Kong, China

Domestic conference

The 6th China Conference on Intelligent Computing (第六届中国智能计算大会)

Kaifeng

English

11-25

2012-05-25 (date first posted online on the Wanfang platform; this is not the paper's publication date)