An Average Reward Performance Potential Estimation with Geometric Variance Reduction
Performance potential plays an important role in Markov decision processes (MDPs) under both the discounted- and average-reward criteria. With performance potential as a building block, optimization algorithms such as policy iteration and gradient-based algorithms can be developed. In general, performance potential can be obtained by solving a linear equation. However, when the state space is very large or the transition probabilities are unknown, solving for the performance potential becomes difficult or even impossible. In such cases, simulation-based estimation is more suitable. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample paths of the Markov chain. In this paper, we consider a new estimation algorithm for the average-reward performance potential with geometric variance reduction. Estimates with geometric variance reduction O(ρ^N), ρ < 1, have a better convergence rate. By using the relative difference of performance potentials, i.e., the perturbation realization factor, the performance potential can be estimated with a coupling method, which further reduces the variance of the estimates. The estimates of performance potential in this paper can be applied to event-based optimization.
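As context for the O(1/N) baseline the abstract refers to, the following is a minimal sketch of a regular Monte Carlo estimate of the average-reward performance potential. The 3-state transition matrix, reward vector, and all function names are illustrative assumptions, not taken from the paper; the estimate truncates the Poisson-equation sum g(i) ≈ E[Σ_{t<T} (r(X_t) − η) | X_0 = i] at a finite horizon.

```python
import numpy as np

# Hypothetical 3-state Markov chain; P and r are illustrative only.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 2.0, 3.0])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

pi = stationary_distribution(P)
eta = pi @ r  # long-run average reward

def potential_mc(P, r, eta, start, horizon=200, n_paths=5000, seed=0):
    """Regular Monte Carlo estimate of the performance potential
    g(i) ~ E[ sum_{t=0}^{H-1} (r(X_t) - eta) | X_0 = i ].
    The variance decays only as O(1/N) in the number of paths N."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)          # row-wise CDFs for sampling
    x = np.full(n_paths, start)         # all paths start in `start`
    total = np.zeros(n_paths)
    for _ in range(horizon):
        total += r[x] - eta             # accumulate centered rewards
        u = rng.random(n_paths)
        x = (cum[x] < u[:, None]).sum(axis=1)  # sample next states
    return total.mean()

g = np.array([potential_mc(P, r, eta, i) for i in range(3)])
```

Up to Monte Carlo noise and a geometrically small truncation bias, the estimate `g` satisfies the Poisson equation (I − P) g = r − η·1, which is the linear equation the abstract mentions.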
Performance Potential Perturbation Realization Factor Estimation with Geometric Variance Reduction
LI Yanjie
Harbin Institute of Technology Shenzhen Graduate School Shenzhen, 518055, P. R. China
Conference type: international conference
Conference: The 31st Chinese Control Conference
Location: Hefei
Language: English
Pages: 2061-2065
Online date: 2012-07-01 (date first posted on the Wanfang platform; not necessarily the paper's publication date)