An Average Reward Performance Potential Estimation with Geometric Variance Reduction
Performance potential plays an important role in Markov decision processes (MDPs) under both the discounted- and average-reward criteria. With performance potential as a building block, optimization algorithms such as policy iteration and gradient-based algorithms can be developed. In general, performance potential can be obtained by solving a linear equation. However, when the state space is very large or the transition probabilities are unknown, solving for the performance potential becomes difficult or even impossible. In such cases, simulation-based estimation is more suitable. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample paths of the Markov chain. In this paper, we consider a new estimation algorithm for the average-reward performance potential with geometric variance reduction. Estimates with geometric variance reduction O(ρ^N), ρ < 1, have a better convergence rate. By using the relative difference of performance potentials, i.e., the perturbation realization factor, the performance potential can be estimated with a coupling method, which further reduces the variance of the estimates. The estimates of performance potential in this paper can be applied to event-based optimization.
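As context for the O(1/N) baseline the abstract refers to, the following is a minimal sketch of a regular Monte Carlo estimate of the average-reward performance potential. The 3-state transition matrix, reward vector, and all function names are illustrative assumptions, not taken from the paper; the estimate truncates the Poisson-equation sum g(i) ≈ E[Σ_{t<T} (r(X_t) − η) | X_0 = i] at a finite horizon.

```python
import numpy as np

# Hypothetical 3-state Markov chain; P and r are illustrative only.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 2.0, 3.0])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

pi = stationary_distribution(P)
eta = pi @ r  # long-run average reward

def potential_mc(P, r, eta, start, horizon=200, n_paths=5000, seed=0):
    """Regular Monte Carlo estimate of the performance potential
    g(i) ~ E[ sum_{t=0}^{H-1} (r(X_t) - eta) | X_0 = i ].
    The variance decays only as O(1/N) in the number of paths N."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)          # row-wise CDFs for sampling
    x = np.full(n_paths, start)         # all paths start in `start`
    total = np.zeros(n_paths)
    for _ in range(horizon):
        total += r[x] - eta             # accumulate centered rewards
        u = rng.random(n_paths)
        x = (cum[x] < u[:, None]).sum(axis=1)  # sample next states
    return total.mean()

g = np.array([potential_mc(P, r, eta, i) for i in range(3)])
```

Up to Monte Carlo noise and a geometrically small truncation bias, the estimate `g` satisfies the Poisson equation (I − P) g = r − η·1, which is the linear equation the abstract mentions.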
Performance Potential Perturbation Realization Factor Estimation with Geometric Variance Reduction
LI Yanjie
Harbin Institute of Technology Shenzhen Graduate School Shenzhen, 518055, P. R. China
Conference type: international conference
Conference: The 31st Chinese Control Conference
Location: Hefei
Language: English
Pages: 2061-2065
Online date: 2012-07-01 (date first posted on the Wanfang platform; not necessarily the paper's publication date)