Previous |  Up |  Next

Article

Keywords:
Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies
Summary:
In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the necessary number of steps till gaining convergence is an issue of great theoretical and practical interest as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own. It should be mentioned that other authors find such kind of constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition and the corresponding ergodicity index plays a critical role in the upper bound.
References:
[1] D. Cruz-Suárez and R. Montes-de-Oca: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. Soc. Mat. Mexicana 12 (2006), 133–148. MR 2301750
[2] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Conditions for the uniqueness of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415–436. MR 2106092
[3] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Uniform approximations of discounted Markov decision processes to optimal policies. In: Proc. Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), MATFYZPRESS, Prague 2006, pp. 278–287.
[4] O. Hernández-Lerma: Adaptive Markov Control Processes Springer-Verlag, New York 1989. MR 0995463
[5] O. Hernández-Lerma and J. B. Lasserre: Discrete–Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. MR 1363487
[6] O. Hernández-Lerma and J. B. Lasserre: Further Topics on Discrete–Time Markov Control Processes. Springer-Verlag, New York 1999. MR 1697198
[7] M. L. Puterman: Markov Decision Processes. Discrete Stochastic Dynamic Programming. Wiley, New York 1994. MR 1270015 | Zbl 1184.90170
[8] R. Ritt and L. Sennott: Optimal stationary policies in general state Markov decision chains with finite action sets. Math. Oper. Res. 17 (1992), 901–909. MR 1196400
[9] N. L. Stokey and R. E. Lucas: Recursive Methods in Economic Dynamics. Harvard University Press, USA 1989. MR 1105087
Partner of
EuDML logo