
Article

Title: A stopping rule for discounted Markov decision processes with finite action sets (English)
Author: Montes-de-Oca, Raúl
Author: Lemus-Rodríguez, Enrique
Author: Cruz-Suárez, Daniel
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 45
Issue: 5
Year: 2009
Pages: 755-767
Summary lang: English
Category: math
Summary: In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm, under suitable conditions, reaches an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed for convergence is of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper such a bound is found, depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound. (English)
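The value iteration scheme the summary refers to can be illustrated with a minimal sketch. The toy transition model and the span-based epsilon stopping test below are generic textbook choices for exposition, not the structural bound derived in the paper.

```python
import numpy as np

def value_iteration(P, c, beta, eps=1e-8, max_iter=10_000):
    """Value iteration for a finite-state, finite-action discounted MDP.

    P[a][s, s'] -- transition matrix under action a
    c[s, a]     -- one-stage cost
    beta        -- discount factor in (0, 1)
    Returns (V, policy) for the cost-minimization problem.
    """
    n_states, n_actions = c.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = c(s, a) + beta * sum_{s'} P(s' | s, a) V(s')
        Q = c + beta * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.min(axis=1)
        # classic stopping rule: guarantees the greedy policy's value
        # is within eps of optimal (not the paper's structural bound)
        if np.max(np.abs(V_new - V)) < eps * (1 - beta) / (2 * beta):
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=1)
    return V, policy

# hypothetical two-state, two-action example
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.5, 0.5]])]   # action 1
c = np.array([[1.0, 2.0], [2.0, 0.5]])
V, pi = value_iteration(P, c, beta=0.9)
```

Once the successive iterates are close enough, the greedy (myopic) policy with respect to the current value function is optimal; the paper's contribution is an a priori bound on how many iterations this takes, in terms of structural quantities such as the ergodicity index.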
Keyword: Markov decision process
Keyword: ergodicity condition
Keyword: value iteration
Keyword: discounted cost
Keyword: optimal policy
Keyword: myopic policies
MSC: 90C40
MSC: 93E20
idZBL: Zbl 1190.93107
idMR: MR2599110
Date available: 2010-06-02T19:12:27Z
Last updated: 2012-06-06
Stable URL: http://hdl.handle.net/10338.dmlcz/140043
Reference: [1] D. Cruz-Suárez and R. Montes-de-Oca: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. Soc. Mat. Mexicana 12 (2006), 133–148. MR 2301750
Reference: [2] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Conditions for the uniqueness of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415–436. MR 2106092
Reference: [3] D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva: Uniform approximations of discounted Markov decision processes to optimal policies. In: Proc. Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), MATFYZPRESS, Prague 2006, pp. 278–287.
Reference: [4] O. Hernández-Lerma: Adaptive Markov Control Processes. Springer-Verlag, New York 1989. MR 0995463
Reference: [5] O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. MR 1363487
Reference: [6] O. Hernández-Lerma and J. B. Lasserre: Further Topics on Discrete-Time Markov Control Processes. Springer-Verlag, New York 1999. MR 1697198
Reference: [7] M. L. Puterman: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York 1994. Zbl 1184.90170, MR 1270015
Reference: [8] R. Ritt and L. Sennott: Optimal stationary policies in general state Markov decision chains with finite action sets. Math. Oper. Res. 17 (1992), 901–909. MR 1196400
Reference: [9] N. L. Stokey and R. E. Lucas: Recursive Methods in Economic Dynamics. Harvard University Press, USA 1989. MR 1105087

Files

Kybernetika_45-2009-5_5.pdf (804.6 KB, application/pdf)