
Article

Title: Identification of optimal policies in Markov decision processes (English)
Author: Sladký, Karel
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 46
Issue: 3
Year: 2010
Pages: 558-570
Summary lang: English
Category: math
Summary: In this note we focus attention on identifying optimal policies, and on eliminating suboptimal policies, when minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds both on the optimal values and on the values of the current policy. Using these modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy, or nearly optimal policies, in a finite number of steps without knowing the precise values of the performance function. (English)
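The summary above only sketches the bounding and elimination technique. As a minimal illustration (not the paper's exact algorithm), the Python sketch below runs value iteration for a discounted cost-minimizing MDP, maintains MacQueen-type lower and upper bounds on the optimal value, and discards an action once a lower bound on its Q-value exceeds the upper bound on the optimal value. The arrays c and P, the discount factor beta, and the function name are hypothetical inputs chosen for the example.

import numpy as np

def value_iteration_with_elimination(c, P, beta, tol=1e-8, max_iter=10000):
    # c[i, a]    : one-step cost of action a in state i, shape (S, A)
    # P[a, i, j] : transition probability p(j | i, a), shape (A, S, S)
    # beta       : discount factor, 0 < beta < 1
    S, A = c.shape
    active = np.ones((S, A), dtype=bool)   # actions not yet proven suboptimal
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman update restricted to the surviving actions
        Q = np.where(active, c + beta * np.einsum('aij,j->ia', P, v), np.inf)
        v_new = Q.min(axis=1)
        d = v_new - v
        # Elementwise bounds on the optimal value v*:
        # v_new + beta/(1-beta)*min(d) <= v* <= v_new + beta/(1-beta)*max(d)
        v_lo = v_new + beta / (1 - beta) * d.min()
        v_hi = v_new + beta / (1 - beta) * d.max()
        # Eliminate action a in state i once a lower bound on Q*(i, a)
        # exceeds the upper bound on v*(i); an optimal action always
        # satisfies the test, so at least one action survives per state.
        Q_lo = c + beta * np.einsum('aij,j->ia', P, v_lo)
        active &= Q_lo <= v_hi[:, None]
        v = v_new
        if beta / (1 - beta) * (d.max() - d.min()) < tol:
            break
    Q = np.where(active, c + beta * np.einsum('aij,j->ia', P, v), np.inf)
    return v_lo, v_hi, Q.argmin(axis=1), active

# Example on a small random MDP (4 states, 3 actions, beta = 0.9):
rng = np.random.default_rng(0)
c = rng.random((4, 3))
P = rng.random((3, 4, 4))
P /= P.sum(axis=2, keepdims=True)           # normalize rows to probabilities
v_lo, v_hi, policy, active = value_iteration_with_elimination(c, P, 0.9)
print(policy, active.sum(axis=1))           # greedy policy, surviving actions

Because the bounds hold at every iteration, the elimination test and the stopping test (gap between v_hi and v_lo below tol) both apply long before the value iterates themselves converge, which is the practical point of the approach the summary describes.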
Keyword: finite state Markov decision processes
Keyword: discounted and average costs
Keyword: elimination of suboptimal policies
MSC: 60J10
MSC: 90C40
MSC: 93E20
idZBL: Zbl 1195.93148
idMR: MR2676091
Date available: 2010-09-13T17:05:32Z
Last updated: 2013-09-21
Stable URL: http://hdl.handle.net/10338.dmlcz/140769
Reference: [1] Cruz-Suárez, D., Montes-de-Oca, R.: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. de la Soc. Mat. Mexicana 12 (2006), 133–148. MR 2301750
Reference: [2] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Uniform approximations of discounted Markov decision processes to optimal policies. Proceedings of Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), Matfyzpress, Prague 2006, pp. 278–287.
Reference: [3] Grinold, J.: Elimination of suboptimal actions in Markov decision problems. Oper. Res. 21 (1973), 848–851. Zbl 0259.90053, MR 0359828, 10.1287/opre.21.3.848
Reference: [4] Hastings, N. A. J.: Bounds on the gain of a Markov decision process. Oper. Res. 19 (1971), 240–243. 10.1287/opre.19.1.240
Reference: [5] Hastings, N. A. J., Mello, J.: Tests for suboptimal actions in discounted Markov programming. Manag. Sci. 19 (1971), 1019–1022. Zbl 0304.90116, MR 0334959, 10.1287/mnsc.19.9.1019
Reference: [6] Hastings, N. A. J., Mello, J.: Tests for suboptimal actions in undiscounted Markov decision chains. Manag. Sci. 23 (1976), 87–91. MR 0439034, 10.1287/mnsc.23.1.87
Reference: [7] MacQueen, J.: A modified dynamic programming method for Markov decision problems. J. Math. Anal. Appl. 14 (1966), 38–43. MR 0191707, 10.1016/0022-247X(66)90060-6
Reference: [8] MacQueen, J.: A test of suboptimal actions in Markovian decision problems. Oper. Res. 15 (1967), 559–561. 10.1287/opre.15.3.559
Reference: [9] Odoni, A. R.: On finding the maximal gain for Markov decision processes. Oper. Res. 17 (1969), 857–860. Zbl 0184.23202, MR 0284192, 10.1287/opre.17.5.857
Reference: [10] Puterman, M. L., Shin, M. C.: Modified policy iteration algorithms for discounted Markov decision problems. Manag. Sci. 24 (1978), 1127–1137. Zbl 0391.90093, MR 0521502, 10.1287/mnsc.24.11.1127
Reference: [11] Puterman, M. L., Shin, M. C.: Action elimination procedures for modified policy iteration algorithms. Oper. Res. 30 (1982), 301–318. MR 0653253, 10.1287/opre.30.2.301
Reference: [12] Puterman, M. L.: Markov Decision Processes – Discrete Stochastic Dynamic Programming. Wiley, New York 1994. Zbl 1184.90170, MR 1270015
Reference: [13] Sladký, K.: O metodě postupných aproximací pro nalezení optimálního řízení markovského řetězce (On the successive approximation method for finding an optimal control of a Markov chain). Kybernetika 4 (1969), 2, 167–176.
Reference: [14] White, D. J.: Dynamic programming, Markov chains and the method of successive approximations. J. Math. Anal. Appl. 6 (1963), 296–306. MR 0148480, 10.1016/0022-247X(63)90017-9

Files

Kybernetika_46-2010-3_19.pdf (451.8 KB, application/pdf)