[1] Cruz-Suárez, D., Montes-de-Oca, R.: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. de la Soc. Mat. Mexicana 12 (2006), 133–148. MR 2301750
[2] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Uniform approximations of discounted Markov decision processes to optimal policies. Proceedings of Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), Matfyzpress, Prague 2006, pp. 278–287.
[4] Hastings, N. A. J.: Bounds on the gain of a Markov decision process. Oper. Res. 19 (1971), 240–243. DOI 10.1287/opre.19.1.240
[6] Hastings, N. A. J., Mello, J.: Tests for suboptimal actions in undiscounted Markov decision chains. Manag. Sci. 23 (1976), 87–91. DOI 10.1287/mnsc.23.1.87, MR 0439034
[8] MacQueen, J.: A test for suboptimal actions in Markovian decision problems. Oper. Res. 15 (1967), 559–561. DOI 10.1287/opre.15.3.559
[11] Puterman, M. L., Shin, M. C.: Action elimination procedures for modified policy iteration algorithms. Oper. Res. 30 (1982), 301–318. DOI 10.1287/opre.30.2.301, MR 0653253
[12] Puterman, M. L.: Markov Decision Processes – Discrete Stochastic Dynamic Programming. Wiley, New York 1994. MR 1270015, Zbl 1184.90170
[13] Sladký, K.: O metodě postupných aproximací pro nalezení optimálního řízení markovského řetězce (On a successive approximation method for finding the optimal control of a Markov chain). Kybernetika 4 (1969), 2, 167–176.