Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach

Ortega-Gutiérrez, R. Israel; Montes-de-Oca, Raúl; Lemus-Rodríguez, Enrique

About DML-CZ | FAQ | Conditions of Use | Math Archives | Contact Us

Previous | Up | Next

Article

Ortega-Gutiérrez, R. Israel ; Montes-de-Oca, Raúl ; Lemus-Rodríguez, Enrique

Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach. (English). Kybernetika, vol. 52 (2016), issue 1, pp. 66-75

MSC: 90C40, 93E20 | MR 3482611 | Zbl 1374.90407 | DOI: 10.14736/kyb-2016-1-0066

Full entry |

PDF (0.3 MB) Feedback

Keywords:
discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle

Summary:
Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.

Similar articles:

References:

[1] Bertsekas, D. P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, NJ 1987. MR 0896902 | Zbl 0649.93001

[2] Bishop, E., Phelps, R. R.: The support functionals of a convex set. In: Proc. Sympos. Pure Math. Vol. VII, 1963 (V. L. Klee, ed.), Amer. Math. Soc., pp. 27-35. DOI 10.1090/pspum/007/0154092 | MR 0154092 | Zbl 0149.08601

[3] Borwein, J. M., Zhu, Q. J.: Techniques of Variational Analysis. Springer, New York 2005. MR 2144010 | Zbl 1076.49001

[4] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Conditions for the uniqueness of optimal policies of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415-436. DOI 10.1007/s001860400372 | MR 2106092 | Zbl 1104.90053

[5] Cruz-Suárez, D., Montes-de-Oca, R.: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. Soc. Mat. Mexicana 12 (2006), 133-152. MR 2301750

[6] Ekeland, I.: On the variational principle. J. Math. Anal. Appl. 67 (1974), 324-353. DOI 10.1016/0022-247x(74)90025-0 | MR 0346619 | Zbl 0286.49015

[7] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. DOI 10.1007/978-1-4612-0729-0 | MR 1363487 | Zbl 0840.93001

[8] Lucchetti, R.: Convexity and Well-Posed Problems. CMS Books in Mathematics, Springer, New York 2006. DOI 10.1007/0-387-31082-7 | MR 2179578 | Zbl 1106.49001

[9] Montes-de-Oca, R., Lemus-Rodríguez, E.: An unbounded Berge's minimum theorem with applications to discounted Markov decision processes. Kybernetika 48 (2012), 268-286. MR 2954325 | Zbl 1275.90124

[10] Montes-de-Oca, R., Lemus-Rodríguez, E., Salem-Silva, F.: Nonuniqueness versus uniqueness of optimal policies in convex discounted Markov decision processes. J. Appl. Math. 2013 (2013), 1-5. DOI 10.1155/2013/271279 | MR 3039713 | Zbl 1266.90113

[11] Rockafellar, R. T., Wets, R. J. B.: Variational Analysis. Springer, New York 2004. MR 1491362 | Zbl 0888.49001

[12] Tanaka, K., Hosino, M., Kuroiwa, D.: On an $\varepsilon $-optimal policy of discrete time stochastic control processes. Bull. Inform. Cybernet. 27 (1995), 107-119. MR 1335274

Browse
- Collections
- Titles
- Authors
- MSC

About DML-CZ

Partner of

Article

Search

Browse