Title:
|
Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach (English) |
Author:
|
Ortega-Gutiérrez, R. Israel |
Author:
|
Montes-de-Oca, Raúl |
Author:
|
Lemus-Rodríguez, Enrique |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 (print) |
ISSN:
|
1805-949X (online) |
Volume:
|
52 |
Issue:
|
1 |
Year:
|
2016 |
Pages:
|
66-75 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs. (English) |
Keyword:
|
discounted Markov decision processes |
Keyword:
|
dynamic programming |
Keyword:
|
unique optimal policy |
Keyword:
|
non-uniqueness of optimal policies |
Keyword:
|
Ekeland's variational principle |
MSC:
|
90C40 |
MSC:
|
93E20 |
idZBL:
|
Zbl 1374.90407 |
idMR:
|
MR3482611 |
DOI:
|
10.14736/kyb-2016-1-0066 |
. |
Date available:
|
2016-03-21T17:51:53Z |
Last updated:
|
2018-01-10 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/144863 |
. |
Reference:
|
[1] Bertsekas, D. P.: Dynamic Programming: Deterministic and Stochastic Models..Prentice-Hall, NJ 1987. Zbl 0649.93001, MR 0896902 |
Reference:
|
[2] Bishop, E., Phelps, R. R.: The support functionals of a convex set..In: Proc. Sympos. Pure Math. Vol. VII, 1963 (V. L. Klee, ed.), Amer. Math. Soc., pp. 27-35. Zbl 0149.08601, MR 0154092, 10.1090/pspum/007/0154092 |
Reference:
|
[3] Borwein, J. M., Zhu, Q. J.: Techniques of Variational Analysis..Springer, New York 2005. Zbl 1076.49001, MR 2144010 |
Reference:
|
[4] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Conditions for the uniqueness of optimal policies of discounted Markov decision processes..Math. Methods Oper. Res. 60 (2004), 415-436. Zbl 1104.90053, MR 2106092, 10.1007/s001860400372 |
Reference:
|
[5] Cruz-Suárez, D., Montes-de-Oca, R.: Uniform convergence of the value iteration policies for discounted Markov decision processes..Bol. Soc. Mat. Mexicana 12 (2006), 133-152. MR 2301750 |
Reference:
|
[6] Ekeland, I.: On the variational principle..J. Math. Anal. Appl. 67 (1974), 324-353. Zbl 0286.49015, MR 0346619, 10.1016/0022-247x(74)90025-0 |
Reference:
|
[7] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria..Springer-Verlag, New York 1996. Zbl 0840.93001, MR 1363487, 10.1007/978-1-4612-0729-0 |
Reference:
|
[8] Lucchetti, R.: Convexity and Well-Posed Problems..CMS Books in Mathematics, Springer, New York 2006. Zbl 1106.49001, MR 2179578, 10.1007/0-387-31082-7 |
Reference:
|
[9] Montes-de-Oca, R., Lemus-Rodríguez, E.: An unbounded Berge's minimum theorem with applications to discounted Markov decision processes..Kybernetika 48 (2012), 268-286. Zbl 1275.90124, MR 2954325 |
Reference:
|
[10] Montes-de-Oca, R., Lemus-Rodríguez, E., Salem-Silva, F.: Nonuniqueness versus uniqueness of optimal policies in convex discounted Markov decision processes..J. Appl. Math. 2013 (2013), 1-5. Zbl 1266.90113, MR 3039713, 10.1155/2013/271279 |
Reference:
|
[11] Rockafellar, R. T., Wets, R. J. B.: Variational Analysis..Springer, New York 2004. Zbl 0888.49001, MR 1491362 |
Reference:
|
[12] Tanaka, K., Hosino, M., Kuroiwa, D.: On an $\varepsilon $-optimal policy of discrete time stochastic control processes..Bull. Inform. Cybernet. 27 (1995), 107-119. MR 1335274 |
. |