
Article

Keywords:
Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm
Summary:
This paper considers Markov decision processes (MDPs) with the total discounted cost as the objective function, and with state and decision spaces that are subsets of the real line but not necessarily finite or denumerable. The cost function may be unbounded, and the dynamics are independent of the current state. The decision sets may be non-compact. In this setting, conditions guaranteeing either an increasing or a decreasing optimal stationary policy are provided; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
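To illustrate the idea behind a monotone version of policy iteration, the following Python sketch runs policy iteration for a discounted-cost MDP on a finite grid, restricting the policy-improvement search at each state to actions no smaller than the action chosen at the previous state (the "increasing policy" case). This is only an illustrative assumption-laden sketch, not the paper's algorithm: the paper treats state and decision spaces that are subsets of the real line (possibly uncountable), costs that may be unbounded, and possibly non-compact decision sets, whereas here everything is discretized, and the names monotone_policy_iteration, c, P, and alpha are hypothetical.

```python
import numpy as np

def monotone_policy_iteration(c, P, alpha, max_iter=1000):
    """Illustrative monotone policy iteration on a finite grid.

    c: (S, A) array of one-stage costs c(s, a)
    P: (S, A, S) array of transition probabilities P(s' | s, a)
    alpha: discount factor in (0, 1)
    Assumes (as in the increasing-policy case) that an increasing optimal
    stationary policy exists, so the minimizer at state s can be searched
    only among actions >= the minimizer at the previous (smaller) state.
    """
    S, A = c.shape
    policy = np.zeros(S, dtype=int)           # start from the smallest action everywhere
    V = np.zeros(S)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - alpha * P_pi) V = c_pi for the current policy.
        P_pi = P[np.arange(S), policy]        # (S, S) transition matrix under the policy
        c_pi = c[np.arange(S), policy]        # (S,) cost under the policy
        V = np.linalg.solve(np.eye(S) - alpha * P_pi, c_pi)

        # Monotone policy improvement: enforce an increasing action selection.
        new_policy = np.empty(S, dtype=int)
        lower = 0
        for s in range(S):
            q = c[s, lower:] + alpha * P[s, lower:] @ V   # Q-values for admissible actions
            new_policy[s] = lower + int(np.argmin(q))
            lower = new_policy[s]                         # next state may not use a smaller action
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V
```

Restricting the improvement step in this way shrinks the action search as the state increases, which is the computational payoff of knowing in advance that a monotone optimal stationary policy exists; the decreasing-policy case is symmetric, with the search restricted to actions no larger than the previous choice.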
References:
[1] Assaf, D.: Invariant problems in discounted dynamic programming. Adv. in Appl. Probab. 10 (1978), 472-490. DOI 10.2307/1426946 | MR 0489919 | Zbl 0388.49016
[2] Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer-Verlag, Berlin - Heidelberg 2011. MR 2808878 | Zbl 1236.90004
[3] Bertsekas, D. P.: Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, New Jersey 1987. MR 0896902 | Zbl 0649.93001
[4] Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F.: Conditions for the uniqueness of optimal policies of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415-436. DOI 10.1007/s001860400372 | MR 2106092 | Zbl 1104.90053
[5] Dragut, A.: Structured optimal policies for Markov decision processes: lattice programming techniques. In: Wiley Encyclopedia of Operations Research and Management Science (J. J. Cochran, ed.), John Wiley and Sons, 2010, pp. 1-25.
[6] Duffie, D.: Security Markets. Academic Press, San Diego 1988. MR 0955269 | Zbl 0861.90019
[7] Flores-Hernández, R. M., Montes-de-Oca, R.: Monotonicity of minimizers in optimization problems with applications to Markov control processes. Kybernetika 43 (2007), 347-368. MR 2362724 | Zbl 1170.90513
[8] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. MR 1363487 | Zbl 0840.93001
[9] Heyman, D. P., Sobel, M. J.: Stochastic Models in Operations Research, Vol. II. Stochastic Optimization. McGraw-Hill, New York 1984. Zbl 0531.90062
[10] Jaśkiewicz, A.: A note on risk-sensitive control of invariant models. Syst. Control Lett. 56 (2007), 663-668. DOI 10.1016/j.sysconle.2007.06.006 | MR 2356450 | Zbl 1120.49020
[11] Jaśkiewicz, A., Nowak, A. S.: Discounted dynamic programming with unbounded returns: application to economic models. J. Math. Anal. Appl. 378 (2011), 450-462. DOI 10.1016/j.jmaa.2010.08.073 | MR 2773257 | Zbl 1254.90292
[12] Mendelssohn, R., Sobel, M. J.: Capital accumulation and the optimization of renewable resource models. J. Econom. Theory 23 (1980), 243-260. DOI 10.1016/0022-0531(80)90009-5 | Zbl 0472.90015
[13] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York 1994. MR 1270015 | Zbl 1184.90170
[14] Topkis, D. M.: Supermodularity and Complementarity. Princeton University Press, Princeton, New Jersey 1998. MR 1614637