Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation

Flores-Hernández, Rosa María

Article

Title:	Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation (English)
Author:	Flores-Hernández, Rosa María
Language:	English
Journal:	Kybernetika
ISSN:	0023-5954
Volume:	49
Issue:	5
Year:	2013
Pages:	705-719
Summary lang:	English
Category:	math
Summary:	This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line but not necessarily finite or denumerable. The cost function is possibly unbounded, the dynamics are independent of the current state, and the decision sets are possibly non-compact. In this context, conditions are provided under which an increasing or decreasing optimal stationary policy exists; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed. An illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards. (English)
Keyword:	Markov decision process
Keyword:	total discounted cost
Keyword:	total discounted reward
Keyword:	increasing optimal policy
Keyword:	decreasing optimal policy
Keyword:	policy iteration algorithm
MSC:	90C40
MSC:	93E20
idZBL:	Zbl 1278.90425
idMR:	MR3182635
Date available:	2013-11-27T09:44:41Z
Last updated:	2015-03-29
Stable URL:	http://hdl.handle.net/10338.dmlcz/143520