Article

Title: Second Order optimality in Markov decision chains (English)
Author: Sladký, Karel
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 53
Issue: 6
Year: 2017
Pages: 1086-1099
Summary lang: English
Category: math
Summary: The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (the so-called first passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. It may therefore be preferable, if not necessary, to select more sophisticated criteria that also reflect the variability and risk features of the problem. Perhaps the best-known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e., one optimizes a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances in transient and discounted models (where the value of the discount factor depends on the current state and the action taken) for finite and infinite time horizons. Analogous results are presented for long-run average nondiscounted models, where finding stationary policies that minimize the average variance within the class of policies with a given long-run average reward is discussed. (English)
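The kind of variance computation the summary refers to can be illustrated with a short Python sketch. It is not the article's own derivation: it uses the standard second-moment recursion for a fixed stationary policy and assumes deterministic one-step rewards r(i), a state-dependent discount factor beta(i) (taking beta(i) = 1 with a substochastic P gives a transient model), and, for the average model, an irreducible chain with the "average variance" read as lim (1/n) Var of the n-step total reward. All function names are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Sequence[float]) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular linear system")
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mat_vec(p: Matrix, v: Sequence[float]) -> Vector:
    """Return the product P v."""
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in p]


def discounted_mean_variance(
    p: Matrix, r: Sequence[float], beta: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Mean and variance of the total (discounted) reward from each start state.

    Assumed model: R_i = r_i + beta_i * R_j with j ~ p[i]; if row i of p sums
    to less than 1, the process stops with the missing probability (transient).
    """
    n = len(r)
    # First moment: v = r + diag(beta) P v.
    a1 = [[(1.0 if i == j else 0.0) - beta[i] * p[i][j] for j in range(n)]
          for i in range(n)]
    v = solve(a1, r)
    pv = mat_vec(p, v)
    # Second moment: s = r^2 + 2 diag(beta) r (P v) + diag(beta^2) P s.
    a2 = [[(1.0 if i == j else 0.0) - beta[i] ** 2 * p[i][j] for j in range(n)]
          for i in range(n)]
    s = solve(a2, [r[i] ** 2 + 2.0 * beta[i] * r[i] * pv[i] for i in range(n)])
    return v, [s[i] - v[i] ** 2 for i in range(n)]


def average_reward_and_variance(p: Matrix, r: Sequence[float]) -> Tuple[float, float]:
    """Gain g and asymptotic variance lim (1/n) Var(r(X_0) + ... + r(X_{n-1})).

    Assumes p is irreducible (one recurrent class, no transient states).
    """
    n = len(r)
    # Stationary distribution: pi (I - P) = 0, with one redundant equation
    # replaced by the normalisation sum(pi) = 1.
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    pi = solve(a, [0.0] * (n - 1) + [1.0])
    g = sum(pi_i * r_i for pi_i, r_i in zip(pi, r))
    f = [r_i - g for r_i in r]  # centred rewards
    # Poisson equation (I - P) h = f, made unique by pinning h_0 = 0.
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)] for i in range(n)]
    a[0] = [1.0] + [0.0] * (n - 1)
    h = solve(a, [0.0] + f[1:])
    # sigma^2 = sum_i pi_i (2 f_i h_i - f_i^2); adding a constant to h
    # leaves it unchanged because sum_i pi_i f_i = 0.
    var = sum(pi[i] * (2.0 * f[i] * h[i] - f[i] ** 2) for i in range(n))
    return g, var


if __name__ == "__main__":
    # Transient model: one state, continue with probability 0.5, reward 1 per
    # visit. The number of visits is geometric: mean 2, variance 2.
    print(discounted_mean_variance([[0.5]], [1.0], [1.0]))
    # Average model: i.i.d. Bernoulli(0.5) rewards, so g = 0.5, variance 0.25.
    print(average_reward_and_variance([[0.5, 0.5], [0.5, 0.5]], [1.0, 0.0]))
```

With mean and variance available per policy, a Markowitz-type comparison could rank candidate policies by a score such as v(i) - lam * var(i) for a chosen risk weight lam; that weighting is an assumption of this sketch rather than a prescription taken from the article.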
Keyword: Markov decision chains
Keyword: second order optimality
Keyword: optimality conditions for transient, discounted and average models
Keyword: policy iterations
Keyword: value iterations
MSC: 90C40
MSC: 93E20
idZBL: Zbl 06861642
idMR: MR3758936
DOI: 10.14736/kyb-2017-6-1086
Date available: 2018-02-26T11:30:06Z
Last updated: 2018-05-25
Stable URL: http://hdl.handle.net/10338.dmlcz/147086
Reference: [1] Feinberg, E. A., Fei, J.: Inequalities for variances of total discounted costs. J. Appl. Probab. 46 (2009), 1209-1212. MR 2582716, DOI 10.1239/jap/1261670699
Reference: [2] Gantmakher, F. R.: The Theory of Matrices. Chelsea, London 1959. MR 0107649
Reference: [3] Jaquette, S. C.: Markov decision processes with a new optimality criterion: Discrete time. Ann. Statist. 1 (1973), 496-505. MR 0378839
Reference: [4] Mandl, P.: On the variance in controlled Markov chains. Kybernetika 7 (1971), 1-12. Zbl 0215.25902, MR 0286178
Reference: [5] Markowitz, H.: Portfolio Selection: Efficient Diversification of Investments. Wiley, New York 1959. MR 0103768
Reference: [6] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York 1994. MR 1270015
Reference: [7] Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer-Verlag, Berlin 2011. MR 2808878
Reference: [8] Righter, R.: Stochastic comparison of discounted rewards. J. Appl. Probab. 48 (2011), 293-294. MR 2809902
Reference: [9] Sladký, K.: On mean reward variance in semi-Markov processes. Math. Meth. Oper. Res. 62 (2005), 387-397. MR 2229697
Reference: [10] Sladký, K.: Risk-sensitive and mean variance optimality in Markov decision processes. Acta Oeconomica Pragensia 7 (2013), 146-161.
Reference: [11] Sladký, K.: Second order optimality in transient and discounted Markov decision chains. In: Proc. 33rd Internat. Conf. Math. Methods in Economics MME 2015 (D. Martinčík, ed.), University of West Bohemia, Plzeň 2015, pp. 731-736.
Reference: [12] Sobel, M.: The variance of discounted Markov decision processes. J. Appl. Probab. 19 (1982), 794-802. Zbl 0503.90091, MR 0675143
Reference: [13] Dijk, N. M. Van, Sladký, K.: On the total reward variance for continuous-time Markov reward chains. J. Appl. Probab. 43 (2006), 1044-1052. MR 2274635
Reference: [14] Veinott, A. F., Jr.: Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40 (1969), 1635-1660. MR 0256712
Reference: [15] White, D. J.: Mean, variance and probability criteria in finite Markov decision processes: A review. J. Optimizat. Th. Appl. 56 (1988), 1-29. MR 0922375
Reference: [16] Wu, X., Guo, X.: First passage optimality and variance minimisation of Markov decision processes with varying discount factors. J. Appl. Probab. 52 (2015), 441-456. Zbl 1327.90374, MR 3372085