
Article

Title: Growth rates and average optimality in risk-sensitive Markov decision chains (English)
Author: Sladký, Karel
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 44
Issue: 2
Year: 2008
Pages: 205-226
Summary lang: English
.
Category: math
.
Summary: In this note we focus attention on characterizations of policies maximizing the growth rate of expected utility, along with the average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature, the problem is handled by methods of stochastic dynamic programming under the condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices, we establish necessary and sufficient conditions guaranteeing that the optimal values are independent of the starting state, along with a partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state, we show how the methods work if we minimize the growth rate or the average of the certainty equivalent. (English)
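For orientation, the following is a minimal sketch of the standard risk-sensitive formulation underlying this line of work (cf. references [12] and [16] below); the notation is illustrative and need not coincide with that used in the article. With an exponential utility function with risk-sensitivity coefficient $\gamma \neq 0$, the expected utility of the accumulated reward grows multiplicatively under a stationary policy $f$, and its growth rate is governed by the spectral radius of a nonnegative matrix:
$$u^{\gamma}(\xi) = \operatorname{sgn}(\gamma)\exp(\gamma\xi), \qquad \xi_n = \sum_{k=0}^{n-1} r_{X_k,X_{k+1}},$$
$$Q(f) = \bigl[\,p_{ij}(f)\,e^{\gamma r_{ij}}\,\bigr]_{i,j}, \qquad \mathbb{E}^{f}_{i}\, e^{\gamma\xi_n} = \bigl(Q(f)^{n}\mathbf{1}\bigr)_{i},$$
$$\rho^{*} = \max_{f}\,\rho\bigl(Q(f)\bigr), \qquad J^{*} = \tfrac{1}{\gamma}\ln\rho^{*} \quad (\text{average certainty equivalent, here } \gamma > 0).$$
In this sketch $\rho(\cdot)$ denotes the spectral radius; for $\gamma < 0$ the roles of maximization and minimization are interchanged.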
Keyword: risk-sensitive Markov decision chains
Keyword: average optimal policies
Keyword: optimal growth rates
MSC: 60J10
MSC: 90C39
MSC: 90C40
MSC: 93E20
idZBL: Zbl 1154.90612
idMR: MR2428220
.
Date available: 2009-09-24T20:33:32Z
Last updated: 2012-06-06
Stable URL: http://hdl.handle.net/10338.dmlcz/135844
.
Reference: [1] Bellman R.: On a quasi-linear equation.Canadian J. Math. 8 (1956), 198–202 Zbl 0071.34902, MR 0077782
Reference: [2] Bellman R.: Dynamic Programming.Princeton Univ. Press, Princeton, N.J. 1957 Zbl 1205.90002, MR 0090477
Reference: [3] Berman A., Plemmons R. J.: Nonnegative Matrices in the Mathematical Sciences.Academic Press, New York 1979 Zbl 0815.15016, MR 0544666
Reference: [4] Bielecki T. D., Hernández-Hernández, D., Pliska S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management.Math. Methods Oper. Res. 50 (1999), 167–188 MR 1732397
Reference: [5] Cavazos-Cadena R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space.Math. Methods Oper. Res. 57 (2003), 253–285 Zbl 1023.90076, MR 1973378
Reference: [6] Cavazos-Cadena R., Montes-de-Oca R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space.Math. Oper. Res. 28 (2003), 752–756 Zbl 1082.90125, MR 2015911
Reference: [7] Cavazos-Cadena R., Montes-de-Oca R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion.J. Appl. Probab. 42 (2005), 905–918 Zbl 1105.90101, MR 2203811
Reference: [8] Cavazos-Cadena R., Hernández-Hernández D.: Solution to the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach.Math. Methods Oper. Res. 56 (2002), 473–479 MR 1953028
Reference: [9] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of exponential functionals in finite Markov chains.Math. Methods Oper. Res. 60 (2004), 399–414 MR 2106091
Reference: [10] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains.Ann. Appl. Probab. 15 (2005), 175–212 Zbl 1076.93045, MR 2115041
Reference: [11] Gantmakher F. R.: The Theory of Matrices.Chelsea, London 1959
Reference: [12] Howard R. A., Matheson J.: Risk-sensitive Markov decision processes.Manag. Sci. 18 (1972), 356–369 Zbl 0238.90007, MR 0292497
Reference: [13] Jaquette S. A.: A utility criterion for Markov decision processes.Manag. Sci. 23 (1976), 43–49 Zbl 0337.90053, MR 0439037
Reference: [14] Mandl P.: Decomposable non-negative matrices in a dynamic programming problem.Czechoslovak Math. J. 20 (1970), 504–510 Zbl 0209.22902, MR 0264978
Reference: [15] Rothblum U. G., Whittle P.: Growth optimality for branching Markov decision chains.Math. Oper. Res. 7 (1982), 582–601 Zbl 0498.90082, MR 0686533
Reference: [16] Sladký K.: On dynamic programming recursions for multiplicative Markov decision chains.Math. Programming Study 6 (1976), 216–226 Zbl 0374.60089, MR 0452725
Reference: [17] Sladký K.: Successive approximation methods for dynamic programming models.In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 Zbl 0496.90081
Reference: [18] Sladký K.: Bounds on discrete dynamic programming recursions I.Kybernetika 16 (1980), 526–547 Zbl 0454.90085, MR 0607292
Reference: [19] Whittle P.: Optimization Over Time – Dynamic Programming and Stochastic Control.Volume II, Chapter 35, Wiley, Chichester 1983 MR 0710833
Reference: [20] Zijm W. H. M.: Nonnegative Matrices in Dynamic Programming.Mathematical Centre Tract, Amsterdam 1983 Zbl 0526.90059, MR 0723868
.
