Title:
|
Growth rates and average optimality in risk-sensitive Markov decision chains (English) |
Author:
|
Sladký, Karel |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
44 |
Issue:
|
2 |
Year:
|
2008 |
Pages:
|
205-226 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent. (English) |
Keyword:
|
risk-sensitive Markov decision chains |
Keyword:
|
average optimal policies |
Keyword:
|
optimal growth rates |
MSC:
|
60J10 |
MSC:
|
90C39 |
MSC:
|
90C40 |
MSC:
|
93E20 |
idZBL:
|
Zbl 1154.90612 |
idMR:
|
MR2428220 |
. |
Date available:
|
2009-09-24T20:33:32Z |
Last updated:
|
2012-06-06 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/135844 |
. |
Reference:
|
[1] Bellman R.: On a quasi-linear equation.Canadian J. Math. 8 (1956), 198–202 Zbl 0071.34902, MR 0077782 |
Reference:
|
[2] Bellman R.: Dynamic Programming.Princenton Univ. Press, Princenton, N.J. 1957 Zbl 1205.90002, MR 0090477 |
Reference:
|
[3] Berman A., Plemmons R. J.: Nonnegative Matrices in the Mathematical Sciences.Academic Press, New York 1979 Zbl 0815.15016, MR 0544666 |
Reference:
|
[4] Bielecki T. D., Hernández-Hernández, D., Pliska S. R.: Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management.Math. Methods Oper. Res. 50 (1999), 167–188 MR 1732397 |
Reference:
|
[5] Cavazos-Cadena R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space.Math. Methods Oper. Res. 57 (2003), 253–285 Zbl 1023.90076, MR 1973378 |
Reference:
|
[6] Cavazos-Cadena R., Montes-de-Oca R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space.Math. Oper. Res. 28 (2003), 752–756 Zbl 1082.90125, MR 2015911 |
Reference:
|
[7] Cavazos-Cadena R., Montes-de-Oca R.: Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion.J. Appl. Probab. 42 (2005), 905–918 Zbl 1105.90101, MR 2203811 |
Reference:
|
[8] Cavazos-Cadena R., Hernández-Hernández D.: Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach.Math. Methods Oper. Res. 56 (2002), 473–479 MR 1953028 |
Reference:
|
[9] Cavazos-Cadena R., Hernández-Hernández D.: A characterization exponential functionals in finite Markov chains.Math. Methods Oper. Res. 60 (2004), 399–414 MR 2106091 |
Reference:
|
[10] Cavazos-Cadena R., Hernández-Hernández D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains.Ann. Appl. Probab. 15 (2005), 175–212 Zbl 1076.93045, MR 2115041 |
Reference:
|
[11] Gantmakher F. R.: The Theory of Matrices.Chelsea, London 1959 |
Reference:
|
[12] Howard R. A., Matheson J.: Risk-sensitive Markov decision processes.Manag. Sci. 23 (1972), 356–369 Zbl 0238.90007, MR 0292497 |
Reference:
|
[13] Jaquette S. A.: A utility criterion for Markov decision processes.Manag. Sci. 23 (1976), 43–49 Zbl 0337.90053, MR 0439037 |
Reference:
|
[14] Mandl P.: Decomposable non-negative matrices in a dynamic programming problem.Czechoslovak Math. J. 20 (1970), 504–510 Zbl 0209.22902, MR 0264978 |
Reference:
|
[15] Rothblum U. G., Whittle P.: Growth optimality for branching Markov decision chains.Math. Oper. Res. 7 (1982), 582–601 Zbl 0498.90082, MR 0686533 |
Reference:
|
[16] Sladký K.: On dynamic programming recursions for multiplicative Markov decision chains.Math. Programming Study 6 (1976), 216–226 Zbl 0374.60089, MR 0452725 |
Reference:
|
[17] Sladký K.: Successive approximation methods for dynamic programming models.In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 Zbl 0496.90081 |
Reference:
|
[18] Sladký K.: Bounds on discrete dynamic programming recursions I.Kybernetika 16 (1980), 526–547 Zbl 0454.90085, MR 0607292 |
Reference:
|
[19] Whittle P.: Optimization Over Time – Dynamic Programming and Stochastic Control.Volume II, Chapter 35, Wiley, Chichester 1983 MR 0710833 |
Reference:
|
[20] Zijm W. H. M.: Nonnegative Matrices in Dynamic Programming.Mathematical Centre Tract, Amsterdam 1983 Zbl 0526.90059, MR 0723868 |
. |