Title:
|
Estimation and adaptive control of span-contracting Markov decision processes (English) |
Author:
|
Hübner, Gerhard |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
27 |
Issue:
|
1 |
Year:
|
1991 |
Pages:
|
66-71 |
. |
Category:
|
math |
. |
MSC:
|
90C40 |
idZBL:
|
Zbl 0744.90099 |
idMR:
|
MR1099515 |
. |
Date available:
|
2009-09-24T18:23:11Z |
Last updated:
|
2012-06-05 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/125653 |
. |
Reference:
|
[1] R. S. Acosta-Abreu, O. Hernandez-Lerma: Iterative adaptive control of denumerable state average-cost Markov systems. Control Cybernet. 14 (1985), 313-322. MR 0842780 |
Reference:
|
[2] V. V. Baranov: Recursive algorithms of adaptive control in stochastic systems. Cybernetics 17 (1981), 815-824. MR 0689427 |
Reference:
|
[3] A. Federgruen: Markovian Control Problems. Math. Centre Tracts 97, Amsterdam 1983. Zbl 0541.90068, MR 0745450 |
Reference:
|
[4] A. Federgruen, P. J. Schweitzer: Nonstationary Markov decision problems with converging parameters. J. Optim. Theory Appl. 34 (1981), 207-241. Zbl 0426.90091, MR 0625228 |
Reference:
|
[5] A. Federgruen, P. J. Schweitzer, H. C. Tijms: Contraction mappings underlying undiscounted Markov decision problems. J. Math. Anal. Appl. 65 (1978), 711-730. MR 0510481 |
Reference:
|
[6] A. Federgruen, H. C. Tijms: The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. 15 (1978), 356-373. Zbl 0386.90060, MR 0475896 |
Reference:
|
[7] O. Hernandez-Lerma: Adaptive Control Processes. Springer-Verlag, Berlin-Heidelberg-New York 1989. MR 0995463 |
Reference:
|
[8] K. Hinderer: On approximate solutions of finite-stage dynamic programs. In: Dynamic Programming and its Applications (M. L. Puterman, ed.), Academic Press, New York 1978, pp. 289-317. Zbl 0461.90075, MR 0537885 |
Reference:
|
[9] G. Hübner: Contraction properties of Markov decision models with applications to the elimination of non-optimal actions. In: Dynamische Optimierung, Bonner Math. Schriften 98 (1977), 57-65. MR 0524411 |
Reference:
|
[10] G. Hübner: A unified approach to adaptive control of average reward Markov decision processes. OR Spektrum 10 (1988), 161-166. MR 0961229 |
Reference:
|
[11] M. Kurano: Discrete-time Markovian decision processes with an unknown parameter - average return criterion. J. Oper. Res. Soc. Japan 15 (1972), 67-76. Zbl 0238.90006, MR 0343942 |
Reference:
|
[12] M. Kurano: Adaptive policies in Markov decision processes with uncertain matrices. J. Inf. Optim. 4 (1983), 21-40. MR 0697991 |
Reference:
|
[13] M. Kurano: Learning algorithms for Markov decision processes. J. Appl. Probab. 24 (1987), 270-276. Zbl 0631.90085, MR 0876190 |
Reference:
|
[14] P. Mandl: Estimation and control of Markov chains. Adv. in Appl. Probab. 6 (1974), 40-60. MR 0339876 |
Reference:
|
[15] P. Mandl: On the adaptive control of countable Markov chains. In: Probability Theory, Banach Centre Publications, Warsaw 1979, pp. 159-173. Zbl 0439.60069, MR 0561478 |
Reference:
|
[16] W. Whitt: Approximations of dynamic programs. Math. Oper. Res. 3 (1978), 231-243. Zbl 0393.90094, MR 0506661 |
. |