Title:
|
Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion (English) |
Author:
|
Gordienko, Evgueni I. |
Author:
|
Minjárez-Sosa, J. Adolfo |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
34 |
Issue:
|
2 |
Year:
|
1998 |
Pages:
|
[217]-234 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a statistical estimation procedure for $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs. (English) |
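The model described in the summary — a recurrence $x_{t+1}=F(x_t,a_t,\xi_t)$ driven by i.i.d. noise $\xi_t$ whose density $\rho$ must be estimated from the observed noise realizations — can be illustrated with a minimal simulation sketch. The dynamics `F`, the stationary policy, the noise law, and the Gaussian kernel density estimator below are toy assumptions for illustration only, not the paper's construction or its adaptive policies:

```python
# Illustrative sketch (assumed toy model, not the paper's algorithm):
# simulate x_{t+1} = F(x_t, a_t, xi_t) with i.i.d. noise xi_t whose
# density rho is unknown, and estimate rho from the observed xi_t
# with a Gaussian kernel density estimator.
import math
import random

random.seed(0)

def F(x, a, xi):
    # toy linear dynamics: next state from state, action, and noise
    return 0.5 * x + a + xi

def policy(x):
    # toy stationary policy: push the state toward 0
    return -0.25 * x

# Simulate the controlled process, recording the (assumed observable)
# noise realizations, as in the summary's observability assumption.
xs, xis = [1.0], []
for t in range(500):
    xi = random.gauss(0.0, 1.0)  # true rho: standard normal (unknown to the controller)
    xis.append(xi)
    xs.append(F(xs[-1], policy(xs[-1]), xi))

def kde(z, sample, h):
    """Gaussian kernel density estimate of rho at the point z."""
    n = len(sample)
    return sum(math.exp(-0.5 * ((z - s) / h) ** 2) for s in sample) \
        / (n * h * math.sqrt(2 * math.pi))

h = 0.4  # bandwidth: a tuning parameter of this sketch, not from the paper
rho_hat_0 = kde(0.0, xis, h)
print(rho_hat_0)  # true rho(0) = 1/sqrt(2*pi) ≈ 0.3989
```

In an adaptive scheme of this general type, the estimate of $\rho$ would be refined as more noise vectors are observed and plugged into the optimality equations; the paper's contribution is proving that such plug-in policies remain discounted asymptotically optimal even when the one-stage costs are unbounded.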
Keyword:
|
Markov control process |
Keyword:
|
unbounded costs |
Keyword:
|
discounted asymptotic optimality |
Keyword:
|
density estimator |
Keyword:
|
rate of convergence |
MSC:
|
60J05 |
MSC:
|
62M05 |
MSC:
|
93C40 |
MSC:
|
93E35 |
idZBL:
|
Zbl 1274.90474 |
idMR:
|
MR1621512 |
. |
Date available:
|
2009-09-24T19:15:31Z |
Last updated:
|
2015-03-28 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/135201 |
. |
Reference:
|
[1] Agrawal R.: Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition.J. Appl. Probab. 28 (1991), 779–790 Zbl 0741.60070, MR 1133786, 10.2307/3214681 |
Reference:
|
[2] Ash R. B.: Real Analysis and Probability.Academic Press, New York 1972 MR 0435320 |
Reference:
|
[3] Cavazos–Cadena R.: Nonparametric adaptive control of discounted stochastic system with compact state space.J. Optim. Theory Appl. 65 (1990), 191–207 MR 1051545, 10.1007/BF01102341 |
Reference:
|
[4] Dynkin E. B., Yushkevich A. A.: Controlled Markov Processes.Springer–Verlag, New York 1979 MR 0554083 |
Reference:
|
[5] Fernández–Gaucherand E., Arapostathis A., Marcus S. I.: A methodology for the adaptive control of Markov chains under partial state information.In: Proc. of the 1992 Conf. on Information Sci. and Systems, Princeton, New Jersey, pp. 773–775 |
Reference:
|
[6] Fernández–Gaucherand E., Arapostathis A., Marcus S. I.: Analysis of an adaptive control scheme for a partially observed controlled Markov chain.IEEE Trans. Automat. Control 38 (1993), 987–993 Zbl 0786.93089, MR 1227213, 10.1109/9.222316 |
Reference:
|
[7] Gordienko E. I.: Adaptive strategies for certain classes of controlled Markov processes.Theory Probab. Appl. 29 (1985), 504–518 Zbl 0577.93067 |
Reference:
|
[8] Gordienko E. I.: Controlled Markov sequences with slowly varying characteristics II. Adaptive optimal strategies.Soviet J. Comput. Systems Sci. 23 (1985), 87–93 Zbl 0618.93070, MR 0844298 |
Reference:
|
[9] Gordienko E. I., Hernández–Lerma O.: Average cost Markov control processes with weighted norms: value iteration.Appl. Math. 23 (1995), 219–237 Zbl 0829.93068, MR 1341224 |
Reference:
|
[10] Gordienko E. I., Montes–de–Oca R., Minjárez–Sosa J. A.: Approximation of average cost optimal policies for general Markov decision processes with unbounded costs.Math. Methods Oper. Res. 45 (1997), 2, to appear Zbl 0882.90127, MR 1446409, 10.1007/BF01193864 |
Reference:
|
[11] Hasminskii R., Ibragimov I.: On density estimation in the view of Kolmogorov’s ideas in approximation theory.Ann. of Statist. 18 (1990), 999–1010 Zbl 0705.62039, MR 1062695, 10.1214/aos/1176347736 |
Reference:
|
[12] Hernández–Lerma O.: Adaptive Markov Control Processes.Springer–Verlag, New York 1989 Zbl 0698.90053, MR 0995463 |
Reference:
|
[13] Hernández–Lerma O.: Infinite–horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality.Reporte Interno 165. Departamento de Matemáticas, CINVESTAV-IPN, A.P. 14-740.07000, México, D. F., México (1994). (Submitted for publication) |
Reference:
|
[14] Hernández–Lerma O., Cavazos–Cadena R.: Density estimation and adaptive control of Markov processes: average and discounted criteria.Acta Appl. Math. 20 (1990), 285–307 Zbl 0717.93066, MR 1081591, 10.1007/BF00049572 |
Reference:
|
[15] Hernández–Lerma O., Lasserre J. B.: Discrete–Time Markov Control Processes.Springer–Verlag, New York 1995 Zbl 0928.93002 |
Reference:
|
[16] Hernández–Lerma O., Marcus S. I.: Adaptive control of discounted Markov decision chains.J. Optim. Theory Appl. 46 (1985), 227–235 Zbl 0543.90093, MR 0794250, 10.1007/BF00938426 |
Reference:
|
[17] Hernández–Lerma O., Marcus S. I.: Adaptive policies for discrete–time stochastic control system with unknown disturbance distribution.Systems Control Lett. 9 (1987), 307–315 MR 0912683, 10.1016/0167-6911(87)90055-7 |
Reference:
|
[18] Hinderer K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter.(Lecture Notes in Operations Research and Mathematical Systems 33.) Springer–Verlag, Berlin – Heidelberg – New York 1970 Zbl 0202.18401, MR 0267890 |
Reference:
|
[19] Köthe G.: Topological Vector Spaces I.Springer–Verlag, New York 1969 MR 0248498 |
Reference:
|
[20] Kumar P. R., Varaiya P.: Stochastic Systems: Estimation, Identification and Adaptive Control.Prentice–Hall, Englewood Cliffs 1986 Zbl 0706.93057 |
Reference:
|
[21] Lippman S. A.: On dynamic programming with unbounded rewards.Management Sci. 21 (1975), 1225–1233 Zbl 0309.90017, MR 0398535, 10.1287/mnsc.21.11.1225 |
Reference:
|
[22] Mandl P.: Estimation and control in Markov chains.Adv. in Appl. Probab. 6 (1974), 40–60 Zbl 0281.60070, MR 0339876, 10.2307/1426206 |
Reference:
|
[23] Rieder U.: Measurable selection theorems for optimization problems.Manuscripta Math. 24 (1978), 115–131 Zbl 0385.28005, MR 0493590, 10.1007/BF01168566 |
Reference:
|
[24] Schäl M.: Estimation and control in discounted stochastic dynamic programming.Stochastics 20 (1987), 51–71 MR 0875814, 10.1080/17442508708833435 |
Reference:
|
[25] Stettner L.: On nearly self-optimizing strategies for a discrete–time uniformly ergodic adaptive model.J. Appl. Math. Optim. 27 (1993), 161–177 Zbl 0769.93084, MR 1202530, 10.1007/BF01195980 |
Reference:
|
[26] Stettner L.: Ergodic control of Markov process with mixed observation structure.Dissertationes Math. 341 (1995), 1–36 MR 1318335 |
Reference:
|
[27] Nunen J. A. E. E. van, Wessels J.: A note on dynamic programming with unbounded rewards.Management Sci. 24 (1978), 576–580 10.1287/mnsc.24.5.576 |
. |