Title: Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
Author: Escobedo-Trujillo, Beatris A.
Author: Higuera-Chan, Carmen G.
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 55
Issue: 1
Year: 2019
Pages: 166-182
Summary lang: English
Category: math
Summary: In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi_n)$, $n=0,1,\ldots$, with state-action-dependent discount factors of the form $\alpha_n(x_n,a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace\alpha_n\rbrace$, $\lbrace c_n\rbrace$ and $\lbrace G_n\rbrace$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$ and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$ and $G_\infty$. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
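A finite toy version of a limit model such as $\mathcal{M}_\infty$ can make the role of a state-action-dependent discount factor concrete. The following is a minimal sketch under stated assumptions. The state and action spaces are taken to be finite, whereas the paper itself treats Borel spaces with unbounded costs. The criterion is assumed to discount the stage-$n$ cost by the product $\alpha(x_0,a_0)\cdots\alpha(x_{n-1},a_{n-1})$, so the optimal value satisfies $V(x)=\min_a\,[\,c(x,a)+\alpha(x,a)\sum_y P(y\mid x,a)\,V(y)\,]$. Value iteration solves this equation whenever every $\alpha(x,a)<1$. The functions `q_values` and `value_iteration`, the numbers, and the perturbation $0.05/(n+1)$ used to build $\alpha_n$ are all hypothetical, and none of this is the authors' construction.

```python
"""Toy value iteration for a finite discounted MDP whose discount factor
alpha(x, a) depends on the current state-action pair.

Hypothetical illustration only: the function names, the numbers and the
perturbation used for alpha_n are assumptions, not taken from the paper,
which works on Borel spaces with unbounded costs.
"""
from typing import List, Tuple

Matrix = List[List[float]]        # indexed [x][a]
Kernel = List[List[List[float]]]  # indexed [x][a][y] = P(y | x, a)


def q_values(V: List[float], cost: Matrix, alpha: Matrix, P: Kernel) -> Matrix:
    """Q(x, a) = c(x, a) + alpha(x, a) * sum_y P(y | x, a) * V(y)."""
    n_states = len(V)
    return [
        [cost[x][a] + alpha[x][a] * sum(P[x][a][y] * V[y] for y in range(n_states))
         for a in range(len(cost[x]))]
        for x in range(n_states)
    ]


def value_iteration(cost: Matrix, alpha: Matrix, P: Kernel,
                    tol: float = 1e-12,
                    max_iter: int = 100_000) -> Tuple[List[float], List[int]]:
    """Iterate V <- min_a Q(x, a) from V = 0 and return (V, greedy policy).

    The Bellman operator is a sup-norm contraction with modulus
    max_{x,a} alpha(x, a), so every discount factor must be below 1.
    """
    if max(max(row) for row in alpha) >= 1.0:
        raise ValueError("need alpha(x, a) < 1 for every (x, a)")
    V = [0.0] * len(cost)
    for _ in range(max_iter):
        V_new = [min(row) for row in q_values(V, cost, alpha, P)]
        converged = max(abs(v - w) for v, w in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    # Greedy (minimizing) action in each state with respect to the final V.
    Q = q_values(V, cost, alpha, P)
    policy = [min(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


if __name__ == "__main__":
    # One state, two actions: action 0 is cheaper per stage (1 < 2)
    # but is discounted less heavily (0.9 > 0.5).
    P = [[[1.0], [1.0]]]
    cost = [[1.0, 2.0]]
    alpha_inf = [[0.9, 0.5]]
    V_inf, policy_inf = value_iteration(cost, alpha_inf, P)
    print(V_inf, policy_inf)  # V_inf[0] is approximately 4.0; policy_inf == [1]

    # Hypothetical time-varying factors alpha_n -> alpha_inf uniformly.
    for n in (0, 10, 100, 1000):
        alpha_n = [[a + 0.05 / (n + 1) for a in row] for row in alpha_inf]
        V_n, _ = value_iteration(cost, alpha_n, P)
        print(n, abs(V_n[0] - V_inf[0]))
```

In the one-state example, action 0 has the lower stage cost but the larger discount factor. Its total cost $1/(1-0.9)=10$ therefore exceeds $2/(1-0.5)=4$, so letting the discount factor depend on the action can change which action is optimal. In this toy setting, the values of the perturbed models shrink toward the limit value as $n$ grows.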
Keyword: discounted optimality
Keyword: non-constant discount factor
Keyword: time-varying Markov decision processes
MSC: 90C40
MSC: 93E20
idZBL: Zbl 07088884
idMR: MR3935420
DOI: 10.14736/kyb-2019-1-0166
Date available: 2019-05-07T11:16:34Z
Last updated: 2020-02-27
Stable URL: http://hdl.handle.net/10338.dmlcz/147711
Reference: [1] Bastin, G., Dochain, D.: On-line Estimation and Adaptive Control of Bioreactors. Elsevier, Amsterdam 2014.
Reference: [2] Bertsekas, D. P.: Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9 (2011), 310-335. MR 2833999, 10.1007/s11768-011-1005-3
Reference: [3] Dynkin, E. B., Yushkevich, A. A.: Controlled Markov Processes. Springer-Verlag, New York 1979. MR 0554083, 10.1007/978-1-4615-6746-2
Reference: [4] González-Hernández, J., López-Martínez, R. R., Minjárez-Sosa, J. A.: Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45 (2009), 737-754. MR 2599109
Reference: [5] Gordienko, E. I., Minjárez-Sosa, J. A.: Adaptive control for discrete-time Markov processes with unbounded costs: discounted criterion. Kybernetika 34 (1998), 217-234. MR 1621512
Reference: [6] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York 1996. MR 1363487, 10.1007/978-1-4612-0729-0
Reference: [7] Hernández-Lerma, O., Lasserre, J. B.: Further Topics on Discrete-Time Markov Control Processes. Springer-Verlag, New York 1999. MR 1697198, 10.1007/978-1-4612-0561-6
Reference: [8] Hernández-Lerma, O., Hilgert, N.: Limiting optimal discounted-cost control of a class of time-varying stochastic systems. Syst. Control Lett. 40 (2000), 1, 37-42. MR 1829073, 10.1016/s0167-6911(99)00121-8
Reference: [9] Hilgert, N., Minjárez-Sosa, J. A.: Adaptive policies for time-varying stochastic systems under discounted criterion. Math. Meth. Oper. Res. 54 (2001), 3, 491-505. MR 1890916, 10.1007/s001860100170
Reference: [10] Hilgert, N., Minjárez-Sosa, J. A.: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criteria. Math. Meth. Oper. Res. 63 (2006), 443-460. MR 2264761, 10.1007/s00186-005-0024-6
Reference: [11] Hilgert, N., Senoussi, R., Vila, J. P.: Nonparametric estimation of time-varying autoregressive nonlinear processes. C. R. Acad. Sci. Paris Série 1 (1996), 232, 1085-1090. MR 1423225, 10.1109/.2001.980647
Reference: [12] Lewis, M. E., Paul, A.: Uniform turnpike theorems for finite Markov decision processes. Math. Oper. Res.
Reference: [13] Luque-Vásquez, F., Minjárez-Sosa, J. A.: Semi-Markov control processes with unknown holding times distribution under a discounted criterion. Math. Meth. Oper. Res. 61 (2005), 455-468. MR 2225824, 10.1007/s001860400406
Reference: [14] Luque-Vásquez, F., Minjárez-Sosa, J. A., Rosas-Rosas, L. C.: Semi-Markov control processes with partially known holding times distribution: Discounted and average criteria. Acta Appl. Math. 114 (2011), 3, 135-156. MR 2794078, 10.1007/s10440-011-9605-y
Reference: [15] Luque-Vásquez, F., Minjárez-Sosa, J. A., Rosas-Rosas, L. C.: Semi-Markov control processes with unknown holding times distribution under an average cost criterion. Appl. Math. Optim. 61 (2010), 3, 317-336. MR 2609593, 10.1007/s00245-009-9086-9
Reference: [16] Minjárez-Sosa, J. A.: Markov control models with unknown random state-action-dependent discount factors. TOP 23 (2015), 743-772. MR 3407674, 10.1007/s11750-015-0360-5
Reference: [17] Minjárez-Sosa, J. A.: Approximation and estimation in Markov control processes under discounted criterion. Kybernetika 40 (2004), 6, 681-690. MR 2120390
Reference: [18] Powell, W. B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, 2007. MR 2839330, 10.1002/9780470182963
Reference: [19] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994. MR 1270015, 10.1002/9780470316887
Reference: [20] Rieder, U.: Measurable selection theorems for optimization problems. Manuscripta Math. 24 (1978), 115-131. Zbl 0385.28005, MR 0493590, 10.1007/bf01168566
Reference: [21] Robles-Alcaráz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. A.: Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces. Risk and Decision Analysis 6 (2017), 2, 79-95. 10.3233/rda-160116
Reference: [22] Royden, H. L.: Real Analysis. Prentice Hall 1968. Zbl 1191.26002, MR 0928805
Reference: [23] Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrs. Verw. Geb. 32 (1975), 179-196. MR 0378841, 10.1007/bf00532612
Reference: [24] Shapiro, J. F.: Turnpike planning horizons for a Markovian decision model. Management Sci. 14 (1968), 292-300. 10.1287/mnsc.14.5.292