Title:
|
Markov decision processes with time-varying discount factors and random horizon (English) |
Author:
|
Ilhuicatzi-Roldán, Rocio |
Author:
|
Cruz-Suárez, Hugo |
Author:
|
Chávez-Rodríguez, Selene |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 (print) |
ISSN:
|
1805-949X (online) |
Volume:
|
53 |
Issue:
|
1 |
Year:
|
2017 |
Pages:
|
82-98 |
Summary lang:
|
English |
Category:
|
math |
Summary:
|
This paper concerns Markov decision processes. The optimal control problem is to minimize the expected total discounted cost with a non-constant discount factor: the discount factor is time-varying and may depend on the state and the action. In addition, the horizon of the optimization problem is a discrete random variable, that is, a random horizon is assumed. Under general conditions on the Markov control model, and using the dynamic programming approach, an optimality equation is obtained for both cases, namely finite support and infinite support of the random horizon. The results are illustrated by two examples, one of which concerns optimal replacement. (English) |
Keyword:
|
Markov decision process |
Keyword:
|
dynamic programming |
Keyword:
|
varying discount factor |
Keyword:
|
random horizon |
MSC:
|
90C39 |
MSC:
|
90C40 |
MSC:
|
93E20 |
idZBL:
|
Zbl 06738595 |
idMR:
|
MR3638557 |
DOI:
|
10.14736/kyb-2017-1-0082 |
Date available:
|
2017-04-03T10:48:39Z |
Last updated:
|
2018-01-10 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/146709 |
Reference:
|
[1] Carmon, Y., Shwartz, A.: Markov decision processes with exponentially representable discounting. Oper. Res. Lett. 37 (2009), 51-55. Zbl 1154.90610, MR 2488083, 10.1016/j.orl.2008.10.005 |
Reference:
|
[2] Chen, X., Yang, X.: Optimal consumption and investment problem with random horizon in a BMAP model. Insurance Math. Econom. 61 (2015), 197-205. Zbl 1314.91192, MR 3324056, 10.1016/j.insmatheco.2015.01.004 |
Reference:
|
[3] Cruz-Suárez, H., Ilhuicatzi-Roldán, R., Montes-de-Oca, R.: Markov decision processes on Borel spaces with total cost and random horizon. J. Optim. Theory Appl. 162 (2014), 329-346. Zbl 1317.90316, MR 3228530, 10.1007/s10957-012-0262-8 |
Reference:
|
[4] Della Vecchia, E., Di Marco, S., Vidal, F.: Dynamic programming for variable discounted Markov decision problems. In: Jornadas Argentinas de Informática e Investigación Operativa (43JAIIO) XII Simposio Argentino de Investigación Operativa (SIO), Buenos Aires 2014, pp. 50-62. |
Reference:
|
[5] Feinberg, E., Shwartz, A.: Constrained dynamic programming with two discount factors: applications and an algorithm. IEEE Trans. Automat. Control 44 (1999), 628-631. Zbl 0957.90127, MR 1680195, 10.1109/9.751365 |
Reference:
|
[6] Feinberg, E., Shwartz, A.: Markov decision models with weighted discounted criteria. Math. Oper. Res. 19 (1994), 152-168. Zbl 0803.90123, MR 1290017, 10.1287/moor.19.1.152 |
Reference:
|
[7] García, Y. H., González-Hernández, J.: Discrete-time Markov control process with recursive discounted rates. Kybernetika 52 (2016), 403-426. MR 3532514, 10.14736/kyb-2016-3-0403 |
Reference:
|
[8] González-Hernández, J., López-Martínez, R. R., Minjarez-Sosa, J. A.: Adaptive policies for stochastic systems under a randomized discounted criterion. Bol. Soc. Mat. Mex. 14 (2008), 149-163. MR 2667162 |
Reference:
|
[9] González-Hernández, J., López-Martínez, R. R., Minjarez-Sosa, J. A.: Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion. Kybernetika 45 (2009), 737-754. Zbl 1190.93105, MR 2599109 |
Reference:
|
[10] González-Hernández, J., López-Martínez, R. R., Minjarez-Sosa, J. A., Gabriel-Arguelles, J. A.: Constrained Markov control processes with randomized discounted cost criteria: occupation measures and external points. Risk and Decision Analysis 4 (2013), 163-176. |
Reference:
|
[11] González-Hernández, J., López-Martínez, R. R., Minjarez-Sosa, J. A., Gabriel-Arguelles, J. A.: Constrained Markov control processes with randomized discounted rate: infinite linear programming approach. Optimal Control Appl. Methods 35 (2014), 575-591. MR 3262763, 10.1002/oca.2089 |
Reference:
|
[12] González-Hernández, J., López-Martínez, R. R., Pérez-Hernández, J. R.: Markov control processes with randomized discounted cost. Math. Methods Oper. Res. 65 (2007), 27-44. Zbl 1126.90075, MR 2302022, 10.1007/s00186-006-0092-2 |
Reference:
|
[13] Guo, X., Hernández-del-Valle, A., Hernández-Lerma, O.: First passage problems for nonstationary discrete-time stochastic control systems. Eur. J. Control 18 (2012), 528-538. Zbl 1291.93328, MR 3086896, 10.3166/ejc.18.528-538 |
Reference:
|
[14] Hernández-Lerma, O., Lasserre, J. B.: Discrete-time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. MR 1363487, 10.1007/978-1-4612-0729-0 |
Reference:
|
[15] Hinderer, K.: Foundations of non-stationary dynamic programming with discrete time parameter. In: Lecture Notes Operations Research (M. Beckmann and H. Künzi, eds.), Springer-Verlag 33, Zürich 1970. Zbl 0202.18401, MR 0267890, 10.1007/978-3-642-46229-0 |
Reference:
|
[16] Ilhuicatzi-Roldán, R., Cruz-Suárez, H.: Optimal replacement in a system of $n$-machines with random horizon. Proyecciones 31 (2012), 219-233. Zbl 1262.90050, MR 2995551, 10.4067/s0716-09172012000300003 |
Reference:
|
[17] Minjarez-Sosa, J. A.: Markov Control Models with unknown random state-action-dependent discounted factors. TOP 23 (2015), 743-772. MR 3407674, 10.1007/s11750-015-0360-5 |
Reference:
|
[18] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York 1994. MR 1270015 |
Reference:
|
[19] Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Probab. Theory Related Fields 32 (1975), 179-196. Zbl 0316.90080, MR 0378841, 10.1007/bf00532612 |
Reference:
|
[20] Wei, Q., Guo, X.: Markov decision processes with state-dependent discounted factors and unbounded rewards/costs. Oper. Res. Lett. 39 (2011), 369-374. MR 2835530, 10.1016/j.orl.2011.06.014 |
Reference:
|
[21] Wei, Q., Guo, X.: Semi-Markov decision processes with variance minimization criterion. 4OR 13 (2015), 59-79. Zbl 1310.93087, MR 3323274, 10.1007/s10288-014-0267-2 |
Reference:
|
[22] Wu, X., Guo, X.: First passage optimality and variance minimisation of Markov decision processes with varying discounted factors. J. Appl. Probab. 52 (2015), 441-456. MR 3372085, 10.1017/s0021900200012560 |
Reference:
|
[23] Wu, X., Zou, X., Guo, X.: First passage Markov decision processes with constraints and varying discount factors. Front. Math. China 10 (2015), 1005-1023. Zbl 1317.90319, MR 3352898, 10.1007/s11464-015-0479-6 |
Reference:
|
[24] Wu, X., Zhang, J.: An application to the finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. In: Proc. 11th World Congress on Intelligent Control and Automation 2015, pp. 1745-1748. MR 3163332, 10.1109/wcica.2014.7052984 |
Reference:
|
[25] Wu, X., Zhang, J.: Finite approximation of the first passage models for discrete-time Markov decision processes with varying discounted factors. Discrete Event Dyn. Syst. 26 (2016), 669-683. MR 3557415, 10.1007/s10626-014-0209-3 |
Reference:
|
[26] Ye, L., Guo, X.: Continuous-time Markov decision processes with state-dependent discount factors. Acta Appl. Math. 121 (2012), 5-27. Zbl 1281.90082, MR 2966962, 10.1007/s10440-012-9669-3 |
Reference:
|
[27] Zhang, Y.: Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP 21 (2013), 378-408. Zbl 1273.90235, MR 3068494, 10.1007/s11750-011-0186-8 |