Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains

Montes-de-Oca, Raúl; Salem-Silva, Francisco

Article

Title:	Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains (English)
Author:	Montes-de-Oca, Raúl
Author:	Salem-Silva, Francisco
Language:	English
Journal:	Kybernetika
ISSN:	0023-5954
Volume:	41
Issue:	6
Year:	2005
Pages:	[757]-772
Summary lang:	English
Category:	math
Summary:	This paper deals with Markov decision processes (MDPs) with a real state space that has a minimal state and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs and evaluate the quality of each policy by the objective function known as the average cost. For this objective function we consider two Markov control models $\mathbb{P}$ and $\mathbb{P}_{1}$, which have the same components except for the transition laws: the transition law $q$ of $\mathbb{P}$ is taken as unknown, and the transition law $q_{1}$ of $\mathbb{P}_{1}$ as a known approximation of $q$. Under certain irreducibility, recurrence and ergodicity conditions imposed on the bounding SO Markov chain (these conditions give the rate of convergence of the $t$-step transition probabilities, $t=1,2,\ldots$, to the invariant measure), we estimate the difference between the optimal cost of driving $\mathbb{P}$ and the cost incurred when driving $\mathbb{P}$ with the optimal policy of $\mathbb{P}_{1}$. This difference is called the index of perturbations, and upper bounds for it are provided. An example illustrating the theory is included. (English)
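To make the index of perturbations concrete, the sketch below computes it exactly on a toy finite model. It is not the authors' method: the paper derives upper bounds for real state spaces and unbounded costs via a bounding SO chain, whereas this sketch assumes a hypothetical three-state birth-death model (the costs, the laws `q` and `q1`, and all function names are invented for illustration). It further assumes that every deterministic stationary policy induces an irreducible chain, so the average cost of a policy equals its one-step cost averaged against the invariant probability, and it finds optimal policies by brute-force enumeration, which is only feasible for tiny models.

```python
from itertools import product

def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def invariant_distribution(p):
    """Invariant probability pi of an irreducible finite chain: pi P = pi, sum(pi) = 1."""
    n = len(p)
    # Row y encodes sum_x pi(x) P(x, y) - pi(y) = 0; the last balance equation is
    # redundant, so it is replaced by the normalisation sum_x pi(x) = 1.
    a = [[p[x][y] - (1.0 if x == y else 0.0) for x in range(n)] for y in range(n - 1)]
    a.append([1.0] * n)
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)

def average_cost(f, cost, q):
    """Average cost J(f) = sum_x pi_f(x) c(x, f(x)) of a deterministic stationary policy f."""
    p_f = [q[x][f[x]] for x in range(len(f))]
    pi = invariant_distribution(p_f)
    return sum(pi[x] * cost[x][f[x]] for x in range(len(f)))

def optimal_policy(cost, q, n_actions):
    """Brute force over all deterministic stationary policies (assumption: tiny model)."""
    policies = product(range(n_actions), repeat=len(cost))
    best = min(policies, key=lambda f: average_cost(f, cost, q))
    return best, average_cost(best, cost, q)

def perturbation_index(cost, q, q1, n_actions):
    """Delta = J(f1; q) - J*(q): cost of driving the true model with q1's optimal policy."""
    f1, _ = optimal_policy(cost, q1, n_actions)
    _, j_star = optimal_policy(cost, q, n_actions)
    return average_cost(f1, cost, q) - j_star

def queue_law(up, down_by_action, n_states=3):
    """Hypothetical birth-death law on {0, ..., n_states-1}; state 0 is the minimal state."""
    law = []
    for x in range(n_states):
        row = []
        for down in down_by_action:
            probs = [0.0] * n_states
            probs[min(x + 1, n_states - 1)] += up
            probs[max(x - 1, 0)] += down
            probs[x] += 1.0 - up - down
            row.append(probs)
        law.append(row)
    return law  # law[x][a] is the distribution of the next state

cost = [[x + 0.4 * a for a in range(2)] for x in range(3)]  # holding cost + service effort
q = queue_law(up=0.3, down_by_action=[0.3, 0.6])    # "true" (unknown) transition law
q1 = queue_law(up=0.25, down_by_action=[0.3, 0.6])  # known approximation of q
delta = perturbation_index(cost, q, q1, n_actions=2)
print(f"index of perturbations: {delta:.4f}")
```

Because $J^*(q)$ is a minimum over all deterministic stationary policies, including the policy optimal for $q_1$, the computed index is always nonnegative in this sketch, and it vanishes when $q_1 = q$.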
Keyword:	stochastically ordered Markov chains
Keyword:	Lyapunov condition
Keyword:	invariant probability
Keyword:	average Markov decision processes
MSC:	90C40
MSC:	93E20
idZBL:	Zbl 1249.90313
idMR:	MR2193864
Date available:	2009-09-24T20:13:02Z
Last updated:	2015-03-23
Stable URL:	http://hdl.handle.net/10338.dmlcz/135691
Reference:	[1] Favero F., Runggaldier W. J.: A robustness result for stochastic control. Systems Control Lett. 46 (2002), 91–97 MR 2010062, 10.1016/S0167-6911(02)00121-4
Reference:	[2] Gordienko E. I.: An estimate of the stability of optimal control of certain stochastic and deterministic systems. J. Soviet Math. 50 (1992), 891–899 MR 1163393, 10.1007/BF01099115
Reference:	[3] Gordienko E. I.: Lecture Notes on Stability Estimation in Markov Decision Processes. Universidad Autónoma Metropolitana, México D.F., 1994
Reference:	[4] Gordienko E. I., Hernández-Lerma O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219–237 Zbl 0829.93068, MR 1341224
Reference:	[5] Gordienko E. I., Salem-Silva F. S.: Robustness inequality for Markov control processes with unbounded costs. Systems Control Lett. 33 (1998), 125–130 MR 1607814, 10.1016/S0167-6911(97)00077-7
Reference:	[6] Gordienko E. I., Salem-Silva F. S.: Estimates of stability of Markov control processes with unbounded costs. Kybernetika 36 (2000), 2, 195–210 MR 1760024
Reference:	[7] Hernández-Lerma O.: Adaptive Markov Control Processes. Springer–Verlag, New York 1989 MR 0995463
Reference:	[8] Hernández-Lerma O., Lasserre J. B.: Further Topics on Discrete-Time Markov Control Processes. Springer–Verlag, New York 1999 Zbl 0928.93002, MR 1697198
Reference:	[9] Hinderer K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer–Verlag, Berlin – Heidelberg – New York 1970 Zbl 0202.18401, MR 0267890
Reference:	[10] Lindvall T.: Lectures on the Coupling Method. (Wiley Series in Probability and Mathematical Statistics.) Wiley, New York 1992 Zbl 1013.60001, MR 1180522
Reference:	[11] Lund R.: The geometric convergence rates of a Lindley random walk. J. Appl. Probab. 34 (1997), 806–811 MR 1464616, 10.2307/3215107
Reference:	[12] Lund R., Tweedie R.: Geometric convergence rates for stochastically ordered Markov chains. Math. Oper. Res. 21 (1996), 182–194 Zbl 0847.60053, MR 1385873, 10.1287/moor.21.1.182
Reference:	[13] Meyn S., Tweedie R.: Markov Chains and Stochastic Stability. Springer–Verlag, New York 1993 Zbl 1165.60001, MR 1287609
Reference:	[14] Montes-de-Oca R., Sakhanenko A., Salem-Silva F.: Estimates for perturbations of general discounted Markov control chains. Appl. Math. 30 (2003), 3, 287–304 Zbl 1055.90086, MR 2029538
Reference:	[15] Nummelin E.: General Irreducible Markov Chains and Non-negative Operators. Cambridge University Press, Cambridge 1984 Zbl 0551.60066, MR 0776608
Reference:	[16] Rachev S. T.: Probability Metrics and the Stability of Stochastic Models. Wiley, New York 1991 Zbl 0744.60004, MR 1105086
Reference:	[17] Zolotarev V. M.: On stochastic continuity of queueing systems of type G/G/1. Theory Probab. Appl. 21 (1976), 250–269 Zbl 0363.60090, MR 0420920

File:	Kybernetika_41-2005-6_6.pdf (1.791 MB, application/pdf)
