Bi-personal stochastic transient Markov games with stopping times and total reward criterion

Martínez-Cortés, Victor Manuel

Article

Title:	Bi-personal stochastic transient Markov games with stopping times and total reward criterion (English)
Author:	Martínez-Cortés, Victor Manuel
Language:	English
Journal:	Kybernetika
ISSN:	0023-5954 (print)
ISSN:	1805-949X (online)
Volume:	57
Issue:	1
Year:	2021
Pages:	1-14
Summary lang:	English
Category:	math
Summary:	The article is devoted to a class of bi-personal (players 1 and 2), zero-sum Markov games evolving in discrete time on transient Markov reward chains. At each decision time, the second player can stop the system by paying a terminal reward to the first player. If the system is not stopped, the first player selects a decision and two things happen: the Markov chain moves to the next state according to the known transition law, and the second player must pay a reward to the first player. The first player tries to maximize his total expected reward, while the second player tries to minimize his total expected cost. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a transient Markov reward chain. Contraction properties of the transient model make it possible to apply the Banach Fixed Point Theorem and establish a Nash equilibrium. The obtained results are illustrated by two numerical examples. (English)
Keyword:	two-person Markov games
Keyword:	stopping times
Keyword:	stopping times in transient Markov decision chains
Keyword:	transient and communicating Markov chains
MSC:	91A05
MSC:	91A50
idZBL:	Zbl 07396252
idMR:	MR4231853
DOI:	10.14736/kyb-2021-1-0001
Date available:	2021-07-30T12:43:43Z
Last updated:	2021-11-01
Stable URL:	http://hdl.handle.net/10338.dmlcz/149021
Reference:	[1] Ash, E.: Real Analysis and Probability. Academic Press, 1972. MR 0435320
Reference:	[2] Cavazos-Cadena, R., Hernández-Hernández, D.: Nash equilibria in a class of Markov stopping games. Kybernetika 48 (2012), 1027-1044. MR 3086867
Reference:	[3] Cavazos-Cadena, R., Montes-de-Oca, R.: Nearly optimal policies in risk-sensitive positive dynamic programming on discrete spaces. Math. Methods Oper. Res. 27 (2000), 137-167. MR 1782381
Reference:	[4] Filar, J. A., Vrieze, O. J.: Competitive Markov Decision Processes. Springer Verlag, Berlin 1996. MR 1418636
Reference:	[5] Granas, A., Dugundji, J.: Fixed Point Theory. Springer-Verlag, New York 2003. MR 1987179
Reference:	[6] Hinderer, K.: Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, Berlin 1970. MR 0267890
Reference:	[7] Howard, R. A., Matheson, J.: Risk-sensitive Markov decision processes. Management Sci. 23 (1972), 356-369. MR 0292497
Reference:	[8] Kolokoltsov, V. N., Malafayev, O. A.: Understanding Game Theory. World Scientific, Singapore 2010. MR 2666863
Reference:	[9] Nash, J.: Equilibrium points in n-person games. Proc. National Acad. Sci. United States of America 36 (1950), 48-49. MR 0031701
Reference:	[10] Puterman, M. L.: Markov Decision Processes - Discrete Stochastic Dynamic Programming. Wiley, New York 1994. MR 1270015
Reference:	[11] Raghavan, T. E. S., Tijs, S. H., Vrieze, O. J.: On stochastic games with additive reward and transition structure. J. Optim. Theory Appl. 47 (1985), 451-464. MR 0818872
Reference:	[12] Ross, S.: Introduction to Probability Models. Ninth edition. Elsevier 2007. MR 1247962
Reference:	[13] Shapley, L. S.: Stochastic games. Proc. National Academy Sciences of United States of America 39 (1953), 1095-1100. Zbl 1180.91042, MR 0061807
Reference:	[14] Shiryaev, A.: Optimal Stopping Rules. Springer, New York 1978. Zbl 1138.60008, MR 0468067
Reference:	[15] Sladký, K., Martínez-Cortés, V. M.: Risk-sensitive optimality in Markov games. In: Proc. 35th International Conference Mathematical Methods in Economics 2017 (P. Pražák, ed.). Univ. Hradec Králové 2017, pp. 684-689.
Reference:	[16] Thomas, L. C.: Connectedness conditions used in finite state Markov decision processes. J. Math. Anal. Appl. 68 (1979), 548-556. MR 0533512
Reference:	[17] Thomas, L. C.: Connectedness conditions for denumerable state Markov decision processes. In: Recent Developments in Markov Decision Processes (R. Hartley, L. C. Thomas and D. J. White, eds.), Academic Press, New York 1980, pp. 181-204. MR 0611528
Reference:	[18] Thuijsman, F.: Optimality and Equilibria in Stochastic Games. Mathematical Centre Tracts, Amsterdam 1992. MR 1171220
Reference:	[19] Wal, J. Van der: Discounted Markov games: successive approximations and stopping times. Int. J. Game Theory 6 (1977), 11-22. MR 0456797
Reference:	[20] Wal, J. Van der: Stochastic Dynamic Programming. Mathematical Centre Tracts, Amsterdam 1981. MR 0633156
Reference:	[21] Vrieze, O. J.: Stochastic Games with Finite State and Action Spaces. Mathematical Centre Tracts, Amsterdam 1987. MR 0886482
Reference:	[22] Zachrisson, L.: Markov games. In: Advances in Game Theory (M. Dresher, L. S. Shapley and A. W. Tucker, eds.), Princeton University Press 1964. Zbl 0126.36507, MR 0170729
Reference:	[23] Zijm, W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematisch Centrum, Amsterdam 1983. MR 0723868
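
Illustration:	The summary describes a stopping game whose value is obtained as a fixed point of a contraction. The following is only a minimal sketch of that idea and is not taken from the article. It assumes a finite state space, and it assumes that the fixed-point equation has the form v(x) = min{ g(x), max_a [ r(x,a) + sum_y p(y|x,a) v(y) ] }, where g is the terminal reward paid on stopping. The function name `stopping_game_value`, the argument layout, and the toy data are hypothetical. Transience is modelled here by transition rows that sum to less than 1, which makes the operator a sup-norm contraction in this toy case.

```python
def stopping_game_value(P, r, g, tol=1e-10, max_iter=10_000):
    """Successive approximations for the assumed stopping-game operator.

    P[x][a][y]: probability of moving from state x to y under action a
                (rows may sum to less than 1: the chain can be absorbed).
    r[x][a]:    reward paid by player 2 to player 1 if the game continues.
    g[x]:       terminal reward paid by player 2 if he stops in state x.
    """
    n = len(g)
    v = [0.0] * n
    for _ in range(max_iter):
        v_new = []
        for x in range(n):
            # Player 1 maximizes the continuation value over his actions.
            best = max(
                r[x][a] + sum(P[x][a][y] * v[y] for y in range(n))
                for a in range(len(r[x]))
            )
            # Player 2 minimizes: stop and pay g[x], or let the game continue.
            v_new.append(min(g[x], best))
        if max(abs(p - q) for p, q in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical one-state example: self-loop with probability 0.5, reward 1.
P = [[[0.5]]]
r = [[1.0]]
print(stopping_game_value(P, r, g=[10.0]))  # stopping is too expensive; value is close to 2
print(stopping_game_value(P, r, g=[1.5]))   # stopping at once is cheaper for player 2
```

With a terminal reward of 10 the second player never stops and the value approaches the total reward 1/(1 - 0.5) = 2 of the uncontrolled chain; with a terminal reward of 1.5 he stops immediately.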

Files

File	Size	Format
Kybernetika_57-2021-1_1.pdf	385.6Kb	application/pdf