Article

Title: Empirical approximation in Markov games under unbounded payoff: discounted and average criteria (English)
Author: Luque-Vásquez, Fernando
Author: Minjárez-Sosa, J. Adolfo
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 53
Issue: 4
Year: 2017
Pages: 694-716
Summary lang: English
Category: math
Summary: This work deals with a class of discrete-time zero-sum Markov games whose state process $\{x_{t}\}$ evolves according to the equation $x_{t+1}=F(x_{t},a_{t},b_{t},\xi_{t})$, where $a_{t}$ and $b_{t}$ represent the actions of players 1 and 2, respectively, and $\{\xi_{t}\}$ is a sequence of independent and identically distributed random variables with unknown distribution $\theta$. Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta$, we introduce approximation schemes for the value of the game as well as for optimal strategies under both the discounted and the average criteria. (English)
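The estimation idea in the summary can be illustrated with a minimal sketch. It is not the authors' scheme: the dynamics `F`, the test function `v`, and the Gaussian disturbance are hypothetical choices made only for illustration. The sketch replaces the unknown distribution $\theta$ by the empirical distribution of observed disturbances, so that an expectation of the form $\int v(F(x,a,b,s))\,\theta(ds)$ is approximated by a sample average.

```python
import random


def F(x, a, b, xi):
    # Hypothetical dynamics x_{t+1} = F(x_t, a_t, b_t, xi_t); illustration only.
    return 0.5 * x + a - b + xi


def empirical_expectation(v, x, a, b, samples):
    """Average of v(F(x, a, b, xi_i)) over the observed disturbances xi_i.

    This equals the integral of v(F(x, a, b, s)) with respect to the
    empirical distribution that puts mass 1/n on each observation.
    """
    if not samples:
        raise ValueError("need at least one observed disturbance")
    return sum(v(F(x, a, b, xi)) for xi in samples) / len(samples)


def v(y):
    # Unbounded test function (quadratic growth), an assumption for the demo.
    return y * y


# Simulated observations; in practice the xi_i would be recorded, not drawn.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

estimate = empirical_expectation(v, x=1.0, a=0.3, b=0.1, samples=samples)
# For a standard normal disturbance, E[(m + xi)^2] = m^2 + 1 with m = 0.7.
exact = (0.5 * 1.0 + 0.3 - 0.1) ** 2 + 1.0
```

With more observations the sample average should approach the exact expectation. How fast it does so for unbounded `v` depends on growth conditions, which this sketch does not check.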
Keyword: Markov games
Keyword: empirical estimation
Keyword: discounted and average criteria
MSC: 62G07
MSC: 91A15
idZBL: Zbl 06819631
idMR: MR3730259
DOI: 10.14736/kyb-2017-4-0694
Date available: 2017-11-12T10:02:26Z
Last updated: 2018-05-25
Stable URL: http://hdl.handle.net/10338.dmlcz/146951
Reference: [1] Chang, H. S.: Perfect information two-person zero-sum Markov games with imprecise transition probabilities. Math. Meth. Oper. Res. 64 (2006), 335-351. MR 2264789, 10.1007/s00186-006-0081-5
Reference: [2] Dudley, R. M.: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Stat. 40 (1969), 40-50. MR 0236977, 10.1214/aoms/1177697802
Reference: [3] Dynkin, E. B., Yushkevich, A. A.: Controlled Markov Processes. Springer-Verlag, New York 1979. MR 0554083, 10.1007/978-1-4615-6746-2
Reference: [4] Fernández-Gaucherand, E.: A note on the Ross-Taylor Theorem. Appl. Math. Comp. 64 (1994), 207-212. MR 1298262, 10.1016/0096-3003(94)90064-7
Reference: [5] Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer-Verlag, New York 1997. MR 1418636, 10.1007/978-1-4612-4054-9
Reference: [6] Ghosh, M. K., McDonald, D., Sinha, S.: Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (2004), 99-118. MR 2062972, 10.1023/b:jota.0000026133.56615.cf
Reference: [7] Gordienko, E. I.: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518. MR 0761133, 10.1137/1129064
Reference: [8] Gordienko, E. I., Hernández-Lerma, O.: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199-218. Zbl 0829.93067, MR 1341223
Reference: [9] Gordienko, E. I., Hernández-Lerma, O.: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237. MR 1341224
Reference: [10] Hernández-Lerma, O., Lasserre, J. B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. Zbl 0840.93001, MR 1363487, 10.1007/978-1-4612-0729-0
Reference: [11] Hilgert, N., Minjárez-Sosa, J. A.: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criterion. Math. Meth. Oper. Res. 63 (2006), 443-460. MR 2264761, 10.1007/s00186-005-0024-6
Reference: [12] Jaśkiewicz, A., Nowak, A.: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45 (2006), 773-789. MR 2247715, 10.1137/s0363012904443257
Reference: [13] Jaśkiewicz, A., Nowak, A.: Approximation of noncooperative semi-Markov games. J. Optim. Theory Appl. 131 (2006), 115-134. MR 2278300, 10.1007/s10957-006-9128-2
Reference: [14] Krausz, A., Rieder, U.: Markov games with incomplete information. Math. Meth. Oper. Res. 46 (1997), 263-279. MR 1481935, 10.1007/bf01217695
Reference: [15] Minjárez-Sosa, J. A.: Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion. Appl. Math. (Warsaw) 26 (1999), 267-280. MR 1725752, 10.4064/am-26-3-267-280
Reference: [16] Minjárez-Sosa, J. A., Vega-Amaya, O.: Asymptotically optimal strategies for adaptive zero-sum discounted Markov games. SIAM J. Control Optim. 48 (2009), 1405-1421. MR 2496982, 10.1137/060651458
Reference: [17] Minjárez-Sosa, J. A., Vega-Amaya, O.: Optimal strategies for adaptive zero-sum average Markov games. J. Math. Analysis Appl. 402 (2013), 44-56. MR 3023236, 10.1016/j.jmaa.2012.12.011
Reference: [18] Minjárez-Sosa, J. A., Luque-Vásquez, F.: Two person zero-sum semi-Markov games with unknown holding times distribution on one side: discounted payoff criterion. Appl. Math. Optim. 57 (2008), 289-305. MR 2407314, 10.1007/s00245-007-9016-7
Reference: [19] Neyman, A., Sorin, S.: Stochastic Games and Applications. Kluwer, 2003. MR 2035554, 10.1007/978-94-010-0189-2
Reference: [20] Prieto-Rumeau, T., Lorenzo, J. M.: Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23 (2015), 799-836. MR 3407676, 10.1007/s11750-014-0354-8
Reference: [21] Shimkin, N., Shwartz, A.: Asymptotically efficient adaptive strategies in repeated games. Part I: Certainty equivalence strategies. Math. Oper. Res. 20 (1995), 743-767. MR 1354780, 10.1287/moor.20.3.743
Reference: [22] Shimkin, N., Shwartz, A.: Asymptotically efficient adaptive strategies in repeated games. Part II: Asymptotic optimality. Math. Oper. Res. 21 (1996), 487-512. MR 1397226, 10.1287/moor.21.2.487
Reference: [23] Schäl, M.: Conditions for optimality and for the limit of $n$-stage optimal policies to be optimal. Z. Wahrsch. Verw. Geb. 32 (1975), 179-196. MR 0378841, 10.1007/bf00532612
Reference: [24] Rao, R. Ranga: Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33 (1962), 659-680. MR 0137809, 10.1214/aoms/1177704588
Reference: [25] Van Nunen, J. A. E. E., Wessels, J.: A note on dynamic programming with unbounded rewards. Manag. Sci. 24 (1978), 576-580. MR 0521666, 10.1287/mnsc.24.5.576

Files

File: Kybernetika_53-2017-4_8.pdf
Size: 410.4 KB
Format: application/pdf