Previous |  Up |  Next

Article

Title: Approximate dynamic programming based on high dimensional model representation (English)
Author: Pištěk, Miroslav
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 49
Issue: 5
Year: 2013
Pages: 720-737
Summary lang: English
.
Category: math
.
Summary: This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications. (English)
Keyword: approximate dynamic programming
Keyword: Bellman equation
Keyword: approximate HDMR minimization
Keyword: trust region problem
MSC: 90C39
idZBL: Zbl 1278.90423
idMR: MR3182636
.
Date available: 2013-11-27T09:47:20Z
Last updated: 2015-03-29
Stable URL: http://hdl.handle.net/10338.dmlcz/143521
.
Reference: [1] Busygin, S.: A new trust region technique for the maximum weight clique problem..Discrete Appl. Math. 154 (2002), 2006. Zbl 1111.90020, MR 2257814
Reference: [2] Demiralp, M.: High dimensional model representation and its application varieties..In: Proc. Fourth International Conference on Tools for Mathematical Modelling, St. Petersburg 2003, pp. 146-159. Zbl 1237.93093
Reference: [3] Feil, K. S., Shah, N.: Volatility calibration using spline and high dimensional model representation models..Wilmott J. 1 (2009), 179-195. 10.1002/wilj.18
Reference: [4] Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond..In: Proc. 24th Annual Conference on Learning Theory, Budapest 2011.
Reference: [5] George, A., Powell, W. B., Kulkarni, S. R.: Value function approximation using multiple aggregation for multiattribute resource management..J. Machine Learning Res. 9 (2008), 2079-2111. Zbl 1225.68180
Reference: [6] Gittins, J.: Bandit processes and dynamic allocation indices..J. Roy. Statist. Soc. Ser.B 41 (1979), 2, 148-177. Zbl 0411.62055, MR 0547241
Reference: [7] Gosavi, A.: Reinforcement learning: A tutorial survey and recent advances..INFORMS J. Comput. 21 (2009), 178-192. Zbl 1243.68240, MR 2549123, 10.1287/ijoc.1080.0305
Reference: [8] Hauskrecht, M.: Value-function approximations for partially observable markov decision processes..J. Artif. Internat. Res. 13 (2000), 33-94. Zbl 0946.68131, MR 1781862
Reference: [9] Jaakkola, T., Singh, S. P., Jordan, M. I.: Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems..MIT Press, 1995.
Reference: [10] Karp, R. M.: Reducibility among combinatorial problems..In: Complexity of Computer Computations (R. Miller and J. W. Thatcher, eds.), Plenum, New York 1972. Zbl 1187.90014, MR 0378476, 10.1007/978-1-4684-2001-2_9
Reference: [11] Kushner, H.: Introduction to Stochastic Control..Holt, Rinehart and Winston, New York 1970. Zbl 0293.93018, MR 0280248
Reference: [12] LeBlanc, M., Tibshirani, R.: Combining estimates in regression and classification..J. Amer. Statist. Assoc. 91 (1996), 1641-1650. Zbl 0881.62046, MR 1439105
Reference: [13] Li, G., Hu, J., Wang, S.-W., Georgopoulos, P. G., Schoendorf, J., Rabitz, H.: Random sampling-high dimensional model representation (rs-hdmr) and orthogonality of its different order component functions..J. Phys. Chem. A 110 (2006), 7, 2474-2485. 10.1021/jp054148m
Reference: [14] Luus, R.: Iterative Dynamic Programming..Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, 2000. Zbl 1158.49038, MR 1750212
Reference: [15] Ma, X., Zabaras, N.: An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations..J. Comput. Phys. 229 (2010), 3884-3915. Zbl 1189.65019, MR 2609758, 10.1016/j.jcp.2010.01.033
Reference: [16] Matoušek, J., Nešetřil, J.: Invitation to Discrete Mathematics..Clarendon Press, 1998. Zbl 1152.05002, MR 1668997
Reference: [17] Miller, W., Sutton, R., Werbos, P.: Neural Networks for Control. Neural Network Modeling and Connectionism..Mit Press, 1995.
Reference: [18] Olsson, C., Eriksson, A., Kahl, F.: Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming..In: Computer Vision and Pattern Recognition, 2007.
Reference: [19] Peterka, V.: Bayesian system identification..In: Trends and Progress in System Identification (P. Eykhoff, ed.), Pergamon Press, Oxford 1981, pp. 239-304. Zbl 0451.93059, MR 0746139
Reference: [20] Pištěk, M.: On implicit approximation of the bellman equation..In: 15th IFAC Symposium on System Identification, Saint-Malo 2009.
Reference: [21] Powell, W. B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality..Wiley-Interscience, 2007. Zbl 1242.90002, MR 2347698
Reference: [22] Rabitz, H., Alis, O.: General foundations of high-dimensional model representations..J. Math. Chem. 25 (1999), 197-233. Zbl 0957.93004, MR 1731292, 10.1023/A:1019188517934
Reference: [23] Rahman, S.: A polynomial dimensional decomposition for stochastic computing..Internat. J. Numer. Methods in Engrg. 76 (2008), 2091-2116. Zbl 1195.74310, MR 2475282, 10.1002/nme.2394
Reference: [24] Rojas, M., Santos, S. A., Sorensen, D. C.: A new matrix-free algorithm for the large-scale trust-region subproblem..SIAM J. Optim. 11 (2000), 611-646. Zbl 0994.65067, MR 1814035, 10.1137/S105262349928887X
Reference: [25] Schrijver, A.: Theory of Linear and Integer Programming..Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1998. Zbl 0970.90052, MR 0874114
Reference: [26] Sorensen, D. C.: Newton's method with a model trust region modification..SIAM J. Numer. Anal. 19 (1982), 2, 409-426. Zbl 0483.65039, MR 0650060, 10.1137/0719026
Reference: [27] Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction..MIT Press, 1998.
.

Files

Files Size Format View
Kybernetika_49-2013-5_4.pdf 362.5Kb application/pdf View/Open
Back to standard record
Partner of
EuDML logo