Previous |  Up |  Next

Article

Keywords:
approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem
Summary:
This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.
References:
[1] Busygin, S.: A new trust region technique for the maximum weight clique problem. Discrete Appl. Math. 154 (2002), 2006. MR 2257814 | Zbl 1111.90020
[2] Demiralp, M.: High dimensional model representation and its application varieties. In: Proc. Fourth International Conference on Tools for Mathematical Modelling, St. Petersburg 2003, pp. 146-159. Zbl 1237.93093
[3] Feil, K. S., Shah, N.: Volatility calibration using spline and high dimensional model representation models. Wilmott J. 1 (2009), 179-195. DOI 10.1002/wilj.18
[4] Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond. In: Proc. 24th Annual Conference on Learning Theory, Budapest 2011.
[5] George, A., Powell, W. B., Kulkarni, S. R.: Value function approximation using multiple aggregation for multiattribute resource management. J. Machine Learning Res. 9 (2008), 2079-2111. Zbl 1225.68180
[6] Gittins, J.: Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser.B 41 (1979), 2, 148-177. MR 0547241 | Zbl 0411.62055
[7] Gosavi, A.: Reinforcement learning: A tutorial survey and recent advances. INFORMS J. Comput. 21 (2009), 178-192. DOI 10.1287/ijoc.1080.0305 | MR 2549123 | Zbl 1243.68240
[8] Hauskrecht, M.: Value-function approximations for partially observable markov decision processes. J. Artif. Internat. Res. 13 (2000), 33-94. MR 1781862 | Zbl 0946.68131
[9] Jaakkola, T., Singh, S. P., Jordan, M. I.: Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems. MIT Press, 1995.
[10] Karp, R. M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations (R. Miller and J. W. Thatcher, eds.), Plenum, New York 1972. DOI 10.1007/978-1-4684-2001-2_9 | MR 0378476 | Zbl 1187.90014
[11] Kushner, H.: Introduction to Stochastic Control. Holt, Rinehart and Winston, New York 1970. MR 0280248 | Zbl 0293.93018
[12] LeBlanc, M., Tibshirani, R.: Combining estimates in regression and classification. J. Amer. Statist. Assoc. 91 (1996), 1641-1650. MR 1439105 | Zbl 0881.62046
[13] Li, G., Hu, J., Wang, S.-W., Georgopoulos, P. G., Schoendorf, J., Rabitz, H.: Random sampling-high dimensional model representation (rs-hdmr) and orthogonality of its different order component functions. J. Phys. Chem. A 110 (2006), 7, 2474-2485. DOI 10.1021/jp054148m
[14] Luus, R.: Iterative Dynamic Programming. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, 2000. MR 1750212 | Zbl 1158.49038
[15] Ma, X., Zabaras, N.: An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations. J. Comput. Phys. 229 (2010), 3884-3915. DOI 10.1016/j.jcp.2010.01.033 | MR 2609758 | Zbl 1189.65019
[16] Matoušek, J., Nešetřil, J.: Invitation to Discrete Mathematics. Clarendon Press, 1998. MR 1668997 | Zbl 1152.05002
[17] Miller, W., Sutton, R., Werbos, P.: Neural Networks for Control. Neural Network Modeling and Connectionism. Mit Press, 1995.
[18] Olsson, C., Eriksson, A., Kahl, F.: Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In: Computer Vision and Pattern Recognition, 2007.
[19] Peterka, V.: Bayesian system identification. In: Trends and Progress in System Identification (P. Eykhoff, ed.), Pergamon Press, Oxford 1981, pp. 239-304. MR 0746139 | Zbl 0451.93059
[20] Pištěk, M.: On implicit approximation of the bellman equation. In: 15th IFAC Symposium on System Identification, Saint-Malo 2009.
[21] Powell, W. B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley-Interscience, 2007. MR 2347698 | Zbl 1242.90002
[22] Rabitz, H., Alis, O.: General foundations of high-dimensional model representations. J. Math. Chem. 25 (1999), 197-233. DOI 10.1023/A:1019188517934 | MR 1731292 | Zbl 0957.93004
[23] Rahman, S.: A polynomial dimensional decomposition for stochastic computing. Internat. J. Numer. Methods in Engrg. 76 (2008), 2091-2116. DOI 10.1002/nme.2394 | MR 2475282 | Zbl 1195.74310
[24] Rojas, M., Santos, S. A., Sorensen, D. C.: A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM J. Optim. 11 (2000), 611-646. DOI 10.1137/S105262349928887X | MR 1814035 | Zbl 0994.65067
[25] Schrijver, A.: Theory of Linear and Integer Programming. Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1998. MR 0874114 | Zbl 0970.90052
[26] Sorensen, D. C.: Newton's method with a model trust region modification. SIAM J. Numer. Anal. 19 (1982), 2, 409-426. DOI 10.1137/0719026 | MR 0650060 | Zbl 0483.65039
[27] Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, 1998.
Partner of
EuDML logo