Previous |  Up |  Next

# Article

Full entry | PDF   (0.6 MB)
Keywords:
backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree
Summary:
Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim at finding a good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a study case taken from literature, our procedure succeeds in finding an optimal solution for every $k$.
References:
[1] Almond, R., Kong, A.: Optimality Issues in Constructing a Markov Tree from Graphical Models. Research Report A-3, Dept. Statistics, Harvard University, 1991.
[2] Altmüller, A., Haralick, R. M.: Approximating high dimensional probability disributions. In: Proc. XVII Int. Conf. on Patter Recognitions, 2004.
[3] Altmüller, A., Haralick, R. M.: Practical aspects of efficient forward selection in decomposable graphical models. In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.
[4] Bach, F. R., Jordan, M. I.: Thin junction trees. Adv. in Neural Inform. Proces. Systems 14 (2002), 569-572.
[5] Badsberg, J.-H., Malvestuto, F. M.: An implementation of the iterative proportional fitting procedure by propagation trees. Comput. Statist. Data Anal. 37 (2001), 297-322. DOI 10.1016/S0167-9473(01)00013-5 | MR 1856676 | Zbl 1061.65500
[6] Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes. J. ACM 30 (1983), 479-513. DOI 10.1145/2402.322389 | MR 0709830 | Zbl 0624.68087
[7] Beineke, L. W., Pippert, R. E.: The enumeration of labelled 2-trees. J. Combin. Theory 6 (1969), 200-205. MR 0234868
[8] Bishop, Y., Fienberg, S. E., Holland, P. W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge 1975. MR 0381130 | Zbl 1131.62043
[9] Brown, D. T.: A note on approximations to discrete probability distributions. Inform. and Control 2 (1959), 386-392. DOI 10.1016/S0019-9958(59)80016-4 | MR 0110598 | Zbl 0117.14804
[10] Chickering, D.: Learning Bayesian networks is NP-complete. In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. MR 1473013
[11] Chow, C. K., Liu, C. N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14 (1968), 462-467. DOI 10.1109/TIT.1968.1054142 | Zbl 0165.22305
[12] Cover, T. M.: Elements of Information Theory. John Wiley and Sons, 1991. MR 1122806 | Zbl 1140.94001
[13] Csiszár, I., Körner, J.: Information Theory. Academic Press, 1981. MR 0666545
[14] Dagum, P., Luby, M.: Approximating probabilistic inference in belief networks is NP-hard. Artificial Intelligence 60 (1993), 141-153. DOI 10.1016/0004-3702(93)90036-B | MR 1216898
[15] Dasgupta, S.: Learning polytrees. In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.
[16] Deshpande, A., Garofalakis, M., Jordan, M. I.: Efficient stepwise selection in decomposable models. In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.
[17] Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D.: Comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.
[18] Havránek, T.: On model search methods. In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.
[19] Havránek, T.: Simple formal systems in data analysis. In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.
[20] Jensen, F. V., Jensen, F.: Optimal junction trees. In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.
[21] Karger, D., Srebro, N.: Learning Markov networks: maximum bounded tree-width graphs. In: Proc. XII ACM-SIAM Symp. on Discrete Mathematics, 2001, pp. 392-401. MR 1958431 | Zbl 0987.68067
[22] Kloks, T.: Tree-width. LNCS 842, Springer Verlag, Berlin 1994. Zbl 0925.05052
[23] Kocka, T.: New algorithm for learning decomposable models. Unpublished manuscript, 2000.
[24] Kovács, E., Szántai, T.: Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network. arXiv:1105.1697v1, 2011.
[25] Ku, H. H., Kullback, S.: Approximating discrete probability distributions. IEEE Trans. Inform. Theory 15 (1969), 444-447. DOI 10.1109/TIT.1969.1054336 | MR 0243669 | Zbl 0174.23202
[26] Lauritzen, S. L.: Graphical Models. Clarendon Press, Oxford 1996. MR 1419991
[27] II, P. M. Lewis: Approximating probability distributions to reduce storage requirements. Inform. and Control 2 (1959), 214-225. DOI 10.1016/S0019-9958(59)90207-4 | MR 0110597
[28] Malvestuto, F. M.: Operations research in the design of statistical databases (in Italian). In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.
[29] Malvestuto, F. M.: Approximating discrete probability distributions with decomposable models. IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. DOI 10.1109/21.120082
[30] Malvestuto, F. M.: An axiomatization of loglinear models with an application to the model search. In: Learning from Data, LNS 112 (1996), pp. 175-184.
[31] Malvestuto, F. M.: Designing a probabilistic database from a given set of full conditional independences. In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.
[32] Malvestuto, F. M.: A hypergraph-theoretic analysis of collapsibility and decomposability for extended log-linear models. Statist. Comput. 11 (2001), 155-169. DOI 10.1023/A:1008979300007 | MR 1837135
[33] Malvestuto, F. M.: From conditional independences to factorization constraints with discrete random variables. Ann. Math. and Artificial Intelligence 35 (2002), 253-285. DOI 10.1023/A:1014551721406 | MR 1899954 | Zbl 1001.68033
[34] Malvestuto, F. M.: Tree and local computations in a cross-entropy minimization problem with marginal constraints. Kybernetika 46 (2010), 621-654. MR 2722092 | Zbl 1204.93113
[35] Meek, C.: Finding a path is harder than finding a tree. J. Artificial Intelligence Res. 15 (2001), 383-389. MR 1884083 | Zbl 0994.68120
[36] Mezzini, M., Moscarini, M.: Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network. Theoret. Comput. Sci. 411 (2010), 958-966. DOI 10.1016/j.tcs.2009.10.004 | MR 2606034 | Zbl 1213.05251
[37] Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B.: Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.
[38] Rose, J. D.: On simple characterizations of $k$-trees. Discrete Math. 7 (1974), 317-322. DOI 10.1016/0012-365X(74)90042-9 | MR 0335319 | Zbl 0285.05128
[39] Szántai, T., Kovács, E.: Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution. In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. MR 2874757
[40] Szántai, T., Kovács, E.: Discovering a junction tree behind a Markov network by a greedy algorithm. arXiv:1104.2762v3, 2011.
[41] Tarjan, R. E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs. SIAM J. Comput. 13 (1984), 566-579. DOI 10.1137/0213035 | MR 0749707
[42] Wermuth, N.: Analogies between multiplicative models in contingency tables and covariance selection. Biometrics 32 (1976), 95-108. DOI 10.2307/2529341 | MR 0403088 | Zbl 0357.62049
[43] Wermuth, N.: Model search among multiplicative models. Biometrics 32 (1976), 253-256. DOI 10.2307/2529496 | Zbl 0339.62079
[44] Xiang, Y., Wong, S. K. M., Cercone, N.: A `microscopic'' study of minimum entropy search in learning decomposable Markov networks. Mach. Learning 26 (1997), 65-72. DOI 10.1023/A:1007324100110 | Zbl 0866.68088

Partner of