Previous |  Up |  Next

Article

Title: A backward selection procedure for approximating a discrete probability distribution by decomposable models (English)
Author: Malvestuto, Francesco M.
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 48
Issue: 5
Year: 2012
Pages: 825-844
Summary lang: English
.
Category: math
.
Summary: Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim at finding a `good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a study case taken from literature, our procedure succeeds in finding an optimal solution for every $k$. (English)
Keyword: backward selection
Keyword: information divergence
Keyword: decomposable model
Keyword: acyclic hypergraph
Keyword: $k$-hypertree
MSC: 05C65
MSC: 62-09
MSC: 68R10
MSC: 68T05
idMR: MR3086854
.
Date available: 2012-12-17T13:25:32Z
Last updated: 2013-09-24
Stable URL: http://hdl.handle.net/10338.dmlcz/143083
.
Reference: [1] Almond, R., Kong, A.: Optimality Issues in Constructing a Markov Tree from Graphical Models..Research Report A-3, Dept. Statistics, Harvard University, 1991.
Reference: [2] Altmüller, A., Haralick, R. M.: Approximating high dimensional probability disributions..In: Proc. XVII Int. Conf. on Patter Recognitions, 2004.
Reference: [3] Altmüller, A., Haralick, R. M.: Practical aspects of efficient forward selection in decomposable graphical models..In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.
Reference: [4] Bach, F. R., Jordan, M. I.: Thin junction trees..Adv. in Neural Inform. Proces. Systems 14 (2002), 569-572.
Reference: [5] Badsberg, J.-H., Malvestuto, F. M.: An implementation of the iterative proportional fitting procedure by propagation trees..Comput. Statist. Data Anal. 37 (2001), 297-322. Zbl 1061.65500, MR 1856676, 10.1016/S0167-9473(01)00013-5
Reference: [6] Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes..J. ACM 30 (1983), 479-513. Zbl 0624.68087, MR 0709830, 10.1145/2402.322389
Reference: [7] Beineke, L. W., Pippert, R. E.: The enumeration of labelled 2-trees..J. Combin. Theory 6 (1969), 200-205. MR 0234868
Reference: [8] Bishop, Y., Fienberg, S. E., Holland, P. W.: Discrete Multivariate Analysis: Theory and Practice..MIT Press, Cambridge 1975. Zbl 1131.62043, MR 0381130
Reference: [9] Brown, D. T.: A note on approximations to discrete probability distributions..Inform. and Control 2 (1959), 386-392. Zbl 0117.14804, MR 0110598, 10.1016/S0019-9958(59)80016-4
Reference: [10] Chickering, D.: Learning Bayesian networks is NP-complete..In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. MR 1473013
Reference: [11] Chow, C. K., Liu, C. N.: Approximating discrete probability distributions with dependence trees..IEEE Trans. Inform. Theory 14 (1968), 462-467. Zbl 0165.22305, 10.1109/TIT.1968.1054142
Reference: [12] Cover, T. M.: Elements of Information Theory..John Wiley and Sons, 1991. Zbl 1140.94001, MR 1122806
Reference: [13] Csiszár, I., Körner, J.: Information Theory..Academic Press, 1981. MR 0666545
Reference: [14] Dagum, P., Luby, M.: Approximating probabilistic inference in belief networks is NP-hard..Artificial Intelligence 60 (1993), 141-153. MR 1216898, 10.1016/0004-3702(93)90036-B
Reference: [15] Dasgupta, S.: Learning polytrees..In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.
Reference: [16] Deshpande, A., Garofalakis, M., Jordan, M. I.: Efficient stepwise selection in decomposable models..In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.
Reference: [17] Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D.: Comparison of greedy strategies for learning Markov networks of treewidth $k$..In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.
Reference: [18] Havránek, T.: On model search methods..In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.
Reference: [19] Havránek, T.: Simple formal systems in data analysis..In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.
Reference: [20] Jensen, F. V., Jensen, F.: Optimal junction trees..In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.
Reference: [21] Karger, D., Srebro, N.: Learning Markov networks: maximum bounded tree-width graphs..In: Proc. XII ACM-SIAM Symp. on Discrete Mathematics, 2001, pp. 392-401. Zbl 0987.68067, MR 1958431
Reference: [22] Kloks, T.: Tree-width..LNCS 842, Springer Verlag, Berlin 1994. Zbl 0925.05052
Reference: [23] Kocka, T.: New algorithm for learning decomposable models..Unpublished manuscript, 2000.
Reference: [24] Kovács, E., Szántai, T.: Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network..arXiv:1105.1697v1, 2011.
Reference: [25] Ku, H. H., Kullback, S.: Approximating discrete probability distributions..IEEE Trans. Inform. Theory 15 (1969), 444-447. Zbl 0174.23202, MR 0243669, 10.1109/TIT.1969.1054336
Reference: [26] Lauritzen, S. L.: Graphical Models..Clarendon Press, Oxford 1996. MR 1419991
Reference: [27] II, P. M. Lewis: Approximating probability distributions to reduce storage requirements..Inform. and Control 2 (1959), 214-225. MR 0110597, 10.1016/S0019-9958(59)90207-4
Reference: [28] Malvestuto, F. M.: Operations research in the design of statistical databases (in Italian)..In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.
Reference: [29] Malvestuto, F. M.: Approximating discrete probability distributions with decomposable models..IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. 10.1109/21.120082
Reference: [30] Malvestuto, F. M.: An axiomatization of loglinear models with an application to the model search..In: Learning from Data, LNS 112 (1996), pp. 175-184.
Reference: [31] Malvestuto, F. M.: Designing a probabilistic database from a given set of full conditional independences..In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.
Reference: [32] Malvestuto, F. M.: A hypergraph-theoretic analysis of collapsibility and decomposability for extended log-linear models..Statist. Comput. 11 (2001), 155-169. MR 1837135, 10.1023/A:1008979300007
Reference: [33] Malvestuto, F. M.: From conditional independences to factorization constraints with discrete random variables..Ann. Math. and Artificial Intelligence 35 (2002), 253-285. Zbl 1001.68033, MR 1899954, 10.1023/A:1014551721406
Reference: [34] Malvestuto, F. M.: Tree and local computations in a cross-entropy minimization problem with marginal constraints..Kybernetika 46 (2010), 621-654. Zbl 1204.93113, MR 2722092
Reference: [35] Meek, C.: Finding a path is harder than finding a tree..J. Artificial Intelligence Res. 15 (2001), 383-389. Zbl 0994.68120, MR 1884083
Reference: [36] Mezzini, M., Moscarini, M.: Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network..Theoret. Comput. Sci. 411 (2010), 958-966. Zbl 1213.05251, MR 2606034, 10.1016/j.tcs.2009.10.004
Reference: [37] Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B.: Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$..In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.
Reference: [38] Rose, J. D.: On simple characterizations of $k$-trees..Discrete Math. 7 (1974), 317-322. Zbl 0285.05128, MR 0335319, 10.1016/0012-365X(74)90042-9
Reference: [39] Szántai, T., Kovács, E.: Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution..In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. MR 2874757
Reference: [40] Szántai, T., Kovács, E.: Discovering a junction tree behind a Markov network by a greedy algorithm..arXiv:1104.2762v3, 2011.
Reference: [41] Tarjan, R. E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs..SIAM J. Comput. 13 (1984), 566-579. MR 0749707, 10.1137/0213035
Reference: [42] Wermuth, N.: Analogies between multiplicative models in contingency tables and covariance selection..Biometrics 32 (1976), 95-108. Zbl 0357.62049, MR 0403088, 10.2307/2529341
Reference: [43] Wermuth, N.: Model search among multiplicative models..Biometrics 32 (1976), 253-256. Zbl 0339.62079, 10.2307/2529496
Reference: [44] Xiang, Y., Wong, S. K. M., Cercone, N.: A ``microscopic'' study of minimum entropy search in learning decomposable Markov networks..Mach. Learning 26 (1997), 65-72. Zbl 0866.68088, 10.1023/A:1007324100110
.

Files

Files Size Format View
Kybernetika_48-2012-5_1.pdf 675.4Kb application/pdf View/Open
Back to standard record
Partner of
EuDML logo