Title:
|
A backward selection procedure for approximating a discrete probability distribution by decomposable models (English) |
Author:
|
Malvestuto, Francesco M. |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 |
Volume:
|
48 |
Issue:
|
5 |
Year:
|
2012 |
Pages:
|
825-844 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and they are known to enjoy a number of nice properties. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a `good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$. (English) |
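For the $k = 1$ case mentioned in the summary, the following is a minimal Python sketch of the Chow-Liu construction (the optimal dependence tree, i.e. the decomposable model of tree-width 1 minimizing $D(p: p_{\mathcal{H}})$, obtained as a maximum-weight spanning tree under pairwise mutual information). It is illustrative only and is not the paper's backward-selection procedure; the encoding of $p$ as a dictionary over value tuples and all function names are hypothetical.

# Chow-Liu step sketch: assumes p maps tuples of variable values to probabilities.
from itertools import combinations
from math import log
from collections import defaultdict

def marginal(p, axes):
    """Marginal of p over the variable indices in `axes`."""
    m = defaultdict(float)
    for x, px in p.items():
        m[tuple(x[a] for a in axes)] += px
    return m

def mutual_information(p, i, j):
    """Mutual information I(X_i; X_j) in nats, computed from the joint p."""
    pij = marginal(p, (i, j))
    pi, pj = marginal(p, (i,)), marginal(p, (j,))
    return sum(q * log(q / (pi[(a,)] * pj[(b,)]))
               for (a, b), q in pij.items() if q > 0)

def chow_liu_tree(p, n):
    """Maximum-weight spanning tree over n variables (Kruskal with union-find)."""
    edges = sorted(((mutual_information(p, i, j), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

# Toy usage: three binary variables, X0 and X1 strongly dependent.
p = {(0, 0, 0): 0.30, (0, 0, 1): 0.10, (1, 1, 0): 0.35,
     (1, 1, 1): 0.15, (0, 1, 0): 0.05, (1, 0, 1): 0.05}
print(chow_liu_tree(p, 3))   # edges of the optimal k = 1 decomposable model

The forward-selection procedures cited in the summary grow such a tree into a hypergraph of larger tree-width, whereas the paper's procedure proceeds in the opposite direction, starting from the trivial optimal solution for $k = n-1$.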
Keyword:
|
backward selection |
Keyword:
|
information divergence |
Keyword:
|
decomposable model |
Keyword:
|
acyclic hypergraph |
Keyword:
|
$k$-hypertree |
MSC:
|
05C65 |
MSC:
|
62-09 |
MSC:
|
68R10 |
MSC:
|
68T05 |
idMR:
|
MR3086854 |
. |
Date available:
|
2012-12-17T13:25:32Z |
Last updated:
|
2013-09-24 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/143083 |
. |
Reference:
|
[1] Almond, R., Kong, A.: Optimality Issues in Constructing a Markov Tree from Graphical Models..Research Report A-3, Dept. Statistics, Harvard University, 1991. |
Reference:
|
[2] Altmüller, A., Haralick, R. M.: Approximating high dimensional probability distributions..In: Proc. XVII Int. Conf. on Pattern Recognition, 2004. |
Reference:
|
[3] Altmüller, A., Haralick, R. M.: Practical aspects of efficient forward selection in decomposable graphical models..In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715. |
Reference:
|
[4] Bach, F. R., Jordan, M. I.: Thin junction trees..Adv. in Neural Inform. Proces. Systems 14 (2002), 569-572. |
Reference:
|
[5] Badsberg, J.-H., Malvestuto, F. M.: An implementation of the iterative proportional fitting procedure by propagation trees..Comput. Statist. Data Anal. 37 (2001), 297-322. Zbl 1061.65500, MR 1856676, 10.1016/S0167-9473(01)00013-5 |
Reference:
|
[6] Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes..J. ACM 30 (1983), 479-513. Zbl 0624.68087, MR 0709830, 10.1145/2402.322389 |
Reference:
|
[7] Beineke, L. W., Pippert, R. E.: The enumeration of labelled 2-trees..J. Combin. Theory 6 (1969), 200-205. MR 0234868 |
Reference:
|
[8] Bishop, Y., Fienberg, S. E., Holland, P. W.: Discrete Multivariate Analysis: Theory and Practice..MIT Press, Cambridge 1975. Zbl 1131.62043, MR 0381130 |
Reference:
|
[9] Brown, D. T.: A note on approximations to discrete probability distributions..Inform. and Control 2 (1959), 386-392. Zbl 0117.14804, MR 0110598, 10.1016/S0019-9958(59)80016-4 |
Reference:
|
[10] Chickering, D.: Learning Bayesian networks is NP-complete..In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. MR 1473013 |
Reference:
|
[11] Chow, C. K., Liu, C. N.: Approximating discrete probability distributions with dependence trees..IEEE Trans. Inform. Theory 14 (1968), 462-467. Zbl 0165.22305, 10.1109/TIT.1968.1054142 |
Reference:
|
[12] Cover, T. M., Thomas, J. A.: Elements of Information Theory..John Wiley and Sons, 1991. Zbl 1140.94001, MR 1122806 |
Reference:
|
[13] Csiszár, I., Körner, J.: Information Theory..Academic Press, 1981. MR 0666545 |
Reference:
|
[14] Dagum, P., Luby, M.: Approximating probabilistic inference in belief networks is NP-hard..Artificial Intelligence 60 (1993), 141-153. MR 1216898, 10.1016/0004-3702(93)90036-B |
Reference:
|
[15] Dasgupta, S.: Learning polytrees..In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141. |
Reference:
|
[16] Deshpande, A., Garofalakis, M., Jordan, M. I.: Efficient stepwise selection in decomposable models..In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135. |
Reference:
|
[17] Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D.: Comparison of greedy strategies for learning Markov networks of treewidth $k$..In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301. |
Reference:
|
[18] Havránek, T.: On model search methods..In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108. |
Reference:
|
[19] Havránek, T.: Simple formal systems in data analysis..In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381. |
Reference:
|
[20] Jensen, F. V., Jensen, F.: Optimal junction trees..In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366. |
Reference:
|
[21] Karger, D., Srebro, N.: Learning Markov networks: maximum bounded tree-width graphs..In: Proc. XII ACM-SIAM Symp. on Discrete Algorithms, 2001, pp. 392-401. Zbl 0987.68067, MR 1958431 |
Reference:
|
[22] Kloks, T.: Tree-width..LNCS 842, Springer Verlag, Berlin 1994. Zbl 0925.05052 |
Reference:
|
[23] Kocka, T.: New algorithm for learning decomposable models..Unpublished manuscript, 2000. |
Reference:
|
[24] Kovács, E., Szántai, T.: Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network..arXiv:1105.1697v1, 2011. |
Reference:
|
[25] Ku, H. H., Kullback, S.: Approximating discrete probability distributions..IEEE Trans. Inform. Theory 15 (1969), 444-447. Zbl 0174.23202, MR 0243669, 10.1109/TIT.1969.1054336 |
Reference:
|
[26] Lauritzen, S. L.: Graphical Models..Clarendon Press, Oxford 1996. MR 1419991 |
Reference:
|
[27] Lewis, P. M., II: Approximating probability distributions to reduce storage requirements..Inform. and Control 2 (1959), 214-225. MR 0110597, 10.1016/S0019-9958(59)90207-4 |
Reference:
|
[28] Malvestuto, F. M.: Operations research in the design of statistical databases (in Italian)..In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130. |
Reference:
|
[29] Malvestuto, F. M.: Approximating discrete probability distributions with decomposable models..IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. 10.1109/21.120082 |
Reference:
|
[30] Malvestuto, F. M.: An axiomatization of loglinear models with an application to the model search..In: Learning from Data, LNS 112 (1996), pp. 175-184. |
Reference:
|
[31] Malvestuto, F. M.: Designing a probabilistic database from a given set of full conditional independences..In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999. |
Reference:
|
[32] Malvestuto, F. M.: A hypergraph-theoretic analysis of collapsibility and decomposability for extended log-linear models..Statist. Comput. 11 (2001), 155-169. MR 1837135, 10.1023/A:1008979300007 |
Reference:
|
[33] Malvestuto, F. M.: From conditional independences to factorization constraints with discrete random variables..Ann. Math. and Artificial Intelligence 35 (2002), 253-285. Zbl 1001.68033, MR 1899954, 10.1023/A:1014551721406 |
Reference:
|
[34] Malvestuto, F. M.: Tree and local computations in a cross-entropy minimization problem with marginal constraints..Kybernetika 46 (2010), 621-654. Zbl 1204.93113, MR 2722092 |
Reference:
|
[35] Meek, C.: Finding a path is harder than finding a tree..J. Artificial Intelligence Res. 15 (2001), 383-389. Zbl 0994.68120, MR 1884083 |
Reference:
|
[36] Mezzini, M., Moscarini, M.: Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network..Theoret. Comput. Sci. 411 (2010), 958-966. Zbl 1213.05251, MR 2606034, 10.1016/j.tcs.2009.10.004 |
Reference:
|
[37] Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B.: Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$..In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113. |
Reference:
|
[38] Rose, D. J.: On simple characterizations of $k$-trees..Discrete Math. 7 (1974), 317-322. Zbl 0285.05128, MR 0335319, 10.1016/0012-365X(74)90042-9 |
Reference:
|
[39] Szántai, T., Kovács, E.: Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution..In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. MR 2874757 |
Reference:
|
[40] Szántai, T., Kovács, E.: Discovering a junction tree behind a Markov network by a greedy algorithm..arXiv:1104.2762v3, 2011. |
Reference:
|
[41] Tarjan, R. E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs..SIAM J. Comput. 13 (1984), 566-579. MR 0749707, 10.1137/0213035 |
Reference:
|
[42] Wermuth, N.: Analogies between multiplicative models in contingency tables and covariance selection..Biometrics 32 (1976), 95-108. Zbl 0357.62049, MR 0403088, 10.2307/2529341 |
Reference:
|
[43] Wermuth, N.: Model search among multiplicative models..Biometrics 32 (1976), 253-256. Zbl 0339.62079, 10.2307/2529496 |
Reference:
|
[44] Xiang, Y., Wong, S. K. M., Cercone, N.: A ``microscopic'' study of minimum entropy search in learning decomposable Markov networks..Mach. Learning 26 (1997), 65-72. Zbl 0866.68088, 10.1023/A:1007324100110 |
. |