Previous |  Up |  Next

Article

Title: Tree and local computations in a cross–entropy minimization problem with marginal constraints (English)
Author: Malvestuto, Francesco M.
Language: English
Journal: Kybernetika
ISSN: 0023-5954
Volume: 46
Issue: 4
Year: 2010
Pages: 621-654
Summary lang: English
.
Category: math
.
Summary: In probability theory, Bayesian statistics, artificial intelligence and database theory the minimum cross-entropy principle is often used to estimate a distribution with a given set $P$ of marginal distributions under the proportionality assumption with respect to a given ``prior'' distribution $q$. Such an estimation problem admits a solution if and only if there exists an extension of $P$ that is dominated by $q$. In this paper we consider the case that $q$ is not given explicitly, but is specified as the maximum-entropy extension of an auxiliary set $Q$ of distributions. There are three problems that naturally arise: (1) the existence of an extension of a distribution set (such as $P$ and $Q$), (2) the existence of an extension of $P$ that is dominated by the maximum entropy extension of $Q$, (3) the numeric computation of the minimum cross-entropy extension of $P$ with respect to the maximum entropy extension of $Q$. In the spirit of a divide-and-conquer approach, we prove that, for each of the three above-mentioned problems, the global solution can be easily obtained by combining the solutions to subproblems defined at node level of a suitable tree. (English)
Keyword: cross-entropy
Keyword: acyclic hypergraph
Keyword: connection tree
Keyword: junction tree
Keyword: probabilistic database
Keyword: relational database
MSC: 62A10
MSC: 93E12
idZBL: Zbl 1204.93113
idMR: MR2722092
.
Date available: 2010-10-22T05:23:52Z
Last updated: 2013-09-21
Stable URL: http://hdl.handle.net/10338.dmlcz/140775
.
Reference: [1] Asmussen, S., Edwards, D.: Collapsibility and response variables in contingency tables.Biometrika 70 (1983), 367–378. Zbl 0549.62041, MR 0725370, 10.1093/biomet/70.3.567
Reference: [2] Bacharach, M.: Biproportional Matrices and Input-Output Change.Cambridge University Press, Cambridge 1970. Zbl 0195.49705, MR 0263409
Reference: [3] Badsberg, J.-H., Malvestuto, F. M.: An implementation of the iterative proportional fitting procedure by propagation trees.Comput. Statist. Data Analysis 37 (2001), 297–322. Zbl 1061.65500, MR 1856676, 10.1016/S0167-9473(01)00013-5
Reference: [4] Beeri, C., Vardi, M.: On the Properties of Full Join Dependencies.Adv. Database Theory I, Plenum Press, New York 1981.
Reference: [5] Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes. J. Assoc. Comput. Mach. 30 (1983), 479–513. Zbl 0624.68087, MR 0709830, 10.1145/2402.322389
Reference: [6] Berge, C.: Hypergraphs.North-Holland, Amsterdam 1989. Zbl 0674.05001, MR 1013569
Reference: [7] Berge, C.: Discrete Multivariate Analysis.MIT Press, Cambridge 1975.
Reference: [8] Csiszár, I.: I-divergence geometry of probability distributions and minimization problems.Ann. Probab. 3 (1975), 146–158. MR 0365798, 10.1214/aop/1176996454
Reference: [9] Csiszár, I.: Maxent, mathematics, and information theory.In: Proc. Internat. Workshop on “Maximum entropy and Bayesian methods", 1995, pp. 35–50. MR 1446714
Reference: [10] Dall’Aglio, G., Kotz, K., Salinetti, G.: Advances in Probability Distributions with Given Marginals.Kluwer Academic Pub., Dordrecht, Boston, London 1991. MR 1215942
Reference: [11] Darroch, J. N., Ratcliff, D.: Generalized iterative scaling for log-linear models.Ann. Math. Statist. 43 (1972), 1470–1480. Zbl 0251.62020, MR 0345337, 10.1214/aoms/1177692379
Reference: [12] Deming, W. E.: Statistical Adjustment of Data.Dover Pub., New York 1943. Zbl 0060.31504, MR 0009819
Reference: [13] Deming, W. E., Stephan, F. F.: On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.Ann. Math. Statist. 11 (1940), 427–444. Zbl 0024.05502, MR 0003527, 10.1214/aoms/1177731829
Reference: [14] Endo, Y., Takemura, A. I.: Iterative proportional scaling via decomposable submodels for contingency tables.Comput. Statist. Data Analysis 53 (2009), 966–978. MR 2657062, 10.1016/j.csda.2008.11.013
Reference: [15] Fienberg, S. E.: An iterative procedure for estimation in contingency tables.Ann. Math. Statist. 41 (1970), 907–917. Zbl 0198.23401, MR 0266394, 10.1214/aoms/1177696968
Reference: [16] Fienberg, S. E., Meyer, M. M.: Iterative proportional fitting.In: Encyclopedia of Statistical Sciences (S. Kotz, N. L. Johnson, and C. B. Read, eds.), 4, John Wiley and Sons, New York, pp.  275–279.
Reference: [17] Haberman, S. J.: Log-linear Models for Contingency Tables.University of Chicago Press, Chicago 1974.
Reference: [18] Ireland, C. T., Kullback, S.: Contingency tables with given marginals.Biometrika 55 (1968), 179–188. Zbl 0155.26701, MR 0229329, 10.1093/biomet/55.1.179
Reference: [19] Jiroušek, R.: Composition of probability measures on finite spaces.In: Proc. XIII Internat. Conf. Uncertainty in Artificial Intelligence 1997, pp. 274–281.
Reference: [20] Jiroušek, R., Přeučil, S.: On the effective implementation of the iterative proportional fitting procedure.Comput. Statist. Data Analysis 19 (1995), 177–189. 10.1016/0167-9473(93)E0055-9
Reference: [21] Johnson, R. W.: Axiomatic characterization of the directed divergences and their linear combinations.IEEE Trans. Inform. Theory 25 (1979), 709–716. Zbl 0422.60016, MR 0551270, 10.1109/TIT.1979.1056113
Reference: [22] Kellerer, H. G.: Verteilungsfunktionen mit gegebenen marginalverteilungen.Zeitschrift Wahrscheinlichkeitstheorie und Verw. Gebiete 3 (1964), 247–270. Zbl 0126.34003, MR 0175158, 10.1007/BF00534912
Reference: [23] Kellerer, H. G.: Masstheoretische marginalprobleme.Math. Annalen 153 (1964), 168–198. Zbl 0118.05003, MR 0161956, 10.1007/BF01360315
Reference: [24] Kern-Isberner, G.: Characterizing the principle of minimum-cross entropy within a conditional-logical framework.Artificial Intelligence 98 (1998), 169–208. Zbl 0903.68181, MR 1614388, 10.1016/S0004-3702(97)00068-4
Reference: [25] Ku, H. H., Kullback, S.: Interaction in multidimensional contingency tables: an information-theoretic approach.J. Res. Nat. Bur. Standards - Math. Sci. 72 B (1968), 159–199. Zbl 0274.62036, MR 0258223
Reference: [26] Lauritzen, S. L.: Graphical Models.Oxford Science Pub., Clarendom Press, Oxford 1996. MR 1419991
Reference: [27] Lauritzen, S. L., Speed, M. P., Vijayan, K.: Decomposable graphs and hypergraphs.J. Austral. Math. Soc. Ser. A 36 (1984), 12–29. Zbl 0533.05046, MR 0719998, 10.1017/S1446788700027300
Reference: [28] Lauritzen, S. L., Spiegelhalter, D. J.: Local computations with probabilities on graphical structures and their application to expert systems.J. Roy. Stat. Soc. Ser. B 50 (1988), 157–224. Zbl 0684.68106, MR 0964177
Reference: [29] Leimer, G.: Optimal decomposition by clique separators.Discrete Math. 113 (1993), 99–123. Zbl 0793.05128, MR 1212872, 10.1016/0012-365X(93)90510-Z
Reference: [30] Leontief, W. W.: The Structure of American Economy 1919–1929.Oxford University Press, New York 1941.
Reference: [31] Leontief, W. W., Strout, A.: Multiregional input-output analysis.In: Structural Interdependence and Economic Development, 1963, pp. 119–169.
Reference: [32] Madigan, D., Mosurski, K.: An extension of the results of Asmussen and Edwards on collapsibility in contingency tables.Biometrika 77 (1990), 315–319. Zbl 0731.62113, MR 1064803, 10.1093/biomet/77.2.315
Reference: [33] Madigan, D., Mosurski, K.: Errata: An extension of the results of Asmussen and Edwards on collapsibility in contingency tables.Biometrika 86 (1999) 973. MR 1741994
Reference: [34] Maier, D.: The Theory of Relational Databases.Computer Science Press, 1983. (http://web.cecs.pdx.edu/ maier/TheoryBook/TRD.html) Zbl 0519.68082, MR 0691493
Reference: [35] Maier, D., Ullman, J. D.: Connections in acyclic hypergraphs.Theoret. Comput. Sci. 32 (1984), 185–199. Zbl 0557.05054, MR 0761167, 10.1016/0304-3975(84)90030-6
Reference: [36] Malvestuto, F. M.: Answering queries in categorical data bases.In: Proc. VI ACM Symp. Principles of Database Systems 1987, pp. 87–96.
Reference: [37] Malvestuto, F. M.: Existence of extensions and product extensions for discrete probability distributions.Discrete Math. 69 (1988), 61–77. Zbl 0637.60021, MR 0935028, 10.1016/0012-365X(88)90178-1
Reference: [38] Malvestuto, F. M.: Computing the maximum-entropy extension of given discrete probability distributions.Computat. Statist. Data Anal. 8 (1989), 299–311. Zbl 0726.62012, MR 1028405, 10.1016/0167-9473(89)90046-7
Reference: [39] Malvestuto, F. M.: Testing implication of hierarchical loglinear models for discrete probability distributions.Statist. Computing 6 (1996), 169–176. 10.1007/BF00162528
Reference: [40] Malvestuto, F. M.: A hypergraph-theoretic analysis of collapsibility and decomposability for extended loglinear models.Statist. Computing 11 (2001), 155–169. MR 1837135, 10.1023/A:1008979300007
Reference: [41] Malvestuto, F. M.: From conditional independences to factorization constraints with discrete random variables.Ann. Math. Artificial Intelligence 35 (2002), 253–285. Zbl 1001.68033, MR 1899954, 10.1023/A:1014551721406
Reference: [42] Malvestuto, F. M.: Canonical and monophonic convexities in hypergraphs. Discrete Math. 309 (2009), 4287–4298. Zbl 1211.05093, MR 2519164, 10.1016/j.disc.2009.01.003
Reference: [43] Malvestuto, F. M., Moscarini, M.: A fast algorithm for query optimization in universal-relation databases.J. Comput. System Sci. 56 (1998), 299–309. Zbl 0913.68060, MR 1633981, 10.1006/jcss.1998.1570
Reference: [44] Malvestuto, F. M., Moscarini, M.: Decomposition of a hypergraph by partial-edge separators.Theoret. Comput. Sci. 237 (2000), 57–79. Zbl 0939.68089, MR 1756201, 10.1016/S0304-3975(98)00128-5
Reference: [45] Malvestuto, F. M., Pourabbas, E.: Customized answers to summary queries via aggregate views.In: Proc. XVI Intl. Conf. Scientific & Statistical Database Management 2004, pp. 193–202.
Reference: [46] Malvestuto, F. M., Pourabbas, E.: Local computation of answers to table queries on summary databases.In: Proc. XVII Intl. Conf. Scientific & Statistical Database Management 2005, pp. 263–272.
Reference: [47] Matúš, F.: Discrete marginal problem for complex measures.Kybernetika 24 (1988), 36–46. MR 0936552
Reference: [48] Matúš, F.: On the maximum-entropy extensions of probability measures over undirected graphs.In: Proc. III Workshop Uncertainty Processing in Expert Systems 1994, pp. 181–198.
Reference: [49] Matúš, F., Flusser, J.: Image representations via a finite Radon transform.IEEE Trans. Pattern Analysis and Machine Intelligence 15 (1993), 996–1006. 10.1109/34.254058
Reference: [50] Purcell, N. J., Kish, L.: Estimation for small domains.Biometrics 35 (1979), 365–384. Zbl 0419.62092, MR 0535774, 10.2307/2530340
Reference: [51] Purcell, N. J., Kish, L.: Postcensal estimates for local areas (or domains).Internat. Statist. Rev. 48 (1980), 3–18. Zbl 0433.62080, 10.2307/1402400
Reference: [52] Rüschendorf, L.: Convergence of the iterative proportional fitting procedure.Ann. Statist. 23 (1995), 1160–1174. MR 1353500, 10.1214/aos/1176324703
Reference: [53] Shore, J. E., Johnson, R. W.: Properties of cross-entropy minimization.IEEE Trans. Inform. Theory 27 (1981), 472–482. Zbl 0459.94008, MR 0635526, 10.1109/TIT.1981.1056373
Reference: [54] Stephan, F. F.: An iterative method of adjusting sample frequencies tables when expected marginal totals are known.Ann. Math. Statist. 13 (1942), 166–178. MR 0006674, 10.1214/aoms/1177731604
Reference: [55] Stone, R., Brown, A.: A Computable Model for Economic Growth: A Programme for Growth, No. 1.Chapman Hall, London 1962.
Reference: [56] Tarjan, R. E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs.SIAM J. Comput. 13 (1984), 566–579. MR 0749707, 10.1137/0213035
Reference: [57] Vomlel, J.: Integrating inconsistent data in a probabilistic model.J. Appl. Non-Classical Logics 14 (2004), 365–386. Zbl 1185.68699, 10.3166/jancl.14.367-386
Reference: [58] Vorob’ev, N. N.: Consistent families of measures and their extensions.Theor. Prob. Appl. 7 (1962), 147–163. 10.1137/1107014
Reference: [59] Vorob’ev, N. N.: Markov measures and Markov extensions.Theor. Prob. Appl. 8 (1963), 420–429. MR 0169295, 10.1137/1108047
Reference: [60] Yannakakis, M.: Computing the minimum fill-in is NP-complete.SIAM J. Algebraic Discrete Mathematics 2 (1981), 77–79. Zbl 0496.68033, MR 0604513, 10.1137/0602010
Reference: [61] Yannakakis, M.: Algorithms for acyclic database schemes.In: Proc. VII Internat. Conf. Very Large Data Bases 1981, pp. 82–94.
.

Files

Files Size Format View
Kybernetika_46-2010-4_4.pdf 449.5Kb application/pdf View/Open
Back to standard record
Partner of
EuDML logo