# Article

Keywords:
Kullback–Leibler divergence; relative entropy; exponential family; information projection; log-Laplace transform; cumulant generating function; directional derivatives; first order optimality conditions; convex functions; polytopes
Summary:
The information divergence of a probability measure $P$ from an exponential family $\mathcal{E}$ over a finite set is defined as the infimum of the divergences of $P$ from $Q$, subject to $Q\in\mathcal{E}$. All directional derivatives of the divergence from $\mathcal{E}$ are found explicitly. To this end, the behaviour of the conjugate of a log-Laplace transform on the boundary of its domain is analysed. First order conditions for $P$ to be a maximizer of the divergence from $\mathcal{E}$ are presented, including new ones for the case where $P$ is not projectable to $\mathcal{E}$.
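For orientation, the quantity studied above can be written out in standard notation, with the sum taken over the finite ground set:

$$D(P\|\mathcal{E}) \;=\; \inf_{Q\in\mathcal{E}} D(P\|Q), \qquad D(P\|Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}.$$

Below is a minimal numerical sketch, not taken from the paper, of this infimum for a one-parameter exponential family on a four-point set; the statistic `f`, the measure `P`, and the use of `scipy` are illustrative assumptions. At an interior minimizer $Q_{\theta^*}$ the first order condition reduces to moment matching, $\mathbb{E}_{Q_{\theta^*}}[f]=\mathbb{E}_P[f]$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Finite ground set {0,1,2,3}; one-parameter exponential family
# Q_theta(x) proportional to exp(theta * f(x)), uniform base measure.
f = np.array([0.0, 1.0, 2.0, 3.0])   # statistic generating the family (assumption)
P = np.array([0.1, 0.2, 0.3, 0.4])   # measure whose divergence from the family we take

def Q(theta):
    """Member of the exponential family; the log of the normalizer
    is the log-Laplace transform mentioned in the summary."""
    w = np.exp(theta * f)
    return w / w.sum()

def kl(p, q):
    """Information divergence D(p||q) on a finite set (all entries positive here)."""
    return float(np.sum(p * np.log(p / q)))

# D(P||E) = inf over theta of D(P||Q_theta); this objective equals
# log Z(theta) - theta * E_P[f] + const, hence is convex in theta,
# so a one-dimensional minimization suffices.
res = minimize_scalar(lambda t: kl(P, Q(t)))
theta_star = res.x
print("D(P||E) approx:", res.fun)

# First order condition at an interior minimizer: the projection
# matches the P-mean of the statistic f.
print("moment match:", np.isclose(Q(theta_star) @ f, P @ f))
```

For multi-parameter families the same scheme applies with a vector-valued statistic and a multivariate minimizer; the boundary behaviour analysed in the paper matters precisely when no interior minimizer exists.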
