Kullback–Leibler divergence; relative entropy; exponential family; information projection; log-Laplace transform; cumulant generating function; directional derivatives; first order optimality conditions; convex functions; polytopes
The information divergence of a probability measure $P$ from an exponential family $\mathcal{E}$ over a finite set is defined as the infimum of the divergences of $P$ from $Q$ over all $Q\in \mathcal{E}$. All directional derivatives of the divergence from $\mathcal{E}$ are found explicitly. To this end, the behaviour of the conjugate of a log-Laplace transform on the boundary of its domain is analysed. The first order conditions for $P$ to be a maximizer of the divergence from $\mathcal{E}$ are presented, including new ones for the case when $P$ is not projectable to $\mathcal{E}$.
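
For orientation, the objects named above can be written out as follows. This is a sketch in assumed notation, not necessarily the notation of the paper: the finite set $Z$, the reference measure $\nu$ on $Z$, the statistic $f\colon Z\to\mathbb{R}^d$, and the symbols $\Lambda$, $\Lambda^*$ and $\mu_P$ are introduced here for illustration only.

```latex
% Assumed notation: finite set Z, reference measure \nu on Z with full
% support, statistic f : Z -> R^d.
\[
  \Lambda(\vartheta) = \log \sum_{z\in Z} e^{\langle\vartheta, f(z)\rangle}\,\nu(z),
  \qquad
  Q_\vartheta(z) = e^{\langle\vartheta, f(z)\rangle - \Lambda(\vartheta)}\,\nu(z),
  \qquad
  \mathcal{E} = \{\, Q_\vartheta : \vartheta\in\mathbb{R}^d \,\}.
\]
% Divergence of P from Q (convention 0 log 0 = 0) and from the family.
\[
  D(P\,\|\,Q) = \sum_{z\in Z} P(z)\,\log\frac{P(z)}{Q(z)},
  \qquad
  D(P\,\|\,\mathcal{E}) = \inf_{Q\in\mathcal{E}} D(P\,\|\,Q).
\]
% Expanding D(P || Q_\vartheta) and taking the infimum over \vartheta,
% with \mu_P = \sum_z P(z) f(z) the mean of f under P:
\[
  D(P\,\|\,Q_\vartheta)
  = D(P\,\|\,\nu) - \langle\vartheta, \mu_P\rangle + \Lambda(\vartheta),
  \qquad
  D(P\,\|\,\mathcal{E}) = D(P\,\|\,\nu) - \Lambda^*(\mu_P),
\]
\[
  \Lambda^*(a) = \sup_{\vartheta\in\mathbb{R}^d}
  \bigl[\langle\vartheta, a\rangle - \Lambda(\vartheta)\bigr].
\]
```

Under these assumptions the divergence from $\mathcal{E}$ differs from $D(P\,\|\,\nu)$ only by the convex conjugate $\Lambda^*$ of the log-Laplace transform evaluated at the mean $\mu_P$, which suggests why the boundary behaviour of the conjugate enters the computation of the directional derivatives.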
