
Article

Title: On the Jensen-Shannon divergence and the variation distance for categorical probability distributions
Author: Corander, Jukka
Author: Remes, Ulpu
Author: Koski, Timo
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 57
Issue: 6
Year: 2021
Pages: 879-907
Summary lang: English
Category: math
Summary: We establish a decomposition of the Jensen-Shannon divergence into a linear combination of a scaled Jeffreys' divergence and a reversed Jensen-Shannon divergence. Upper and lower bounds for the Jensen-Shannon divergence are then found in terms of the squared (total) variation distance. The derivations rely upon the Pinsker inequality and the reverse Pinsker inequality. We use these bounds to prove the asymptotic equivalence of the maximum likelihood estimate and minimum Jensen-Shannon divergence estimate as well as the asymptotic consistency of the minimum Jensen-Shannon divergence estimate. These are key properties for likelihood-free simulator-based inference.
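For orientation, the quantities named in the summary can be written out in the standard conventions (natural logarithm); the precise constants in the paper's decomposition and bounds are given in the article itself:

\[
\mathrm{JSD}(P,Q) = \tfrac{1}{2} D(P\,\|\,M) + \tfrac{1}{2} D(Q\,\|\,M), \qquad M = \tfrac{1}{2}(P+Q), \qquad \delta(P,Q) = \tfrac{1}{2}\sum_i |p_i - q_i|.
\]

Since $M$ is the midpoint of $P$ and $Q$, we have $\delta(P,M) = \delta(Q,M) = \tfrac{1}{2}\delta(P,Q)$, so Pinsker's inequality $D(P\,\|\,Q) \ge 2\,\delta(P,Q)^2$ applied to each half yields the elementary lower bound

\[
\mathrm{JSD}(P,Q) \ge \tfrac{1}{2}\,\delta(P,Q)^2.
\]

A minimal numerical check of this lower bound for categorical distributions (a sketch using NumPy; the function names are illustrative, not taken from the paper):

import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) in nats for categorical p, q.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    # Jensen-Shannon divergence: 0.5*D(p||m) + 0.5*D(q||m), with m = (p+q)/2.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tv(p, q):
    # Total variation distance: delta(p, q) = 0.5 * sum_i |p_i - q_i|.
    return 0.5 * float(np.abs(p - q).sum())

rng = np.random.default_rng(0)
for _ in range(5):
    p = rng.dirichlet(np.ones(4))
    q = rng.dirichlet(np.ones(4))
    d = tv(p, q)
    assert jsd(p, q) >= 0.5 * d**2  # Pinsker applied to both halves of the JSD
    print(f"JSD = {jsd(p, q):.4f} >= delta^2/2 = {0.5 * d**2:.4f}")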
Keyword: blended divergences
Keyword: Chan-Darwiche metric
Keyword: likelihood-free inference
Keyword: implicit maximum likelihood
Keyword: reverse Pinsker inequality
Keyword: simulator-based inference
MSC: 62B10
MSC: 62H05
MSC: 94A17
idZBL: Zbl 07478645
idMR: MR4376866
DOI: 10.14736/kyb-2021-6-0879
Date available: 2022-02-04T08:37:33Z
Last updated: 2022-02-24
Stable URL: http://hdl.handle.net/10338.dmlcz/149345
Reference: [1] Barnett, N. S., Dragomir, S. S.: A survey of recent inequalities for $\phi$-divergences of discrete probability distributions. In: Advances in Inequalities from Probability Theory and Statistics (N. S. Barnett and S. S. Dragomir, eds.), Nova Science Publishing, New York 2008, pp. 1-85. MR 2459969
Reference: [2] Basseville, M.: Divergence measures for statistical data processing -- An annotated bibliography. Signal Processing 93 (2013), 621-633.
Reference: [3] Berend, D., Kontorovich, A.: A sharp estimate of the binomial mean absolute deviation with applications. Stat. Probab. Lett. 83 (2013), 1254-1259. MR 3041401, 10.1016/j.spl.2013.01.023
Reference: [4] BOLFI Tutorial Manual. https://elfi.readthedocs.io/en/latest/usage/BOLFI.html, 2017.
Reference: [5] Böhm, U., Dahm, P. F., McAllister, B. F., Greenbaum, I. F.: Identifying chromosomal fragile sites from individuals: a multinomial statistical model. Human Genetics 95 (1995), 249-256. 10.1007/BF00225189
Reference: [6] Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. Int. J. Approx. Reasoning 38 (2005), 149-174. MR 2116782
Reference: [7] Chan, H., Darwiche, A.: On the revision of probabilistic beliefs using uncertain evidence. Artif. Intell. 163 (2005), 67-90. MR 2120039, 10.1016/j.artint.2004.09.005
Reference: [8] Charalambous, C. D., Tzortzis, I., Loyka, S., Charalambous, T.: Extremum problems with total variation distance and their applications. IEEE Trans. Automat. Control 59 (2014), 2353-2368. MR 3254531
Reference: [9] Corander, J., Fraser, C., Gutmann, M. U., Arnold, B., Hanage, W. P., Bentley, S. D., Lipsitch, M., Croucher, N. J.: Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nature Ecology & Evolution 1 (2017), 1950-1960.
Reference: [10] Cover, Th. M., Thomas, J. A.: Elements of Information Theory. Second edition. John Wiley and Sons, New York 2012. MR 2239987
Reference: [11] Cranmer, K., Brehmer, J., Louppe, G.: The frontier of simulation-based inference. Proc. Natl. Acad. Sci. USA 117 (2020), 30055-30062. MR 4263287
Reference: [12] Csiszár, I., Talata, Z.: Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Trans. Inform. Theory 52 (2006), 1007-1016. MR 2238067
Reference: [13] Csiszár, I., Shields, P. C.: Information Theory and Statistics: A Tutorial. Now Publishers Inc, Delft 2004.
Reference: [14] Devroye, L.: The equivalence of weak, strong and complete convergence in $L_1$ for kernel density estimates. Ann. Statist. 11 (1983), 896-904. MR 0707939, 10.1214/aos/1176346255
Reference: [15] Diggle, P. J., Gratton, R. J.: Monte Carlo methods of inference for implicit statistical models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 46 (1984), 193-212. MR 0781880
Reference: [16] Endres, D. M., Schindelin, J. E.: A new metric for probability distributions. IEEE Trans. Inform. Theory 49 (2003), 1858-1860. MR 1985590
Reference: [17] Fedotov, A. A., Harremoës, P., Topsøe, F.: Refinements of Pinsker's inequality. IEEE Trans. Inform. Theory 49 (2003), 1491-1498. MR 1984937
Reference: [18] Gibbs, A. L., Su, F. E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70 (2002), 419-435.
Reference: [19] Guntuboyina, A.: Lower bounds for the minimax risk using $f$-divergences, and applications. IEEE Trans. Inform. Theory 57 (2011), 2386-2399. MR 2809097
Reference: [20] Gutmann, M. U., Corander, J.: Bayesian optimization for likelihood-free inference of simulator-based statistical models. J. Mach. Learn. Res. 17 (2016), 4256-4302. MR 3555016
Reference: [21] Gyllenberg, M., Koski, T., Reilink, E., Verlaan, M.: Non-uniqueness in probabilistic numerical identification of bacteria. J. Appl. Prob. 31 (1994), 542-548. MR 1274807
Reference: [22] Gyllenberg, M., Koski, T.: Numerical taxonomy and the principle of maximum entropy. J. Classification 13 (1996), 213-229. MR 1421666
Reference: [23] Holopainen, I.: Evaluating Uncertainty with Jensen-Shannon Divergence. Master's Thesis, Faculty of Science, University of Helsinki 2021.
Reference: [24] Hou, C.-D., Chiang, J., Tai, J. J.: Identifying chromosomal fragile sites from a hierarchical-clustering point of view. Biometrics 57 (2001), 435-440. MR 1855677
Reference: [25] Janžura, M., Boček, P.: A method for knowledge integration. Kybernetika 34 (1998), 41-55. MR 1619054
Reference: [26] Jardine, N., Sibson, R.: Mathematical Taxonomy. J. Wiley and Sons, London 1971. MR 0441395
Reference: [27] Khosravifard, M., Fooladivanda, D., Gulliver, T. A.: Exceptionality of the variational distance. In: 2006 IEEE Information Theory Workshop (ITW'06), Chengdu 2006, pp. 274-276.
Reference: [28] Koski, T.: Probability Calculus for Data Science. Studentlitteratur, Lund 2020.
Reference: [29] Kůs, V.: Blended $\phi$-divergences with examples. Kybernetika 39 (2003), 43-54. MR 1980123
Reference: [30] Kůs, V., Morales, D., Vajda, I.: Extensions of the parametric families of divergences used in statistical inference. Kybernetika 44 (2008), 95-112. MR 2405058
Reference: [31] Le Cam, L.: On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41 (1970), 802-828. MR 0267676
Reference: [32] Liese, F., Vajda, I.: On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory 52 (2006), 4394-4412. MR 2300826
Reference: [33] Li, K., Malik, J.: Implicit maximum likelihood estimation. arXiv preprint arXiv:1809.09087, 2018.
Reference: [34] Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory 37 (1991), 145-151. MR 1087893
Reference: [35] Lintusaari, J., Gutmann, M. U., Dutta, R., Kaski, S., Corander, J.: Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology 66 (2017), e66-e82.
Reference: [36] Lintusaari, J., Vuollekoski, H., Kangasrääsiö, A., Skytén, K., Järvenpää, M., Marttinen, P., Gutmann, M. U., Vehtari, A., Corander, J., Kaski, S.: ELFI: Engine for likelihood-free inference. J. Mach. Learn. Res. 19 (2018), 1-7. MR 3862423
Reference: [37] Morales, D., Pardo, L., Vajda, I.: Asymptotic divergence of estimates of discrete distributions. J. Statist. Plann. Inference 48 (1995), 347-369. MR 1368984
Reference: [38] Nowozin, S., Cseke, B., Tomioka, R.: f-GAN: Training generative neural samplers using variational divergence minimization. Advances in Neural Information Processing Systems (2016), 271-279.
Reference: [39] Okamoto, M.: Some inequalities relating to the partial sum of binomial probabilities. Ann. Inst. Statist. Math. 10 (1959), 29-35. MR 0099733
Reference: [40] Sason, I.: On f-divergences: Integral representations, local behavior, and inequalities. Entropy 20 (2018), 383-405. MR 3862573
Reference: [41] Sason, I., Verdú, S.: $f$-divergence inequalities. IEEE Trans. Inform. Theory 62 (2016), 5973-6006. MR 3565096
Reference: [42] Shannon, M.: Properties of f-divergences and f-GAN training. arXiv preprint arXiv:2009.00757, 2020.
Reference: [43] Sibson, R.: Information radius. Z. Wahrsch. Verw. Geb. 14 (1969), 149-160. MR 0258198
Reference: [44] Sinn, M., Rawat, A.: Non-parametric estimation of Jensen-Shannon divergence in generative adversarial network training. In: International Conference on Artificial Intelligence and Statistics 2018, pp. 642-651.
Reference: [45] Taneja, I. J.: On mean divergence measures. In: Advances in Inequalities from Probability Theory and Statistics (N. S. Barnett and S. S. Dragomir, eds.), Nova Science Publishing, New York 2008, pp. 169-186. MR 2459974
Reference: [46] Topsøe, F.: Information-theoretical optimization techniques. Kybernetika 15 (1979), 8-27. MR 0529888
Reference: [47] Topsøe, F.: Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inform. Theory 46 (2000), 1602-1609. MR 1768575
Reference: [48] Vajda, I.: Note on discrimination information and variation (Corresp.). IEEE Trans. Inform. Theory 16 (1970), 771-773. MR 0275575
Reference: [49] Vajda, I.: Theory of Statistical Inference and Information. Kluwer Academic Publ., Dordrecht 1989.
Reference: [50] Vajda, I.: On metric divergences of probability measures. Kybernetika 45 (2009), 885-900. MR 2650071
Reference: [51] Yellott, J. I., Jr.: The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. J. Math. Psych. 15 (1977), 109-144. MR 0449795
Reference: [52] Österreicher, F., Vajda, I.: Statistical information and discrimination. IEEE Trans. Inform. Theory 39 (1993), 1036-1039. MR 1237725

Files

Kybernetika_57-2021-6_1.pdf (743.5 KB, application/pdf)