The Living Thing / Notebooks

Probability metrics, estimation thereof from data

Independence tests, and quantifications of statistical nearness: estimating entropies, divergences and mutual informations from data.
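As a concrete taste of the entropy estimators surveyed below, here is a minimal sketch of the Kozachenko–Leonenko k-nearest-neighbour estimator of differential entropy (Kozachenko & Leonenko 1987; cf. Kraskov et al. 2004). Brute-force distances, nats, and the function names and default k=3 are my own choices, not from any of the cited papers.

```python
import numpy as np
from math import pi, lgamma, log

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, n))

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Brute-force pairwise Euclidean distances; fine for a few thousand points.
    dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)          # exclude self-distances
    r_k = np.sort(dists, axis=1)[:, k - 1]   # distance to the k-th neighbour
    # log volume of the d-dimensional unit ball
    log_vd = (d / 2) * log(pi) - lgamma(d / 2 + 1)
    return digamma_int(n) - digamma_int(k) + log_vd + d * np.mean(np.log(r_k))
```

For a standard normal the true entropy is ½ log(2πe) ≈ 1.419 nats, and a couple of thousand samples typically lands within a few hundredths of that; the bias corrections and minimax refinements in the references below exist because naive plug-in estimators do much worse, especially in high dimension.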
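On the independence-testing side, the sample distance covariance of Székely, Rizzo & Bakirov (2007) is also short enough to sketch. This is the biased V-statistic form; the name `dcov2` is mine.

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form).

    The population quantity is zero iff x and y are independent
    (Székely, Rizzo & Bakirov 2007).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centred(z):
        # Pairwise Euclidean distances, double-centred.
        d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centred(x), centred(y)
    return (a * b).mean()
```

In practice one calibrates the statistic with a permutation test: shuffle y, recompute, and compare against the permutation distribution. The Sejdinovic et al. (2013) paper below shows this statistic is a special case of the kernel (HSIC) tests of Gretton et al.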


Baba, K., Shibata, R., & Sibuya, M. (2004) Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics, 46(4), 657–664. DOI.
Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C. (1997) Nonparametric entropy estimation: An overview. International Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
Beran, R. (1977) Minimum Hellinger Distance Estimates for Parametric Models. The Annals of Statistics, 5(3), 445–463. DOI.
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
de Campos, L. M. (2006) A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests. Journal of Machine Learning Research, 7, 2149–2187.
Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
Embrechts, P., Lindskog, F., & McNeil, A. J. (2003) Modelling dependence with copulas and applications to risk management. In Handbook of Heavy Tailed Distributions in Finance (pp. 329–384).
Gao, S., Ver Steeg, G., & Galstyan, A. (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS) (pp. 277–286).
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2008) A Kernel Statistical Test of Independence. In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press
Hall, P. (1987) On Kullback-Leibler Loss and Density Estimation. The Annals of Statistics, 15(4), 1491–1519. DOI.
Harremoës, P., & Tusnády, G. (2012) Information Divergence is more chi squared distributed than the chi squared statistics. arXiv:1202.1125 [Cs, Math, Stat].
Hasminskii, R., & Ibragimov, I. (1990) On Density Estimation in the View of Kolmogorov’s Ideas in Approximation Theory. The Annals of Statistics, 18(3), 999–1010. DOI.
Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469–1484.
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M. (2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
Kozachenko, L., & Leonenko, N. (1987) On statistical estimation of the entropy of a random vector. Problems of Information Transmission, 23(2), 95–101.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 066138. DOI.
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008) A framework for the local information dynamics of distributed computation in complex systems.
Marton, K., & Shields, P. C. (1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
Moon, K. R., & Hero III, A. O. (2014) Multivariate f-Divergence Estimation With Confidence. In Advances in Neural Information Processing Systems 27.
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Schölkopf, B. (2016) Kernel Mean Embedding of Distributions: A Review and Beyond. arXiv:1605.09522 [Cs, Stat].
Nemenman, I., Shafee, F., & Bialek, W. (2001) Entropy and inference, revisited. arXiv:physics/0108025.
Noshad, M., Moon, K. R., Sekeh, S. Y., & Hero III, A. O. (2017) Direct Estimation of Information Divergence Using Nearest Neighbor Ratios. arXiv:1702.05222 [Cs, Math, Stat].
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S. (2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Robinson, P. M. (1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
Roulston, M. S. (1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
Sagara, N. (2005) Nonparametric maximum-likelihood estimation of probability measures: existence and consistency. Journal of Statistical Planning and Inference, 133(2), 249–271. DOI.
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
Sejdinovic, D., Sriperumbudur, B., Gretton, A., & Fukumizu, K. (2013) Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5), 2263–2291. DOI.
Smola, A., Gretton, A., Song, L., & Schölkopf, B. (2007) A Hilbert Space Embedding for Distributions. In M. Hutter, R. A. Servedio, & E. Takimoto (Eds.), Algorithmic Learning Theory (pp. 13–31). Springer Berlin Heidelberg
Song, L., Huang, J., Smola, A., & Fukumizu, K. (2009) Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 961–968). New York, NY, USA: ACM DOI.
Spirtes, P., & Meek, C. (1995) Learning Bayesian networks with discrete variables from data. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining.
Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Schölkopf, B., & Lanckriet, G. R. G. (2012) On the empirical estimation of integral probability metrics. Electronic Journal of Statistics, 6, 1550–1599. DOI.
Strobl, E. V., Zhang, K., & Visweswaran, S. (2017) Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. arXiv:1702.03877 [Stat].
Su, L., & White, H. (2007) A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141(2), 807–834. DOI.
Székely, G. J., & Rizzo, M. L. (2009) Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. DOI.
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007) Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. DOI.
Thanei, G.-A., Meinshausen, N., & Shah, R. D. (2016) The xyz algorithm for fast interaction search in high-dimensional data. arXiv preprint.
Valiant, P., & Valiant, G. (2013) Estimating the Unseen: Improved Estimators for Entropy and other Properties. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 2157–2165). Curran Associates, Inc.
Victor, J. D. (2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 051903. DOI.
Wolf, D. R., & Wolpert, D. H. (1994a) Estimating Functions of Distributions from a Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
Wolpert, D. H., & Wolf, D. R. (1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].
Yao, S., Zhang, X., & Shao, X. (2016) Testing mutual independence in high dimension via distance covariance. arXiv:1609.09380 [Stat].
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].
Zhang, Q., Filippi, S., Gretton, A., & Sejdinovic, D. (2016) Large-Scale Kernel Methods for Independence Testing. arXiv:1606.07892 [Stat].