
Probability metrics, estimation thereof from data

Independence tests, and quantifications of statistical nearness: estimating entropies, mutual informations, and divergences from data.
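As a concrete example of the simplest member of this family, here is a minimal sketch (in Python, using only numpy) of a permutation test built on the sample distance covariance of SzRB07, restricted to scalar samples. The function names, the permutation count, and the toy data are illustrative choices of mine, not anything prescribed by the paper:

    import numpy as np

    def dcov(x, y):
        """Sample distance covariance (SzRB07) for scalar samples x, y."""
        # pairwise absolute-difference (distance) matrices
        a = np.abs(x[:, None] - x[None, :])
        b = np.abs(y[:, None] - y[None, :])
        # double-centre each distance matrix
        A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
        B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
        # the squared statistic is the mean elementwise product;
        # clamp tiny negative values from floating-point error
        return np.sqrt(np.maximum((A * B).mean(), 0.0))

    def dcov_test(x, y, n_perm=999, seed=0):
        """Permutation p-value for H0: x and y are independent."""
        rng = np.random.default_rng(seed)
        stat = dcov(x, y)
        exceed = sum(dcov(x, rng.permutation(y)) >= stat
                     for _ in range(n_perm))
        return stat, (exceed + 1) / (n_perm + 1)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        x = rng.normal(size=500)
        y = x ** 2 + 0.1 * rng.normal(size=500)  # dependent but uncorrelated with x
        print(dcov_test(x, y))  # expect a small p-value

This brute-force version costs O(n² · n_perm), which is exactly why the kernelised and large-scale variants (GFTS08, SSGF12, ZFGS16) exist.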

Refs

BaSS04
Baba, K., Shibata, R., & Sibuya, M. (2004) Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics, 46(4), 657–664. DOI.
BDGM97
Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C. (1997) Nonparametric entropy estimation: An overview. International Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
Bera77
Beran, R. (1977) Minimum Hellinger Distance Estimates for Parametric Models. The Annals of Statistics, 5(3), 445–463. DOI.
DaVa99
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
Camp06
de Campos, L. M. (2006) A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests. Journal of Machine Learning Research, 7, 2149–2187.
DoJR13
Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
EmLM03
Embrechts, P., Lindskog, F., & McNeil, A. J. (2003) Modelling dependence with copulas and applications to risk management. In Handbook of Heavy Tailed Distributions in Finance (pp. 329–384).
GaVG15
Gao, S., Ver Steeg, G., & Galstyan, A. (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS) (pp. 277–286).
GFTS08
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2008) A Kernel Statistical Test of Independence. In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press
Hall87
Hall, P. (1987) On Kullback-Leibler Loss and Density Estimation. The Annals of Statistics, 15(4), 1491–1519. DOI.
HaTu12
Harremoës, P., & Tusnády, G. (2012) Information Divergence is more chi squared distributed than the chi squared statistics. arXiv:1202.1125 [Cs, Math, Stat].
HaIb90
Hasminskii, R., & Ibragimov, I. (1990) On Density Estimation in the View of Kolmogorov’s Ideas in Approximation Theory. The Annals of Statistics, 18(3), 999–1010. DOI.
HaSt09
Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469–1484.
JVHW15
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
JVHW14
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
KKPW14
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M. (2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
KoLe87
Kozachenko, L., & Leonenko, N. (1987) Sample estimate of the entropy of a random vector. Problems of Information Transmission, 23(2), 95–101.
KrSG04
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 066138. DOI.
LiVa06
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
LiPZ08
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008) A framework for the local information dynamics of distributed computation in complex systems.
MaSh94
Marton, K., & Shields, P. C. (1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
MoHe14
Moon, K. R., & Hero III, A. O. (2014) Multivariate f-Divergence Estimation With Confidence. In Advances in Neural Information Processing Systems 27.
MFSS16
Muandet, K., Fukumizu, K., Sriperumbudur, B., & Schölkopf, B. (2016) Kernel Mean Embedding of Distributions: A Review and Beyond. arXiv:1605.09522 [Cs, Stat].
NeSB01
Nemenman, I., Shafee, F., & Bialek, W. (2001) Entropy and inference, revisited. arXiv:physics/0108025.
NMSH17
Noshad, M., Moon, K. R., Sekeh, S. Y., & Hero III, A. O. (2017) Direct Estimation of Information Divergence Using Nearest Neighbor Ratios. arXiv:1702.05222 [Cs, Math, Stat].
Pani03
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
PSMP07
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S. (2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
PaTr96
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Robi91
Robinson, P. M. (1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
Roul99
Roulston, M. S. (1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
Saga05
Sagara, N. (2005) Nonparametric maximum-likelihood estimation of probability measures: existence and consistency. Journal of Statistical Planning and Inference, 133(2), 249–271. DOI.
Schü15
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
SSGF12
Sejdinovic, D., Sriperumbudur, B., Gretton, A., & Fukumizu, K. (2012) Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5), 2263–2291. DOI.
SGSS07
Smola, A., Gretton, A., Song, L., & Schölkopf, B. (2007) A Hilbert Space Embedding for Distributions. In M. Hutter, R. A. Servedio, & E. Takimoto (Eds.), Algorithmic Learning Theory (pp. 13–31). Springer Berlin Heidelberg
SHSF09
Song, L., Huang, J., Smola, A., & Fukumizu, K. (2009) Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 961–968). New York, NY, USA: ACM DOI.
SpMe95
Spirtes, P., & Meek, C. (1995) Learning Bayesian networks with discrete variables from data. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining.
SFGS12
Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Schölkopf, B., & Lanckriet, G. R. G. (2012) On the empirical estimation of integral probability metrics. Electronic Journal of Statistics, 6, 1550–1599. DOI.
StZV17
Strobl, E. V., Zhang, K., & Visweswaran, S. (2017) Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. arXiv:1702.03877 [Stat].
SuWh07
Su, L., & White, H. (2007) A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141(2), 807–834. DOI.
SzRi09
Székely, G. J., & Rizzo, M. L. (2009) Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. DOI.
SzRB07
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007) Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. DOI.
ThSS16
Thanei, G.-A., Meinshausen, N., & Shah, R. D. (2016) The xyz algorithm for fast interaction search in high-dimensional data. arXiv preprint.
VaVa13
Valiant, P., & Valiant, G. (2013) Estimating the Unseen: Improved Estimators for Entropy and other Properties. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 2157–2165). Curran Associates, Inc.
Vict02
Victor, J. D. (2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 051903. DOI.
WoWo94a
Wolf, D. R., & Wolpert, D. H. (1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
WoWo94b
Wolpert, D. H., & Wolf, D. R. (1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
WuYa14
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].
YaZS16
Yao, S., Zhang, X., & Shao, X. (2016) Testing mutual independence in high dimension via distance covariance. arXiv:1609.09380 [Stat].
ZPJS12
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].
ZFGS16
Zhang, Q., Filippi, S., Gretton, A., & Sejdinovic, D. (2016) Large-Scale Kernel Methods for Independence Testing. arXiv:1606.07892 [Stat].