
Estimation of fiddly information functionals of densities

using data to calculate information

Say I would like to know the mutual information of the process generating two streams of observations, with weak assumptions on the form of the generation process. This is a typical sort of probability functional estimation problem.

Information is harder than usual, because observations with low frequency have a high influence on the estimate. It is easy to get a uselessly biased, or even inconsistent, estimator, especially in the nonparametric case.
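
To see the problem concretely, here is a toy simulation (a sketch only; the alphabet size, sample size, and number of replications are arbitrary choices) showing that the naive plug-in entropy estimate of a uniform distribution sits systematically below the true value when most symbols are observed only a few times:

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) entropy estimate in nats."""
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log(0) terms contribute nothing
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
K, n, trials = 1000, 2000, 200        # alphabet size, sample size, replications
estimates = []
for _ in range(trials):
    sample = rng.integers(K, size=n)  # uniform draws over K symbols
    counts = np.bincount(sample, minlength=K)
    estimates.append(plugin_entropy(counts))

print("true entropy:", np.log(K))                    # log(1000) ≈ 6.91 nats
print("mean plug-in estimate:", np.mean(estimates))  # noticeably smaller
```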

A typical technique is to construct a joint histogram from your samples, treat the bins as a finite alphabet, and then do the usual discrete calculation. That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you’d like to exploit.
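
For concreteness, a minimal Python sketch of that histogram plug-in recipe (the bin count and the toy data are arbitrary choices, not recommendations):

```python
import numpy as np

def histogram_mutual_information(x, y, bins=16):
    """Plug-in MI estimate (in nats) from a joint histogram of two samples."""
    joint_counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint_counts / joint_counts.sum()       # joint bin probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)          # marginal over y
    p_y = p_xy.sum(axis=0, keepdims=True)          # marginal over x
    nonzero = p_xy > 0                             # skip empty bins (0 log 0 = 0)
    return np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero]))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + rng.normal(scale=0.5, size=5000)           # correlated toy data
print(histogram_mutual_information(x, y))
```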

You could also estimate the densities themselves and plug them into the information functional. However, this method is highly sensitive and can be arbitrarily wrong if you don’t do it right (see Paninski, 2003).
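
For comparison, here is a hedged sketch of that density-estimation route, loosely in the spirit of the kernel-density approach of MoRL95 below: fit Gaussian KDEs to the joint and marginal samples, then average the log density ratio over the data as a Monte Carlo approximation of the MI integral. The default bandwidth rule and the sample size are arbitrary here, and as just noted, the result can be badly biased if those choices are wrong.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_mutual_information(x, y):
    """Plug-in MI estimate (nats) via Gaussian KDEs of the joint and marginals."""
    joint = gaussian_kde(np.vstack([x, y]))   # density estimate of (X, Y)
    px = gaussian_kde(x)                      # marginal density of X
    py = gaussian_kde(y)                      # marginal density of Y
    log_ratio = (np.log(joint(np.vstack([x, y])))
                 - np.log(px(x)) - np.log(py(y)))
    return log_ratio.mean()                   # empirical mean of log p(x,y)/(p(x)p(y))

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)      # correlated toy data
print(kde_mutual_information(x, y))
```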

So, better alternatives?

To consider:

To read

BaBo12
Barnett, L., & Bossomaier, T. (2012) Transfer Entropy as a Log-likelihood Ratio. arXiv:1205.6339.
BDGM97
Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C. (1997) Nonparametric entropy estimation: An overview. Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
ChSh03
Chao, A., & Shen, T.-J. (2003) Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4), 429–443. DOI.
DaVa99
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
DaWu00
Darbellay, G. A., & Wuertz, D. (2000) The entropy as a tool for analysing statistical dependences in financial time series. Physica A: Statistical Mechanics and Its Applications, 287(3–4), 429–439. DOI.
DSSK04
Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004) Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5(1), 118. DOI.
DoJR13
Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
GaVG00
Gao, S., Ver Steeg, G., & Galstyan, A. (n.d.) Estimating Mutual Information by Local Gaussian Approximation.
HaSt09
Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469.
JVHW14
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
JVHW15
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
KKPW14
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M. (2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
KSAC05
Kennel, M. B., Shlens, J., Abarbanel, H. D. I., & Chichilnisky, E. J. (2005) Estimating Entropy Rates with Bayesian Confidence Intervals. Neural Computation, 17(7). DOI.
KrSG04
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 66138. DOI.
LiVa06
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
LiPZ08
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2008) A framework for the local information dynamics of distributed computation in complex systems.
MaSh94
Marton, K., & Shields, P. C. (1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
MoRL95
Moon, Y. I., Rajagopalan, B., & Lall, U. (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52, 2318–2321. DOI.
NeBR04
Nemenman, I., Bialek, W., & de Ruyter Van Steveninck, R. (2004) Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5), 56111.
NeSB02
Nemenman, I., Shafee, F., & Bialek, W. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14 (Vol. 14). Cambridge, MA, USA: The MIT Press.
Pani03
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
PSMP07
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S. (2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
PaTr96
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Robi91
Robinson, P. M. (1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
Roul99
Roulston, M. S. (1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
Schü15
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
StLe08
Staniek, M., & Lehnertz, K. (2008) Symbolic transfer entropy. Physical Review Letters, 100(15), 158101. DOI.
VePa08
Vejmelka, M., & Paluš, M. (2008) Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E, 77(2), 26214. DOI.
Vict02
Victor, J. D. (2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 51903. DOI.
WoWo94a
Wolf, D. R., & Wolpert, D. H. (1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
WoWo94b
Wolpert, D. H., & Wolf, D. R. (1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
WuYa14
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].