Say I would like to know the mutual information of the processes generating two streams of observations, under weak assumptions on the form of the generating process. This is a normal sort of empirical probability-metric estimation problem.

Information is harder than the typical case, because observations with low frequency have a high influence on the estimate. It is easy to get a uselessly biased, or even inconsistent, estimator, especially in the nonparametric case.

A typical technique is to construct a joint histogram from your samples, treat the bins as a finite alphabet, and then do the usual discrete calculation. That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you'd like to exploit.
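For concreteness, the histogram plug-in calculation might be sketched as follows; `bins` is a tuning knob I've picked arbitrarily, and the resulting estimate inherits exactly the upward bias complained about above.

```python
import numpy as np

def plugin_mi(x, y, bins=10):
    """Plug-in MI estimate (in nats) from a joint histogram of two 1-d samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # empirical joint pmf over bins
    px = pxy.sum(axis=1, keepdims=True)    # marginal pmf of x
    py = pxy.sum(axis=0, keepdims=True)    # marginal pmf of y
    nz = pxy > 0                           # skip log(0) terms
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(42)
x = rng.normal(size=5000)
print(plugin_mi(x, x + rng.normal(size=5000)))   # dependent pair: clearly positive
print(plugin_mi(x, rng.normal(size=5000)))       # independent pair: near zero, but biased up
```

Even for independent inputs the estimate sits a little above zero; the bias grows with the number of bins and shrinks with sample size.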

You could also estimate the densities and plug them in. However, this method is highly sensitive and can be arbitrarily wrong if you don't do it right (see Paninski, 2003).

So, better alternatives?

One obvious one is asking yourself: do I really want to know the information? Or do I merely wish to know that something is uninformative, i.e. to estimate some degree of independence? Independence is a related problem, but it admits much more general strategies.
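As an illustration of why independence testing is easier: a permutation test gives calibrated p-values for the independence question without ever estimating the information itself. This is a minimal sketch; I use absolute Pearson correlation as the dependence statistic, which only detects linear dependence, so you would swap in a richer statistic (e.g. distance correlation) for general alternatives.

```python
import numpy as np

def perm_independence_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for H0: x and y are independent."""
    rng = np.random.default_rng(seed)
    stat = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
    observed = stat(x, y)
    # Shuffling y destroys any dependence while preserving both marginals,
    # so the shuffled statistics sample the null distribution.
    exceed = sum(stat(x, rng.permutation(y)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
print(perm_independence_test(x, x + rng.normal(size=500)))  # small p: reject independence
print(perm_independence_test(x, rng.normal(size=500)))      # large p: no evidence
```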

To consider:

- *ad hominem*, KKPW14 is a good place to start.
- Kraskov's (2004) NN method looks nice, but doesn't have any guarantees that I know of.
- the relationship between mutual information and copula entropy.
- those occasional mentions of calculating mutual information from recurrence plots - how do they work?
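Kraskov's NN method from the list above is short enough to sketch. This follows "algorithm 1" of KrSG04 under my reading of the paper, assuming scipy is available; `k` is the usual neighbour-count tuning knob, and the tiny radius shrinkage implements the strict inequality in the neighbour counts.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG ("algorithm 1") estimate of I(X;Y) in nats for 1-d samples x, y."""
    n = len(x)
    xy = np.column_stack([x, y])
    # Distance to the k-th nearest neighbour in the joint space, max-norm.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # Count points strictly within eps in each marginal space (minus self).
    nx = cKDTree(x[:, None]).query_ball_point(
        x[:, None], eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y[:, None]).query_ball_point(
        y[:, None], eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = 0.8 * x + 0.6 * rng.normal(size=2000)  # corr 0.8; true MI = -0.5*ln(1-0.64) ≈ 0.51 nats
print(ksg_mi(x, y))
```

Unlike the histogram approach, no binning is needed, and the bias is typically much smaller for smooth densities, but, per the caveat above, I know of no general guarantees.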

## Refs

- Schü15: (2015) A Note on Entropy Estimation. *Neural Computation*, 27(10), 2097–2106.
- Shib97: (1997) Bootstrap estimate of Kullback-Leibler information for model selection. *Statistica Sinica*, 7, 375–394.
- GaVG15: (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. In *Journal of Machine Learning Research* (pp. 277–286).
- NeSB01: (2001) Entropy and inference, revisited. arXiv:physics/0108025.
- HaSt09: (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. *Journal of Machine Learning Research*, 10, 1469.
- WoWo94: (1994) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
- WoWo94: (1994) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
- KrSG04: (2004) Estimating mutual information. *Physical Review E*, 69, 066138.
- Roul99: (1999) Estimating the errors on measured entropy and mutual information. *Physica D: Nonlinear Phenomena*, 125(3–4), 285–294.
- Pani03: (2003) Estimation of entropy and mutual information. *Neural Computation*, 15(6), 1191–1253.
- Gras88: (1988) Finite sample corrections to entropy and dimension estimates. *Physics Letters A*, 128(6–7), 369–373.
- KKPW14: (2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [stat].
- TaTB07: (2007) Information and fitness. arXiv:0712.4382.
- Akai73: (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In *Proceedings of the Second International Symposium on Information Theory* (pp. 199–213). Budapest: Akademiai Kiado.
- MoHe14: (2014) Multivariate f-Divergence Estimation With Confidence. In NIPS 2014.
- ZhGr14: (2014) Nonparametric Estimation of Küllback-Leibler Divergence. *Neural Computation*, 26(11), 2570–2593.