# The Living Thing / Notebooks : Informations

Not: what you hope to get from the newspaper. Rather: Different types of (formally defined) entropy/information and their disambiguations.

The seductive power of the logarithm. Information criteria.

A proven path to publication is to find or reinvent a derived measure based on Shannon information, and apply it to something provocative-sounding. (Qualia! Stock markets! Evolution! Language! The qualia of evolving stock market languages!)

## Shannon Information

Vanilla information, thanks be to Claude Shannon. You are given a discrete random process of specified parameterisation. How much can you compress it down to a more parsimonious process? (Leaving coding theory aside for the moment.)

Given a random variable $X$ taking values $x \in \mathcal{X}$ from some discrete alphabet $\mathcal{X}$, with probability mass function $p(x)$, the entropy is

\begin{equation*} \begin{array}{ccc} H(X) & := & -\sum_{x \in \mathcal{X}} p(x) \log p(x) \\ & \equiv & E\left[ \log \frac{1}{p(X)} \right] \end{array} \end{equation*}
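To make the definition concrete, here is a minimal sketch that evaluates the sum directly for a known probability mass function. The function name `shannon_entropy` and the choice of base 2 (bits) are my own assumptions for illustration; nothing here estimates anything from data.

```python
import math


def shannon_entropy(pmf, base=2.0):
    """Entropy of a discrete distribution given as a sequence of probabilities."""
    if abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability symbols are skipped, by the convention 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in pmf if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])      # the maximum for two symbols
biased_coin = shannon_entropy([0.9, 0.1])    # more predictable, so lower
certain = shannon_entropy([1.0, 0.0])        # no uncertainty at all
```

A fair coin comes out at one bit, the biased coin at less than one bit, and the deterministic source at zero, which matches the "how far can you compress it" reading.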

## K-L divergence

Because “Kullback-Leibler divergence” is a lot of syllables for something you use so often, even if usually in sentences like “unlike the K-L divergence”. Or you could call it the “relative entropy”, but that sounds like something to do with my uncle after the seventh round of Christmas drinks.

It is defined between two probability distributions $P$ and $Q$ over the same discrete alphabet $\mathcal{X}$, with probability mass functions $p(x)$ and $q(x)$ respectively.

\begin{equation*} \begin{array}{ccc} D(P \parallel Q) & := & \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \\ & \equiv & E_P\left[ \log \frac{p(X)}{q(X)} \right] \end{array} \end{equation*}
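A small sketch of the K-L divergence for two explicit probability mass functions, assuming the standard definition $D(P \parallel Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$ in nats. The name `kl_divergence` and the example distributions are made up for illustration. The point of the example is the asymmetry: swapping the arguments changes the answer, which is why it is a divergence and not a distance.

```python
import math


def kl_divergence(p, q):
    """D(P || Q) in nats for two pmfs over the same finite alphabet."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same alphabet")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention: 0 log(0 / q) = 0
        if q_i == 0:
            return math.inf  # P puts mass where Q puts none
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
```

Here `d_pq` and `d_qp` are both positive but differ, and `d_pp` is zero.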

## Mutual information

The “informativeness” of one variable given another… Most simply, the K-L divergence of the joint distribution of two random variables from the product of their marginal distributions. (That is, it vanishes if the two variables are independent.)

Now, take $X$ and $Y$ with joint probability mass distribution $p_{XY}(x,y)$ and, for clarity, marginal distributions $p_X$ and $p_Y$.

Then the mutual information $I$ is given by

\begin{equation*} I(X; Y) = H(X) - H(X|Y) \end{equation*}

Estimating this one has been giving me grief lately, so I’ll be happy when I get to this section and solve it forever. See nonparametric mutual information.

Getting an intuition for what this measure does is handy, so I’ll expound some equivalent definitions that emphasise different characteristics:

\begin{equation*} \begin{array}{ccc} I(X; Y) & := & \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x, y) \log \frac{p_{XY}(x,y)}{p_X(x)p_Y(y)} \\ & = & D( p_{XY} \parallel p_X p_Y) \\ & = & E \log \frac{p_{XY}(X,Y)}{p_X(X)p_Y(Y)} \end{array} \end{equation*}

## Kolmogorov-Sinai entropy

Schreiber says:

If $I$ is obtained by coarse graining a continuous system $X$ at resolution $\epsilon$, the entropy $H_X(\epsilon)$ and entropy rate $h_X(\epsilon)$ will depend on the partitioning and in general diverge like $\log(\epsilon)$ when $\epsilon \to 0$. However, for the special case of a deterministic dynamical system, $\lim_{\epsilon\to 0} h_X(\epsilon) = h_{KS}$ may exist and is then called the Kolmogorov-Sinai entropy. (For non-Markov systems, also the limit $k \to \infty$ needs to be taken.)

That is, it is a special case of the entropy rate for a dynamical system. Cue connection to algorithmic complexity.

## Rényi Information

Also, the Hartley measure.

You don’t need to use a logarithm in your information summation. Free energy, something something. (?)

The observation is that many of the attractive features of information measures are simply due to the concavity of the logarithm term in the function. So, why not whack another concave function with even more handy features in there? Bam, you are now working on Rényi information. How do you feel?
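For reference, a sketch assuming the standard definition of the Rényi entropy of order $\alpha \ge 0$, $\alpha \ne 1$, namely $H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha$. The order $\alpha = 0$ gives the Hartley measure (log of the support size) and the limit $\alpha \to 1$ recovers Shannon entropy. The function name and the example distribution are my own choices.

```python
import math


def renyi_entropy(pmf, alpha):
    """Renyi entropy of order alpha (alpha >= 0), in nats."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    support = [p for p in pmf if p > 0]
    if alpha == 1:
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** alpha for p in support)) / (1.0 - alpha)


pmf = [0.5, 0.25, 0.125, 0.125]
hartley = renyi_entropy(pmf, 0)              # log(4): only the support size matters
shannon = renyi_entropy(pmf, 1)
collision = renyi_entropy(pmf, 2)            # order 2, the "collision" entropy
near_shannon = renyi_entropy(pmf, 1 + 1e-6)  # numerically close to the limit
```

For a fixed distribution the value is non-increasing in $\alpha$, so here `hartley >= shannon >= collision`.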

## Tsallis Statistics

Attempting to make information measures “non-extensive”. $q$-entropy. Seems to have made a big splash in Brazil, but less in other countries. There are good books about highly localised mathematical oddities in that nation, but not ones I’d cite in academic articles. Non-extensive measures are an intriguing idea, though. I wonder if it’s parochialism that keeps everyone off Tsallis statistics, or a lack of demonstrated use?
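To see what “non-extensive” means in practice, here is a sketch assuming the usual Tsallis definition $S_q = \frac{1 - \sum_x p(x)^q}{q - 1}$ (with Boltzmann's constant set to 1). For two independent systems the entropies do not simply add; instead $S_q(A,B) = S_q(A) + S_q(B) + (1-q) S_q(A) S_q(B)$. The names and the toy distributions are assumptions for illustration.

```python
import math


def tsallis_entropy(pmf, q):
    """Tsallis q-entropy; falls back to Shannon entropy (nats) at q == 1."""
    support = [p for p in pmf if p > 0]
    if q == 1:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)


a = [0.7, 0.3]
b = [0.6, 0.4]
# Joint distribution of two independent systems: the product of the marginals.
joint = [p_a * p_b for p_a in a for p_b in b]

q = 2.0
s_a = tsallis_entropy(a, q)
s_b = tsallis_entropy(b, q)
s_ab = tsallis_entropy(joint, q)
pseudo_additive = s_a + s_b + (1.0 - q) * s_a * s_b
```

With $q = 2$ the joint entropy `s_ab` matches `pseudo_additive` and falls visibly short of the plain sum `s_a + s_b`, whereas at $q = 1$ ordinary additivity comes back.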

## Fisher information

### Estimating information

Wait, you don’t know the informativeness of the sources a priori? You need to estimate it from data.
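The most naive thing to do is plug empirical frequencies into the entropy formula. A sketch of that, plus one classic first-order bias correction usually attributed to Miller and Madow (add $(m-1)/(2N)$ nats, where $m$ is the number of distinct symbols observed and $N$ the sample size). Both function names, the uniform 8-symbol source and the sample size are assumptions for illustration; this is not a recommendation over the fancier estimators in the references.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum likelihood) entropy estimate in nats."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the (m - 1) / (2N) first-order bias correction."""
    n = len(samples)
    observed_symbols = len(set(samples))
    return plugin_entropy(samples) + (observed_symbols - 1) / (2 * n)


rng = random.Random(0)         # fixed seed so the run is repeatable
true_entropy = math.log(8)     # uniform source over 8 symbols
samples = [rng.randrange(8) for _ in range(40)]

naive = plugin_entropy(samples)
corrected = miller_madow_entropy(samples)
```

The plug-in estimate can never exceed the log of the alphabet size and is biased downwards on small samples; the correction nudges it back up, though it does not guarantee a better answer on any single sample.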

### Refs

Adami, C. (2004) Information theory in molecular biology. Physics of Life Reviews, 1(1), 3–22. DOI.
Arno96
Arnold, D. V.(1996) Information-theoretic analysis of phase transitions. Complex Systems, 10(2), 143–156.
Ay01
Ay, N. (2001) Information geometry on complexity and stochastic interaction. Presented at the MPI MIS PREPRINT 95
ABDG08
Ay, N., Bertschinger, N., Der, R., Güttler, F., & Olbrich, E. (2008) Predictive information and explorative behavior of autonomous robots. The European Physical Journal B - Condensed Matter and Complex Systems, 63(3), 329–339. DOI.
AyCr04
Ay, N., & Crutchfield, J. P.(2004) Reductions of Hidden Information Sources.
AyPo08
Ay, N., & Polani, D. (2008) Information flows in causal networks. Advances in Complex Systems (ACS), 11(1), 17–41. DOI.
BaBo12
Barnett, L., & Bossomaier, T. (2012) Transfer Entropy as a Log-likelihood Ratio. arXiv:1205.6339.
BaBS10
Barrett, A. B., Barnett, L., & Seth, A. K.(2010) Multivariate Granger causality and generalized variance. Phys. Rev. E, 81(4), 41907. DOI.
BMMS12
Batty, M., Morphet, R., Masucci, P., & Stanilov, K. (2012) Entropy, Complexity and Spatial Information.
Bell03
Bell, A. J.(2003) The co-information lattice. Presented at the Proceedings of the Fifth International Workshop on Independent Component Analysis and Blind Signal Separation: ICA 2003
BeLa04
Bergstrom, C. T., & Lachmann, M. (2004) Shannon information and biological fitness. (pp. 50–54). Presented at the Information Theory Workshop, 2004. IEEE DOI.
BlPL11
Blanc, J.-L., Pezard, L., & Lesne, A. (2011) Delay independence of mutual-information rate of two symbolic sequences. Phys. Rev. E, 84(3), 36214. DOI.
BlKK04
Blinowska, K. J., Kuś, R., & Kamiński, M. (2004) Granger causality and information flow in multivariate processes. Physical Review E, 70(5), 50902. DOI.
Bran99
Brand, M. (1999) An entropic estimator for structure discovery. In Advances in Neural Information Processing Systems (pp. 723–729). MIT Press
CaCa98
Calera-Rubio, J., & Carrasco, R. C.(1998) Computing the relative entropy between regular tree languages. Information Processing Letters, 68(6), 283–289. DOI.
CaVi09
Calsaverini, R. S., & Vicente, R. (2009) An information-theoretic approach to statistical dependence: Copula information. EPL (Europhysics Letters), 88(6), 68003. DOI.
CeLZ11
Ceguerra, R. V., Lizier, J. T., & Zomaya, A. Y.(2011) Information storage and transfer in the synchronization process in locally-connected networks. Presented at the IEEE Symposium Series in Computational Intelligence (SSCI 2011) - IEEE Symposium on Artificial Life
CeAR05
Cellucci, C. J., Albano, A. M., & Rapp, P. E.(2005) Statistical validation of mutual information calculations: Comparison of alternative numerical algorithms. Physical Review E, 71(6), 66208. DOI.
Chai77
Chaitin, G. J.(1977) Algorithmic information theory. IBM Journal of Research and Development.
CZBS07
Chanda, P., Zhang, A., Brazeau, D., Sucheston, L., Freudenheim, J. L., Ambrosone, C., & Ramanathan, M. (2007) Information-Theoretic Metrics for Visualizing Gene-Environment Interactions. American Journal of Human Genetics, 81(5), 939.
ChTi00
Chechik, G., & Tishby, N. (n.d.) Extracting relevant structures with side information.
ChDP10
Chiribella, G., D’Ariano, G. M., & Perinotti, P. (2010) Informational derivation of Quantum Theory.
ChLi68
Chow, C. K., & Liu, C. N.(1968) Approximating discrete probability distributions with dependence trees. Information Theory, IEEE Transactions on, 14, 462–467. DOI.
Cien00
Ciencias del Espacio, I. (2000) Measuring mutual information in random Boolean networks. Complex Systems, 12, 241–252.
CoDR06
Coeurjolly, J.-F., Drouilhet, R., & Robineau, J.-F. (2006) Normalized information-based divergences. arXiv:math/0604246.
Cohe62
Cohen, J. E.(1962) Information theory and music. Behavioral Science, 7(2), 137–163. DOI.
CoGG89
Cover, T. M., Gács, P., & Gray, R. M.(1989) Kolmogorov’s Contributions to Information Theory and Algorithmic Complexity. The Annals of Probability, 17(3), 840–865. DOI.
CoTh06
Cover, T. M., & Thomas, J. A.(2006) Elements of Information Theory. Wiley-Interscience
CEJM00
Crutchfield, J. P., Ellison, C. J., James, R. G., & Mahoney, J. R.(n.d.) Synchronization and control in intrinsic and designed computation: An information-theoretic analysis of competing models of stochastic computation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 20(3), 37105. DOI.
CrFe01
Crutchfield, J. P., & Feldman, D. P.(2001) Synchronizing to the Environment: Information Theoretic Constraints on Agent Learning.
CsSh04
Csiszár, I., & Shields, P. C.(2004) Information theory and statistics: a tutorial. Foundations and Trends™ in Communications and Information Theory, 1(4), 417–528. DOI.
Dahl96
Dahlhaus, R. (1996) On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Processes and Their Applications, 62(1), 139–168. DOI.
DaVa99
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
DSSK04
Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004) Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5(1), 118. DOI.
Davi02
Davis, A. G.(2002) Small sample effects on information-theoretic estimates.
DeMo14
Dehmer, M., & Mowshowitz, A. (2014) A case study of cracks in the scientific enterprise: Reinvention of information-theoretic measures for graphs. Complexity, n/a-n/a. DOI.
Dewa03
Dewar, R. C.(2003) Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states. Journal of Physics A: Mathematical and General, 36, 631–641. DOI.
DKFR13
Dupuis, F., Kraemer, L., Faist, P., Renes, J. M., & Renner, R. (2013) Generalized Entropies. In XVIIth International Congress on Mathematical Physics (pp. 134–153). WORLD SCIENTIFIC DOI.
Eich00
Eichler, M. (n.d.) Granger-causality graphs for multivariate time series. Granger-Causality Graphs for Multivariate Time Series.
ElFr05
Elidan, G., & Friedman, N. (2005) Learning Hidden Variable Networks: The Information Bottleneck Approach. J. Mach. Learn. Res., 6, 81–127.
EnFö05
Endres, D., & Földiák, P. (2005) Bayesian Bin Distribution Inference and Mutual Information. IEEE Transactions on Information Theory, 51, 3766–3779. DOI.
EpMe02
Ephraim, Y., & Merhav, N. (2002) Hidden Markov processes. Information Theory, IEEE Transactions on, 48(6), 1518–1569. DOI.
ErAy04
Erb, I., & Ay, N. (2004) Multi-information in the thermodynamic limit. Journal of Statistical Physics, 115(3), 949–976. DOI.
Feld97
Feldman, D. P.(1997) A brief introduction to: Information theory, excess entropy and computational mechanics. Department of Physics, University of California, July.
FeCr98
Feldman, D. P., & Crutchfield, J. P.(1998) Discovering Noncritical Organization: Statistical Mechanical, Information Theoretic, and Computational Views of Patterns in One Dimensional Spin Systems.
FeCr03
Feldman, D. P., & Crutchfield, J. P.(2003) Structural information in two-dimensional patterns: Entropy convergence and excess entropy. Physical Review E, 67(5), 51104. DOI.
FeCr04
Feldman, D. P., & Crutchfield, J. P.(2004) Synchronizing to Periodicity: the Transient Information and Synchronization Time of Periodic Sequences. Advances in Complex Systems, 7(3), 329–355. DOI.
FeMC08
Feldman, D. P., McTague, C. S., & Crutchfield, J. P.(2008) The organization of intrinsic computation: Complexity-entropy diagrams and the diversity of natural information processing. Chaos: An Interdisciplinary Journal of Nonlinear Science, 18, 43106. DOI.
FMST01
Friedman, N., Mosenzon, O., Slonim, N., & Tishby, N. (2001) Multivariate information bottleneck. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence (pp. 152–161). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
GáTV01
Gács, P., Tromp, J., & Vitányi, P. M. B.(2001) Algorithmic statistics. IEEE Transactions on Information Theory, 47(6), 2443–2463. DOI.
GaVG15
Gao, S., Ver Steeg, G., & Galstyan, A. (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. (pp. 277–286). Presented at the Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
GaVG00
Gao, S., Ver Steeg, G., & Galstyan, A. (n.d.) Estimating Mutual Information by Local Gaussian Approximation.
Goya12
Goyal, P. (2012) Information Physics—Towards a New Conception of Physical Reality. Information, 3(4), 567–594. DOI.
Gran63
Granger, C. W. J.(1963) Economic processes involving feedback. Information and Control, 6(1), 28–48. DOI.
Gras88
Grassberger, P. (1988) Finite sample corrections to entropy and dimension estimates. Physics Letters A, 128(6–7), 369–373. DOI.
GrKi80
Gray, R., & Kieffer, J. (1980) Mutual information rate, distortion, and quantization in metric spaces. Information Theory, IEEE Transactions on, 26(4), 412–422. DOI.
Gray91
Gray, R. M.(1991) Entropy and Information Theory. New York: Springer-Verlag
Hart28
Hartley, R. V. L.(1928) Transmission of Information.
HaOp97
Haussler, D., & Opper, M. (1997) Mutual information, metric entropy and cumulative relative entropy risk. The Annals of Statistics, 25(6), 2451–2492. DOI.
HPVB07
Hlaváčková-Schindler, K., Paluš, M., Vejmelka, M., & Bhattacharya, J. (2007) Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441(1), 1–46. DOI.
HTKG08
Hnizdo, V., Tan, J., Killian, B. J., & Gilson, M. K.(2008) Efficient Calculation of Configurational Entropy from Molecular Simulations by Combining the Mutual-Information Expansion and Nearest-Neighbor Methods. Journal of Computational Chemistry, 29(10), 1605. DOI.
Hutt01
Hutter, M. (2001) Distribution of Mutual Information. arXiv:cs/0112019.
Jayn63
Jaynes, E. T.(1963) Information Theory and Statistical Mechanics. In Statistical Physics (Vol. 3).
JVHW14
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
JVHW15
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
KaSc02
Kaiser, A., & Schreiber, T. (2002) Information transfer in continuous processes. Physica D: Nonlinear Phenomena, 166(1–2), 43–62. DOI.
KKPW14
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M.(2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
Kell56
Kelly Jr, J. L.(1956) A new interpretation of information rate. Bell System Technical Journal, 35(3), 917–926.
Klir06
Klir, G. J.(2006) Uncertainty and information. Wiley Online Library
KoMa12
Kontoyiannis, I., & Madiman, M. (2012) Sumset inequalities for differential entropy and mutual information. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on (pp. 1261–1265). DOI.
KrSG04
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 66138. DOI.
Kuic70
Kuich, W. (1970) On the entropy of context-free languages. Information and Control, 16(2), 173–200. DOI.
Lane00
Laneman, J. N.(n.d.) On the Distribution of Mutual Information.
Leon08
Leonenko, N. (2008) A class of Rényi information estimators for multidimensional densities. The Annals of Statistics, 36(5), 2153–2182. DOI.
Lesk12
Leskovec, J. (2012) Information Diffusion and External Influence in Networks. Eprint arXiv:1206.1331.
LiVa06
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
Lin91
Lin, J. (1991) Divergence measures based on the Shannon entropy. Information Theory, IEEE Transactions on, 37(1), 145–151. DOI.
LiPP11
Lizier, J. T., Pritam, S., & Prokopenko, M. (2011) Information Dynamics in Small-World Boolean Networks. Artificial Life. DOI.
LiPr10
Lizier, J. T., & Prokopenko, M. (2010) Differentiating information transfer and causal effect. The European Physical Journal B - Condensed Matter and Complex Systems, 73(4), 605–615. DOI.
LiPZ08a
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008a) A framework for the local information dynamics of distributed computation in complex systems.
LiPZ08b
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008b) Local information transfer as a spatiotemporal filter for complex systems. Physical Review E, 77, 26110. DOI.
LiPZ08c
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008c) The Information Dynamics of Phase Transitions in Random Boolean Networks. In Artificial Life (Vol. 11, p. 374).
Mart15
Marton, K. (2015) Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance. arXiv:1507.02803 [Math].
Mats00
Matsuda, H. (2000) Physical nature of higher-order mutual information: Intrinsic correlations and frustration. Physical Review E, 62(3), 3096–3102. DOI.
MKNY96
Matsuda, H., Kudo, K., Nakamura, R., Yamakawa, O., & Murata, T. (1996) Mutual information of Ising systems. International Journal of Theoretical Physics, 35(4), 839–845.
Mayn00
Maynard Smith, J. (2000) The Concept of Information in Biology. Philosophy of Science, 67(2), 177–194.
Mcgi54
McGill, W. J.(1954) Multivariate information transmission. Information Theory, IRE Professional Group on, 4(4), 93–111. DOI.
MiLi02
Miller, D. J., & Liu, W. (2002) On the recovery of joint distributions from limited information. Journal of Econometrics, 107(1–2), 259–274. DOI.
MoHe14
Moon, K. R., & Hero III, A. O.(2014) Multivariate f-Divergence Estimation With Confidence. In NIPS 2014.
MoRL95
Moon, Y. I., Rajagopalan, B., & Lall, U. (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52, 2318–2321. DOI.
NeBR04
Nemenman, I., Bialek, W., & de Ruyter Van Steveninck, R. (2004) Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5), 56111.
NeSB02
Nemenman, I., Shafee, F., & Bialek, W. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14 (Vol. 14). Cambridge, MA, USA: The MIT Press
Nich05
Nichols, J. M.(2005) Inferences about information flow and dispersal for spatially extended population systems using time-series data. Proceedings of the Royal Society of London. Series B: Biological Sciences, 272(1565), 871–876. DOI.
NPLA08
Nykter, M., Price, N. D., Larjo, A., Aho, T., Kauffman, S. A., Yli-Harja, O., & Shmulevich, I. (2008) Critical Networks Exhibit Maximal Information Diversity in Structure-Dynamics Relationships.
PaVe08
Palomar, D. P., & Verdu, S. (2008) Lautum Information. IEEE Transactions on Information Theory, 54(3), 964–975. DOI.
PKHV01
Paluš, M., Komárek, V., Hrnčíř, Z., & Štěrbová, K. (2001) Synchronization as adjustment of information rates: Detection from bivariate time series. Phys. Rev. E, 63(4), 46211. DOI.
Pani03
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
PSMP07
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S.(2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
PaTr96
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Pere00
Pereira, F. (2000) Formal grammar and information theory: together again?. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769), 1239–1253. DOI.
PfGr05
Pflieger, M. E., & Greenblatt, R. E.(2005) Using conditional mutual information to approximate causality for multivariate physiological time series. Int. J. Bioelectromagnetism, 7, 285–288.
PhTT14
Phat, V. N., Thanh, N. T., & Trinh, H. (2014) Full-Order observer design for nonlinear complex large-scale systems with unknown time-varying delayed interactions. Complexity, n/a-n/a. DOI.
Pink56
Pinkerton, R. C.(1956) Information theory and melody. Scientific American, 194(2), 77–86. DOI.
PlNo00
Plotkin, J. B., & Nowak, M. A.(2000) Language Evolution and Information Theory. Journal of Theoretical Biology, 205, 147–159. DOI.
Pola11
Polani, D. (2011) An informational perspective on how the embodiment can relieve cognitive burden. (pp. 78–85). Presented at the Artificial Life (ALIFE), 2011 IEEE Symposium on DOI.
PrBR09
Prokopenko, M., Boschetti, F., & Ryan, A. J.(2009) An information-theoretic primer on complexity, self-organization, and emergence. Complexity, 15(1), 11–28. DOI.
PLOW11
Prokopenko, M., Lizier, J. T., Obst, O., & Wang, X. R.(2011) Relating Fisher information to order parameters. Phys. Rev. E, 84(4), 41116. DOI.
Ragi11
Raginsky, M. (2011) Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 958–965). DOI.
RaSa12
Raginsky, M., & Sason, I. (2012) Concentration of Measure Inequalities in Information Theory, Communications and Coding. arXiv:1212.4663 [Cs, Math].
RKLS08
Ribeiro, A. S., Kauffman, S. A., Lloyd-Price, J., Samuelsson, B., & Socolar, J. E. S.(2008) Mutual information in random Boolean models of regulatory networks. Phys. Rev. E, 77(1), 11901. DOI.
Riss07
Rissanen, J. (2007) Information and complexity in statistical modeling. New York: Springer
Robe05
Roberts, A. J.(2005) Use the information dimension, not the Hausdorff.
Roul99
Roulston, M. S.(1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
RyRy10
Ryabko, D., & Ryabko, B. (2010) Nonparametric Statistical Inference for Ergodic Processes. IEEE Transactions on Information Theory, 56(3), 1430–1435. DOI.
Schn05
Schneider, T. D.(2005) Information Theory Primer With an Appendix on Logarithms.
Schr00
Schreiber, T. (2000) Measuring information transfer. Physical Review Letters, 85(2), 461–464.
SHOE11
Schuch, N., Harrison, S. K., Osborne, T. J., & Eisert, J. (2011) Information propagation for interacting-particle systems. Phys. Rev. A, 84(3), 32309. DOI.
Schü15
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
SePr00
Seth, S., & Príncipe, J. C.(n.d.) Estimation of conditional mutual information and its application as a measure of conditional dependence.
ShCr02
Shalizi, C. R., & Crutchfield, J. P.(2002) Information Bottlenecks, Causal States, And Statistical Relevance Bases: How To Represent Relevant Information In Memoryless Transduction. Advances in Complex Systems (ACS), 5(1), 91–95.
Shel00
Shelah, S. (2000) Choiceless Polynomial Time Logic: Inability to Express. In P. G. Clote & H. Schwichtenberg (Eds.), Computer Science Logic (pp. 72–125). Springer Berlin Heidelberg
Shib97
Shibata, R. (1997) Bootstrap estimate of Kullback-Leibler information for model selection. Statistica Sinica, 7, 375–394.
Shie98
Shields, P. C.(1998) The interactions between ergodic theory and information theory. Information Theory, IEEE Transactions on, 44(6), 2079–2093. DOI.
SKAC07
Shlens, J., Kennel, M. B., Abarbanel, H. D. I., & Chichilnisky, E. J.(2007) Estimating Information Rates with Confidence Intervals in Neural Spike Trains. Neural Computation. DOI.
SATB05a
Slonim, N., Atwal, G. S., Tkačik, G., & Bialek, W. (2005a) Estimating mutual information and multi-information in large networks.
SATB05b
Slonim, N., Atwal, G. S., Tkačik, G., & Bialek, W. (2005b) Information-based clustering. Proceedings of the National Academy of Sciences of the United States of America, 102, 18297–18302. DOI.
SlFT06
Slonim, N., Friedman, N., & Tishby, N. (2006) Multivariate information bottleneck. Neural Computation, 18(8), 1739–1789. DOI.
SlTi00
Slonim, N., & Tishby, N. (2000) Agglomerative information bottleneck. Advances in Neural Information Processing Systems, 12, 617–623.
Srin00
Srinivasa, S. (n.d.) A Review on Multivariate Mutual Information.
StGa15
Steeg, G. V., & Galstyan, A. (2015) The Information Sieve. arXiv:1507.02284 [Cs, Math, Stat].
SKDW02
Steuer, R., Kurths, J., Daub, C. O., Weise, J., & Selbig, J. (2002) The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl 2), 231. DOI.
SKRB98
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998) Entropy and Information in Neural Spike Trains. Phys. Rev. Lett., 80(1), 197–200. DOI.
StVe98
Studený, M., & Vejnarová, J. (1998) On multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Cambridge, Mass.: MIT Press
TaTB07
Taylor, S. F., Tishby, N., & Bialek, W. (2007) Information and fitness. Arxiv Preprint arXiv:0712.4382.
TiPB00
Tishby, N., Pereira, F. C., & Bialek, W. (2000) The information bottleneck method. arXiv:physics/0004057.
TiPo11
Tishby, N., & Polani, D. (2011) Information theory of decisions and actions. In PERCEPTION-ACTION CYCLE (pp. 601–636). Springer
Toda11
Toda, A. A.(2011) An Information-Theoretic Approach to Nonparametric Estimation, Model Selection, and Goodness of Fit. arXiv:1103.4890 [Math, Stat].
Tork03
Torkkola, K. (2003) Feature extraction by non parametric mutual information maximization. J. Mach. Learn. Res., 3, 1415–1438.
Vand11
Van de Cruys, T. (2011) Two multivariate generalizations of pointwise mutual information. (pp. 16–20). Presented at the Proceedings of the Workshop on Distributional Semantics and Compositionality, Association for Computational Linguistics
VePa08
Vejmelka, M., & Paluš, M. (2008) Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E, 77(2), 26214. DOI.
VeVi04
Vereshchagin, N. K., & Vitanyi, P. M. B.(2004) Kolmogorov’s structure functions and model selection. IEEE Transactions on Information Theory, 50(12), 3265–3290. DOI.
VeVi10
Vereshchagin, N. K., & Vitanyi, P. M. B.(2010) Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity. IEEE Transactions on Information Theory, 56(7), 3438–3454. DOI.
Vict02
Victor, J. D.(2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 51903. DOI.
Vict06
Victor, J. D.(2006) Approaches to Information-Theoretic Analysis of Neural Activity. Biological Theory. DOI.
VuYK09
Vu, V. Q., Yu, B., & Kass, R. E.(2009) Information in the nonstationary case. Neural Computation, 21(3), 688–703. DOI.
WaLP10
Wang, X. R., Lizier, J. T., & Prokopenko, M. (2010) A Fisher information study of phase transitions in random Boolean networks. (pp. 305–312). Presented at the Proceedings of the 12th International Conference on the Synthesis and Simulation of Living Systems (Alife XII)
WaLP11
Wang, X. R., Lizier, J. T., & Prokopenko, M. (2011) Fisher Information at the Edge of Chaos in Random Boolean Networks. Artificial Life. DOI.
WMLP00
Wang, X. R., Miller, J. M., Lizier, J. T., Prokopenko, M., & Rossi, L. F.(n.d.) Measuring Information Storage and Transfer in Swarms. In Proc. Eleventh European Conference on the Synthesis and Simulation of Living Systems (pp. 838–845). Cambridge, MA: The MIT Press
WeVe12
Weidmann, C., & Vetterli, M. (2012) Rate Distortion Behavior of Sparse Sources. IEEE Transactions on Information Theory, 58(8), 4969–4992. DOI.
WeKP11
Weissman, T., Kim, Y.-H., & Permuter, H. H.(2011) Directed Information, Causal Estimation, and Communication in Continuous Time. arXiv:1109.0351.
WiLL12
Wilmer, A., de Lussanet, M., & Lappe, M. (2012) Time-Delayed Mutual Information of the Phase as a Measure of Functional Connectivity. PLoS ONE, 7(9), 44633. DOI.
WoWo94a
Wolf, D. R., & Wolpert, D. H.(1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
Wolp06a
Wolpert, D. H.(2006a) Information Theory—The Bridge Connecting Bounded Rational Game Theory and Statistical Physics. In Complex Engineered Systems (pp. 262–290). Springer Berlin Heidelberg
Wolp06b
Wolpert, D. H.(2006b) What Information Theory says about Bounded Rational Best Response. In The Complex Networks of Economic Interactions (pp. 293–306). Springer
WoWo94b
Wolpert, D. H., & Wolf, D. R.(1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
WuYa14
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].
Yosh10
Yoshida, T. (2010) A Graph Model for Clustering Based on Mutual Information. In B.-T. Zhang & M. A. Orgun (Eds.), PRICAI 2010: Trends in Artificial Intelligence (Vol. 6230, pp. 339–350). Berlin, Heidelberg: Springer Berlin Heidelberg
ZaGC07
Zanardi, P., Giorda, P., & Cozzini, M. (2007) Information-Theoretic Differential Geometry of Quantum Phase Transitions. Physical Review Letters, 99(10), 100603. DOI.
ZhGr14
Zhang, Z., & Grabchak, M. (2014) Nonparametric Estimation of Küllback-Leibler Divergence. Neural Computation, 26(11), 2570–2593. DOI.