# Inference in graphical models

### Given what I know about what I know, what do I know?

People recommend me Koller and Friedman, which includes many different flavours of graphical model and many different methods, (KoFr09) but I personally didn’t like it. It drowned me in details without motivation, and left me feeling drained yet uninformed. YMMV.

Spirtes et al (SpGS01) and Pearl (Pear08) are readable. Murphy’s textbook (Murp12) has a minimal introduction intermixed with some related models, with a more ML, more Bayesian formalism. I’ve had Lauritzen (Laur96) recommended too, and it’s very abstract but quite clear and feels less ad hoc.

## Directed graphs

Graphs of conditional, directed independence are a convenient formalism for many models.

What’s special here is how we handle independence relations and reasoning about them. In one sense there is nothing special about graphical models; it’s just a graph of which variables are conditionally independent of which others. On the other hand, that graph is a powerful analytic tool, telling you what is confounded with what, and when. Moreover, you can use conditional independence tests to construct that graph even without necessarily constructing the whole model (e.g. ZPJS12).

Once you have the graph, you can infer more detailed relations than mere conditional dependence or otherwise; this is precisely that hierarchical models emphasise.

These can even be causal graphical models, and when we can infer those we are extracting Science (ONO) from observational data. This is really interesting; see causal graphical models

## Undirected, a.k.a. Markov graphs

a.k.a Markov random fields, Markov random networks. (other types?)

I would like to know about spatial Poisson random fields, Markov random fields, Bernoulli (or is it Boolean?) random fields, esp for discrete multivariate sequences. Gibbs and Boltzman distribution inference.

A smartarse connection to neural networks is in Ranz13.

## Factor graphs

A unifying formalism for the directed, and undirected graphical models How does that work then?

Wikipedia

A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm.

Hmm.

## Chain graphs

Partially directed random fields, for some kind of definition thereof? The classic chain graph of the 80s allows you to have cycling sets of mutually influencing variables, connected by directed acyclic influence.

## Implementations

Pedagogically useful, although probably not industrial-grade, David Barber’s discrete graphical model code (Julia).

Unobserved variables, in e.g. BUGS…?

## Refs

AlSH04
Altun, Y., Smola, A. J., & Hofmann, T. (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press
ArGZ17
Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. ArXiv:1703.04025 [Cs, Stat].
ArZh15
Aragam, B., & Zhou, Q. (2015) Concave Penalized Estimation of Sparse Gaussian Bayesian Networks. Journal of Machine Learning Research, 16, 2273–2328.
ArMS09
Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549. DOI.
ArCS99
Arnold, B. C., Castillo, E., & Sarabia, J. M.(1999) Conditional specification of statistical models. . Springer Science & Business Media
BaMW00
Baddeley, A. J., Møller, J., & Waagepetersen, R. (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3), 329–350. DOI.
BaVa95
Baddeley, A. J., & Van Lieshout, M.-C. N.(1995) Area-interaction point processes. Annals of the Institute of Statistical Mathematics, 47(4), 601–619. DOI.
BaVM96
Baddeley, A. J., Van Lieshout, M.-C. N., & Møller, J. (1996) Markov Properties of Cluster Processes. Advances in Applied Probability, 28(2), 346–355. DOI.
BaMø89
Baddeley, A., & Møller, J. (1989) Nearest-Neighbour Markov Point Processes and Random Sets. International Statistical Review / Revue Internationale de Statistique, 57(2), 89–121. DOI.
Barb12
Barber, D. (2012) Bayesian reasoning and machine learning. . Cambridge ; New York: Cambridge University Press
BaTP14
Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
BaBe02
Bartolucci, F., & Besag, J. (2002) A recursive algorithm for Markov random fields. Biometrika, 89(3), 724–730. DOI.
Beal03
Beal, M. J.(2003) Variational algorithms for approximate Bayesian inference. . University of London
Besa74
Besag, J. (1974) Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2), 192–236.
Besa75
Besag, J. (1975) Statistical Analysis of Non-Lattice Data. Journal of the Royal Statistical Society. Series D (The Statistician), 24(3), 179–195. DOI.
Besa86
Besag, J. (1986) On the Statistical Analysis of Dirty Pictures. Journal of the Royal Statistical Society. Series B (Methodological), 48(3), 259–302.
Bish06
Bishop, C. M.(2006) Pattern recognition and machine learning. . New York: Springer
BlKR11
Blake, A., Kohli, P., & Rother, C. (Eds.). (2011) Markov Random Fields for Vision and Image Processing. . Cambridge, Mass: MIT Press
BLZS15
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. ArXiv:1507.03652 [Math, Stat].
Boyd10
Boyd, S. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends® in Machine Learning, 3(1), 1–122. DOI.
BGKR15
Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L.(2015) Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274. DOI.
BuLe17
Bu, Y., & Lederer, J. (2017) Integrating Additional Knowledge Into Estimation of Graphical Models. ArXiv:1704.02739 [Stat].
BüKM14
Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278. DOI.
BüRK13
Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
CeFP03
Celeux, G., Forbes, F., & Peyrard, N. (2003) EM procedures using mean field-like approximations for Markov model-based image segmentation. Pattern Recognition, 36(1), 131–144. DOI.
CDHB09
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
ChFo07
Christakis, N. A., & Fowler, J. H.(2007) The Spread of Obesity in a Large Social Network over 32 Years. New England Journal of Medicine, 357(4), 370–379. DOI.
Clif90
Clifford, P. (1990) Markov random fields in statistics. In G. R. Grimmett & D. J. A. Welsh (Eds.), Disorder in Physical Systems: A Volume in Honour of John Hammersley. Oxford England : New York: Oxford University Press
CrMí14
Crisan, D., & Míguez, J. (2014) Particle-kernel estimation of the filter density in state-space models. Bernoulli, 20(4), 1879–1929. DOI.
Dawi79
Dawid, A. P.(1979) Conditional independence in statistical theory. Journal of the Royal Statistical Society. Series B (Methodological), 41(1), 1–31.
Dawi80
Dawid, A. P.(1980) Conditional Independence for Statistical Operations. The Annals of Statistics, 8(3), 598–617. DOI.
Dawi01
Dawid, A. P.(2001) Separoids: A Mathematical Framework for Conditional Independence and Irrelevance. Annals of Mathematics and Artificial Intelligence, 32(1–4), 335–372. DOI.
DeWR11
De Luna, X., Waernbaum, I., & Richardson, T. S.(2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041. DOI.
EdAn15
Edwards, D., & Ankinakatte, S. (2015) Context-specific graphical models for discrete longitudinal data. Statistical Modelling, 15(4), 301–325. DOI.
Fixx77
Fixx, J. F.(1977) Games for the superintelligent. . London: Muller
FoPe03
Forbes, F., & Peyrard, N. (2003) Hidden Markov random field model selection criteria based on mean field-like approximations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1089–1101. DOI.
Frey03
Frey, B. J.(2003) Extending factor graphs so as to unify directed and undirected graphical models. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 257–264). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
FrJo05
Frey, B. J., & Jojic, N. (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9), 1392–1416. DOI.
Frid03
Fridman, A. (2003) Mixed Markov models. Proceedings of the National Academy of Sciences, 100(14), 8092–8096. DOI.
FrHT08
Friedman, J., Hastie, T., & Tibshirani, R. (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. DOI.
FrRu07
Friel, N., & Rue, H. (2007) Recursive computing and simulation-free inference for general factorizable models. Biometrika, 94(3), 661–672. DOI.
Geye91
Geyer, C. J.(1991) Markov chain Monte Carlo maximum likelihood.
GeMø94
Geyer, C. J., & Møller, J. (1994) Simulation procedures and likelihood inference for spatial point processes. Scandinavian Journal of Statistics, 359–373.
Gold13
Goldberg, D. A.(2013) Higher order Markov random fields for independent sets. ArXiv:1301.1762 [Math-Ph].
Gren89
Grenander, U. (1989) Advances in Pattern Theory. The Annals of Statistics, 17(1), 1–30. DOI.
Grif76
Griffeath, D. (1976) Introduction to Random Fields. In Denumerable Markov Chains (pp. 425–458). Springer New York
GuFZ14
Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. ArXiv:1403.2310 [Stat].
HäLM99
Häggström, O., van Lieshout, M.-C. N. M., & Møller, J. (1999) Characterization results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli, 5(4), 641–658.
HCMR00
Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., & Kadie, C. (2000) Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research, 1(Oct), 49–75.
JeMø91
Jensen, J. L., & Møller, J. (1991) Pseudolikelihood for Exponential Family Models of Spatial Point Processes. The Annals of Applied Probability, 1(3), 445–461. DOI.
Jord99
Jordan, M. I.(1999) Learning in graphical models. . Cambridge, Mass.: MIT Press
Jord04
Jordan, M. I.(2004) Graphical Models. Statistical Science, 19(1), 140–155.
JGJS99
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K.(1999) An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183–233. DOI.
JoWe02a
Jordan, M. I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
JoWe02b
Jordan, M. I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
KaBü07
Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
KiSn80a
Kindermann, R. P., & Snell, J. L.(1980a) On the relation between Markov random fields and social networks. The Journal of Mathematical Sociology, 7(1), 1–13. DOI.
KiSn80b
Kindermann, R., & Snell, J. L.(1980b) Markov Random Fields and Their Applications. (Vol. 1). Providence, Rhode Island: American Mathematical Society
Koch07
Koch, V. M.(2007) A factor graph approach to model-based signal separation. . ETH Zurich, Konstanz
KoFr09
Koller, D., & Friedman, N. (2009) Probabilistic graphical models : principles and techniques. . Cambridge, MA: MIT Press
KrSB09
Krämer, N., Schäfer, J., & Boulesteix, A.-L. (2009) Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics, 10(1), 384. DOI.
KrGu09
Krause, A., & Guestrin, C. (2009) Optimal value of information in graphical models. J. Artif. Int. Res., 35(1), 557–591.
KsFL01
Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001) Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. DOI.
Laur96
Lauritzen, S. L.(1996) Graphical Models. . Clarendon Press
LaSp88
Lauritzen, S. L., & Spiegelhalter, D. J.(1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
LaPi03a
Lavrenko, V., & Pickens, J. (2003a) Music modeling with random fields. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval (p. 389). ACM Press DOI.
LaPi03b
Lavrenko, V., & Pickens, J. (2003b) Polyphonic music modeling with random fields. In Proceedings of the eleventh ACM international conference on Multimedia (p. 120). ACM Press DOI.
Lede16
Lederer, J. (2016) Graphical Models for Discrete and Continuous Data. ArXiv:1609.05551 [Math, Stat].
LHYL12a
Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012a) High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4), 2293–2326. DOI.
LHYL12b
Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012b) The Nonparanormal SKEPTIC. ArXiv:1206.6488 [Cs, Stat].
LiRW10
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
Loel04
Loeliger, H.-A. (2004) An introduction to factor graphs. IEEE Signal Processing Magazine, 21(1), 28–41. DOI.
MaCo13
Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. ArXiv Preprint ArXiv:1307.5636.
MaLK06
Maddage, N. C., Li, H., & Kankanhalli, M. S.(2006) Music structure based vector space retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (p. 67). ACM Press DOI.
MaJW06
Malioutov, D. M., Johnson, J. K., & Willsky, A. S.(2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031—2064.
MaKF04
Mao, Y., Kschischang, F. R., & Frey, B. J.(2004) Convolutional Factor Graphs As Probabilistic Models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 374–381). Arlington, Virginia, United States: AUAI Press
MPSM10
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291. DOI.
Mcca12
McCallum, A. (2012) Efficiently Inducing Features of Conditional Random Fields. ArXiv:1212.2504 [Cs, Stat].
MeBü06
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
MiMo07
Mihalkova, L., & Mooney, R. J.(2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
Mont11
Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
MBLR14
Morgan, J. S., Barjasteh, I., Lampe, C., & Radha, H. (2014) The entropy of attention and popularity in youtube videos. ArXiv:1412.1185 [Physics].
Murp12
Murphy, K. P.(2012a) Machine Learning: A Probabilistic Perspective. (1 edition.). Cambridge, MA: The MIT Press
OsVK11
Osokin, A., Vetrov, D., & Kolmogorov, V. (2011) Submodular decomposition framework for inference in associative Markov networks with global constraints. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1889–1896). DOI.
Pear82
Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In in Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
Pear86
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
Pear08
Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference. (Rev. 2. print., 12. [Dr.].). San Francisco, Calif: Kaufmann
Pear09
Pearl, J. (2009) Causality: Models, Reasoning and Inference. . Cambridge University Press
PeQB05
Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005) Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77(1–2), 1–37.
Pick04
Pickens, J. (2004) Harmonic modeling for polyphonic music retrieval. . Citeseer
PiIl05
Pickens, J., & Iliopoulos, C. S.(2005) Markov Random Fields and Maximum Entropy Modeling for Music Information Retrieval. In ISMIR (pp. 207–214). Citeseer
Poll04
Pollard, D. (2004) Hammersley-Clifford theorem for Markov random fields.
RaFN08
Rabbat, M. G., Figueiredo, Má. A. T., & Nowak, R. D.(2008) Network Inference from Co-Occurrences. IEEE Transactions on Information Theory, 54(9), 4053–4068. DOI.
Ranz13
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35(9), 2206–2222. DOI.
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
RaWL10
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D.(2010) High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319. DOI.
RePe04
Reeves, R., & Pettitt, A. N.(2004) Efficient recursions for general factorisable models. Biometrika, 91(3), 751–757. DOI.
RiDo06
Richardson, M., & Domingos, P. (2006) Markov logic networks. Machine Learning, 62(1–2), 107–136.
RiKe77
Ripley, B. D., & Kelly, F. P.(1977) Markov Point Processes. Journal of the London Mathematical Society, s2-15(1), 188–192. DOI.
ScMu10
Schmidt, M. W., & Murphy, K. P.(2010) Convex structure learning in log-linear models: Beyond pairwise potentials. In International Conference on Artificial Intelligence and Statistics (pp. 709–716).
Shac98
Shachter, R. D.(1998) Bayes-ball: Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams). In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (pp. 480–487). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
ShMc16
Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. ArXiv:1607.06565 [Physics, Stat].
SmEi08
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
SpGS01
Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search. (Second Edition.). The MIT Press
Stud97
Studený, M. (1997) A recovery algorithm for chain graphs. International Journal of Approximate Reasoning, 17(2–3), 265–293. DOI.
Stud05
Studený, M. (2005) Probabilistic conditional independence structures. . London: Springer
StVe98
Studený, M., & Vejnarová, J. (1998) On multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Cambridge, Mass.: MIT Press
SuWL12
Su, R.-Q., Wang, W.-X., & Lai, Y.-C. (2012) Detecting hidden nodes in complex networks from time series. Phys. Rev. E, 85(6), 065201. DOI.
SuMc10
Sutton, C., & McCallum, A. (2010) An Introduction to Conditional Random Fields. ArXiv:1011.4088.
TPSR15
Tansey, W., Padilla, O. H. M., Suggala, A. S., & Ravikumar, P. (2015) Vector-Space Markov Random Fields via Exponential Families. In Journal of Machine Learning Research (pp. 684–692).
VeOs11
Vetrov, D., & Osokin, A. (2011) Graph Preserving Label Decomposition in Discrete MRFs with Selfish Potentials. In NIPS Workshop on Discrete Optimization in Machine learning (DISCML NIPS).
ViCo14
Visweswaran, S., & Cooper, G. F.(2014) Counting Markov Blanket Structures. ArXiv:1407.2483 [Cs, Stat].
WaJo08
Wainwright, M. J., & Jordan, M. I.(2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2), 1–305. DOI.
WaJo05
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press
WaKP13
Wang, C., Komodakis, N., & Paragios, N. (2013) Markov Random Field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11), 1610–1627. DOI.
WaKR13
Wasserman, L., Kolar, M., & Rinaldo, A. (2013) Estimating Undirected Graphs Under Weak Assumptions. ArXiv:1309.6933 [Cs, Math, Stat].
Weis00
Weiss, Y. (2000) Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1), 1–41. DOI.
WeFr01
Weiss, Y., & Freeman, W. T.(2001) Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology. Neural Computation, 13(10), 2173–2200. DOI.
WiBi05
Winn, J. M., & Bishop, C. M.(2005) Variational message passing. In Journal of Machine Learning Research (pp. 661–694).
Wrig34
Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215. DOI.
WuSN13
Wu, R., Srikant, R., & Ni, J. (2013) Learning Loosely Connected Markov Random Fields. Stochastic Systems, 3(2), 362–404. DOI.
YeFW03
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–236). Morgan Kaufmann Publishers
YeFW05
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7), 2282–2312. DOI.
ZPJS12
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. ArXiv:1202.3775 [Cs, Stat].