The Living Thing / Notebooks :

Inference in graphical models

Given what I know about what I know, what do I know?


Introductory reading

People recommend me Koller and Friedman, which includes many different flavours of graphical model and many different methods, (KoFr09) but I personally didn’t like it. It drowned me in details without motivation, and left me feeling drained yet uninformed. YMMV.

Spirtes et al (SpGS01) and Pearl (Pear08) are readable. Murphy’s textbook (Murp12) has a minimal introduction intermixed with some related models, with a more ML, more Bayesian formalism. I’ve had Lauritzen (Laur96) recommended too, and it’s very abstract but quite clear and feels less ad hoc.

Directed graphs

Graphs of conditional, directed independence are a convenient formalism for many models.

What’s special here is how we handle independence relations and reasoning about them. In one sense there is nothing special about graphical models; it’s just a graph of which variables are conditionally independent of which others. On the other hand, that graph is a powerful analytic tool, telling you what is confounded with what, and when. Moreover, you can use conditional independence tests to construct that graph even without necessarily constructing the whole model (e.g. ZPJS12).

Once you have the graph, you can infer more detailed relations than mere conditional dependence or otherwise; this is precisely that hierarchical models emphasise.

These can even be causal graphical models, and when we can infer those we are extracting Science (ONO) from observational data. This is really interesting; see causal graphical models

Undirected, a.k.a. Markov graphs

a.k.a Markov random fields, Markov random networks. (other types?)

I would like to know about spatial Poisson random fields, Markov random fields, Bernoulli (or is it Boolean?) random fields, esp for discrete multivariate sequences. Gibbs and Boltzman distribution inference.

A smartarse connection to neural networks is in Ranz13.

Factor graphs

A unifying formalism for the directed, and undirected graphical models How does that work then?


A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm.


Chain graphs

Partially directed random fields, for some kind of definition thereof? The classic chain graph of the 80s allows you to have cycling sets of mutually influencing variables, connected by directed acyclic influence.


Pedagogically useful, although probably not industrial-grade, David Barber’s discrete graphical model code (Julia).

Unobserved variables, in e.g. BUGS…?


Altun, Y., Smola, A. J., & Hofmann, T. (2004) Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 2–9). Arlington, Virginia, United States: AUAI Press
Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. ArXiv:1703.04025 [Cs, Stat].
Aragam, B., & Zhou, Q. (2015) Concave Penalized Estimation of Sparse Gaussian Bayesian Networks. Journal of Machine Learning Research, 16, 2273–2328.
Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549. DOI.
Arnold, B. C., Castillo, E., & Sarabia, J. M.(1999) Conditional specification of statistical models. . Springer Science & Business Media
Baddeley, A. J., Møller, J., & Waagepetersen, R. (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3), 329–350. DOI.
Baddeley, A. J., & Van Lieshout, M.-C. N.(1995) Area-interaction point processes. Annals of the Institute of Statistical Mathematics, 47(4), 601–619. DOI.
Baddeley, A. J., Van Lieshout, M.-C. N., & Møller, J. (1996) Markov Properties of Cluster Processes. Advances in Applied Probability, 28(2), 346–355. DOI.
Baddeley, A., & Møller, J. (1989) Nearest-Neighbour Markov Point Processes and Random Sets. International Statistical Review / Revue Internationale de Statistique, 57(2), 89–121. DOI.
Barber, D. (2012) Bayesian reasoning and machine learning. . Cambridge ; New York: Cambridge University Press
Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
Bartolucci, F., & Besag, J. (2002) A recursive algorithm for Markov random fields. Biometrika, 89(3), 724–730. DOI.
Beal, M. J.(2003) Variational algorithms for approximate Bayesian inference. . University of London
Besag, J. (1974) Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2), 192–236.
Besag, J. (1975) Statistical Analysis of Non-Lattice Data. Journal of the Royal Statistical Society. Series D (The Statistician), 24(3), 179–195. DOI.
Besag, J. (1986) On the Statistical Analysis of Dirty Pictures. Journal of the Royal Statistical Society. Series B (Methodological), 48(3), 259–302.
Bishop, C. M.(2006) Pattern recognition and machine learning. . New York: Springer
Blake, A., Kohli, P., & Rother, C. (Eds.). (2011) Markov Random Fields for Vision and Image Processing. . Cambridge, Mass: MIT Press
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. ArXiv:1507.03652 [Math, Stat].
Boyd, S. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends® in Machine Learning, 3(1), 1–122. DOI.
Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L.(2015) Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274. DOI.
Bu, Y., & Lederer, J. (2017) Integrating Additional Knowledge Into Estimation of Graphical Models. ArXiv:1704.02739 [Stat].
Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278. DOI.
Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
Celeux, G., Forbes, F., & Peyrard, N. (2003) EM procedures using mean field-like approximations for Markov model-based image segmentation. Pattern Recognition, 36(1), 131–144. DOI.
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
Christakis, N. A., & Fowler, J. H.(2007) The Spread of Obesity in a Large Social Network over 32 Years. New England Journal of Medicine, 357(4), 370–379. DOI.
Clifford, P. (1990) Markov random fields in statistics. In G. R. Grimmett & D. J. A. Welsh (Eds.), Disorder in Physical Systems: A Volume in Honour of John Hammersley. Oxford England : New York: Oxford University Press
Crisan, D., & Míguez, J. (2014) Particle-kernel estimation of the filter density in state-space models. Bernoulli, 20(4), 1879–1929. DOI.
Dawid, A. P.(1979) Conditional independence in statistical theory. Journal of the Royal Statistical Society. Series B (Methodological), 41(1), 1–31.
Dawid, A. P.(1980) Conditional Independence for Statistical Operations. The Annals of Statistics, 8(3), 598–617. DOI.
Dawid, A. P.(2001) Separoids: A Mathematical Framework for Conditional Independence and Irrelevance. Annals of Mathematics and Artificial Intelligence, 32(1–4), 335–372. DOI.
De Luna, X., Waernbaum, I., & Richardson, T. S.(2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041. DOI.
Edwards, D., & Ankinakatte, S. (2015) Context-specific graphical models for discrete longitudinal data. Statistical Modelling, 15(4), 301–325. DOI.
Fixx, J. F.(1977) Games for the superintelligent. . London: Muller
Forbes, F., & Peyrard, N. (2003) Hidden Markov random field model selection criteria based on mean field-like approximations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1089–1101. DOI.
Frey, B. J.(2003) Extending factor graphs so as to unify directed and undirected graphical models. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 257–264). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Frey, B. J., & Jojic, N. (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9), 1392–1416. DOI.
Fridman, A. (2003) Mixed Markov models. Proceedings of the National Academy of Sciences, 100(14), 8092–8096. DOI.
Friedman, J., Hastie, T., & Tibshirani, R. (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. DOI.
Friel, N., & Rue, H. (2007) Recursive computing and simulation-free inference for general factorizable models. Biometrika, 94(3), 661–672. DOI.
Geyer, C. J.(1991) Markov chain Monte Carlo maximum likelihood.
Geyer, C. J., & Møller, J. (1994) Simulation procedures and likelihood inference for spatial point processes. Scandinavian Journal of Statistics, 359–373.
Goldberg, D. A.(2013) Higher order Markov random fields for independent sets. ArXiv:1301.1762 [Math-Ph].
Grenander, U. (1989) Advances in Pattern Theory. The Annals of Statistics, 17(1), 1–30. DOI.
Griffeath, D. (1976) Introduction to Random Fields. In Denumerable Markov Chains (pp. 425–458). Springer New York
Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. ArXiv:1403.2310 [Stat].
Häggström, O., van Lieshout, M.-C. N. M., & Møller, J. (1999) Characterization results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli, 5(4), 641–658.
Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., & Kadie, C. (2000) Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research, 1(Oct), 49–75.
Jensen, J. L., & Møller, J. (1991) Pseudolikelihood for Exponential Family Models of Spatial Point Processes. The Annals of Applied Probability, 1(3), 445–461. DOI.
Jordan, M. I.(1999) Learning in graphical models. . Cambridge, Mass.: MIT Press
Jordan, M. I.(2004) Graphical Models. Statistical Science, 19(1), 140–155.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K.(1999) An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183–233. DOI.
Jordan, M. I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
Jordan, M. I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
Kindermann, R. P., & Snell, J. L.(1980a) On the relation between Markov random fields and social networks. The Journal of Mathematical Sociology, 7(1), 1–13. DOI.
Kindermann, R., & Snell, J. L.(1980b) Markov Random Fields and Their Applications. (Vol. 1). Providence, Rhode Island: American Mathematical Society
Koch, V. M.(2007) A factor graph approach to model-based signal separation. . ETH Zurich, Konstanz
Koller, D., & Friedman, N. (2009) Probabilistic graphical models : principles and techniques. . Cambridge, MA: MIT Press
Krämer, N., Schäfer, J., & Boulesteix, A.-L. (2009) Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics, 10(1), 384. DOI.
Krause, A., & Guestrin, C. (2009) Optimal value of information in graphical models. J. Artif. Int. Res., 35(1), 557–591.
Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001) Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. DOI.
Lauritzen, S. L.(1996) Graphical Models. . Clarendon Press
Lauritzen, S. L., & Spiegelhalter, D. J.(1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
Lavrenko, V., & Pickens, J. (2003a) Music modeling with random fields. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval (p. 389). ACM Press DOI.
Lavrenko, V., & Pickens, J. (2003b) Polyphonic music modeling with random fields. In Proceedings of the eleventh ACM international conference on Multimedia (p. 120). ACM Press DOI.
Lederer, J. (2016) Graphical Models for Discrete and Continuous Data. ArXiv:1609.05551 [Math, Stat].
Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012a) High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4), 2293–2326. DOI.
Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012b) The Nonparanormal SKEPTIC. ArXiv:1206.6488 [Cs, Stat].
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
Loeliger, H.-A. (2004) An introduction to factor graphs. IEEE Signal Processing Magazine, 21(1), 28–41. DOI.
Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. ArXiv Preprint ArXiv:1307.5636.
Maddage, N. C., Li, H., & Kankanhalli, M. S.(2006) Music structure based vector space retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (p. 67). ACM Press DOI.
Malioutov, D. M., Johnson, J. K., & Willsky, A. S.(2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031—2064.
Mao, Y., Kschischang, F. R., & Frey, B. J.(2004) Convolutional Factor Graphs As Probabilistic Models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 374–381). Arlington, Virginia, United States: AUAI Press
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291. DOI.
McCallum, A. (2012) Efficiently Inducing Features of Conditional Random Fields. ArXiv:1212.2504 [Cs, Stat].
Meinshausen, N., & Bühlmann, P. (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI.
Mihalkova, L., & Mooney, R. J.(2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
Morgan, J. S., Barjasteh, I., Lampe, C., & Radha, H. (2014) The entropy of attention and popularity in youtube videos. ArXiv:1412.1185 [Physics].
Murphy, K. P.(2012a) Machine Learning: A Probabilistic Perspective. (1 edition.). Cambridge, MA: The MIT Press
Osokin, A., Vetrov, D., & Kolmogorov, V. (2011) Submodular decomposition framework for inference in associative Markov networks with global constraints. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1889–1896). DOI.
Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In in Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference. (Rev. 2. print., 12. [Dr.].). San Francisco, Calif: Kaufmann
Pearl, J. (2009) Causality: Models, Reasoning and Inference. . Cambridge University Press
Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005) Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77(1–2), 1–37.
Pickens, J. (2004) Harmonic modeling for polyphonic music retrieval. . Citeseer
Pickens, J., & Iliopoulos, C. S.(2005) Markov Random Fields and Maximum Entropy Modeling for Music Information Retrieval. In ISMIR (pp. 207–214). Citeseer
Pollard, D. (2004) Hammersley-Clifford theorem for Markov random fields.
Rabbat, M. G., Figueiredo, Má. A. T., & Nowak, R. D.(2008) Network Inference from Co-Occurrences. IEEE Transactions on Information Theory, 54(9), 4053–4068. DOI.
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35(9), 2206–2222. DOI.
Ravi, S., & Diao, Q. (2016) Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation. In PMLR (pp. 519–528).
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D.(2010) High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319. DOI.
Reeves, R., & Pettitt, A. N.(2004) Efficient recursions for general factorisable models. Biometrika, 91(3), 751–757. DOI.
Richardson, M., & Domingos, P. (2006) Markov logic networks. Machine Learning, 62(1–2), 107–136.
Ripley, B. D., & Kelly, F. P.(1977) Markov Point Processes. Journal of the London Mathematical Society, s2-15(1), 188–192. DOI.
Schmidt, M. W., & Murphy, K. P.(2010) Convex structure learning in log-linear models: Beyond pairwise potentials. In International Conference on Artificial Intelligence and Statistics (pp. 709–716).
Shachter, R. D.(1998) Bayes-ball: Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams). In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (pp. 480–487). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. ArXiv:1607.06565 [Physics, Stat].
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search. (Second Edition.). The MIT Press
Studený, M. (1997) A recovery algorithm for chain graphs. International Journal of Approximate Reasoning, 17(2–3), 265–293. DOI.
Studený, M. (2005) Probabilistic conditional independence structures. . London: Springer
Studený, M., & Vejnarová, J. (1998) On multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Cambridge, Mass.: MIT Press
Su, R.-Q., Wang, W.-X., & Lai, Y.-C. (2012) Detecting hidden nodes in complex networks from time series. Phys. Rev. E, 85(6), 065201. DOI.
Sutton, C., & McCallum, A. (2010) An Introduction to Conditional Random Fields. ArXiv:1011.4088.
Tansey, W., Padilla, O. H. M., Suggala, A. S., & Ravikumar, P. (2015) Vector-Space Markov Random Fields via Exponential Families. In Journal of Machine Learning Research (pp. 684–692).
Vetrov, D., & Osokin, A. (2011) Graph Preserving Label Decomposition in Discrete MRFs with Selfish Potentials. In NIPS Workshop on Discrete Optimization in Machine learning (DISCML NIPS).
Visweswaran, S., & Cooper, G. F.(2014) Counting Markov Blanket Structures. ArXiv:1407.2483 [Cs, Stat].
Wainwright, M. J., & Jordan, M. I.(2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2), 1–305. DOI.
Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In New Directions in Statistical Signal Processing (Vol. 155). MIT Press
Wang, C., Komodakis, N., & Paragios, N. (2013) Markov Random Field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11), 1610–1627. DOI.
Wasserman, L., Kolar, M., & Rinaldo, A. (2013) Estimating Undirected Graphs Under Weak Assumptions. ArXiv:1309.6933 [Cs, Math, Stat].
Weiss, Y. (2000) Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1), 1–41. DOI.
Weiss, Y., & Freeman, W. T.(2001) Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology. Neural Computation, 13(10), 2173–2200. DOI.
Winn, J. M., & Bishop, C. M.(2005) Variational message passing. In Journal of Machine Learning Research (pp. 661–694).
Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215. DOI.
Wu, R., Srikant, R., & Ni, J. (2013) Learning Loosely Connected Markov Random Fields. Stochastic Systems, 3(2), 362–404. DOI.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–236). Morgan Kaufmann Publishers
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7), 2282–2312. DOI.
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. ArXiv:1202.3775 [Cs, Stat].