a.k.a. Bayesian belief networks, directed graphical models, feedforward neural networks, or hierarchical structural models.
Directed graphs of conditional independence relations are a convenient formalism for many statistical models.
What’s special here is how we handle independence relations and reason about them. In one sense there is nothing special about graphical models: the graph simply records which variables are conditionally independent of which others. On the other hand, that graph is a powerful analytic tool, telling you what is confounded with what, and when. Moreover, you can use conditional independence tests to construct the graph without necessarily fitting the whole model (e.g. ZPJS12).
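Reading conditional independences off the graph can be mechanised. Here is a minimal pure-Python sketch (function and variable names are mine, not from any particular library) that answers d-separation queries via the standard moralized-ancestral-graph construction; for instance, it reports that a common cause confounds X and Y until you condition on it.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All nodes in `nodes` plus their ancestors; dag maps node -> set of parents."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, given=frozenset()):
    """True iff x and y are d-separated given `given`, via the moral ancestral graph."""
    keep = ancestors(dag, {x, y, *given})
    adj = {v: set() for v in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:                       # keep parent-child edges, undirected
            adj[p].add(child); adj[child].add(p)
        for a, b in combinations(parents, 2):   # "marry" co-parents of each child
            adj[a].add(b); adj[b].add(a)
    # delete the conditioning set, then look for any remaining path from x to y
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - seen - set(given):
            seen.add(w)
            stack.append(w)
    return True

# Z is a common cause of X and Y: confounded marginally, independent given Z.
confounded = {"X": {"Z"}, "Y": {"Z"}}
assert not d_separated(confounded, "X", "Y")
assert d_separated(confounded, "X", "Y", {"Z"})
```

The same queries also capture the collider case: in `{"C": {"X", "Y"}}` the variables X and Y start out d-separated, and conditioning on the common effect C makes them dependent.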
Once you have the graph, you can infer more detailed relations than mere conditional dependence or otherwise; this is precisely what hierarchical models emphasise.
These can even be causal graphical models, and when we can infer those we are extracting Science (ONO) from observational data. This is really interesting; see causal graphical models.
TBD: the distinction between DAG models and (undirected) Markov graphs.
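One concrete way to see that distinction is to moralize a DAG into its undirected Markov graph and notice what is lost. A pure-Python sketch (names are hypothetical, not from any library):

```python
from itertools import combinations

def moralize(dag):
    """Moral (undirected Markov) graph of a DAG given as node -> set of parents."""
    nodes = set(dag) | {p for ps in dag.values() for p in ps}
    adj = {v: set() for v in nodes}
    for child, parents in dag.items():
        for p in parents:                      # drop edge directions
            adj[p].add(child); adj[child].add(p)
        for a, b in combinations(parents, 2):  # marry co-parents of each child
            adj[a].add(b); adj[b].add(a)
    return adj

# The v-structure X -> C <- Y encodes that X and Y are marginally independent,
# but its moral graph joins X and Y directly, so no undirected graph over the
# same vertices expresses exactly that independence pattern.
moral = moralize({"C": {"X", "Y"}})
assert "Y" in moral["X"]
```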
People recommend Koller and Friedman to me (KoFr09), which covers many flavours of DAG model and many inference methods, but I personally didn’t like it: it drowned me in detail without motivation and left me feeling drained yet uninformed. YMMV.
Spirtes et al. (SpGS01) and Pearl (Pear08) are readable. Lauritzen (Laur96) has been recommended to me too, but I haven’t looked at it yet. Murphy’s textbook (Murp12) has a minimal introduction, intermixed with some related models, in a more ML-flavoured, more Bayesian formalism.
Graph inference from data
Much more work needed here.
Oooh! Look! Software!
bnlearn learns belief networks in R.
There are a lot of other R packages; recommendations TBD.
sparsebn: a new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:
- Experimental data with interventions
- Mixed observational / experimental data
- High-dimensional data with p >> n
- Datasets with thousands of variables (tested up to p=8000)
- Continuous and discrete data
The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint.
The main methods for learning graphical models are:
- estimate.dag for directed acyclic graphs (Bayesian networks).
- estimate.precision for undirected graphs (Markov random fields).
- estimate.covariance for covariance matrices.
Currently, estimation of precision and covariance matrices is limited to Gaussian data.
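For intuition about what sparse regularization buys you here, a toy sketch of lasso-style neighbourhood selection in pure Python (this is not sparsebn’s actual algorithm, which additionally handles the acyclicity constraint, interventions, and discrete data; all names below are mine):

```python
def lasso(X, y, lam, iters=200):
    """Coordinate-descent lasso; X is a list of rows with roughly standardized columns."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # correlation of column j with the partial residual (other betas held fixed)
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                      for k in range(p) if k != j)) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-thresholding shrinks small coefficients exactly to zero
            beta[j] = max(abs(rho) - lam, 0.0) * (1.0 if rho >= 0 else -1.0) / z
    return beta

# Neighbourhood selection: regress one variable on all the others; the nonzero
# coefficients are its estimated neighbours in the graph.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]  # two orthogonal predictor columns
y = [2.0, -2.0, 2.0, -2.0]                # depends on column 0 only
beta = lasso(X, y, lam=0.3)
neighbours = [j for j, b in enumerate(beta) if b != 0.0]
assert neighbours == [0]
```

Repeating this regression for every variable and combining the resulting neighbourhoods is the idea behind sparse graph estimation in the p ≫ n regime; the penalty is what keeps the edge set sparse and the procedure consistent.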
- Aragam, B., Amini, A. A., & Zhou, Q. (2015) Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression. arXiv:1511.08963 [Cs, Math, Stat].
- Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. arXiv:1703.04025 [Cs, Stat].
- Aragam, B., & Zhou, Q. (2015) Concave Penalized Estimation of Sparse Gaussian Bayesian Networks. Journal of Machine Learning Research, 16, 2273–2328.
- Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549. DOI.
- Arnold, B. C., Castillo, E., & Sarabia, J. M.(1999) Conditional specification of statistical models. Springer Science & Business Media
- Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
- Beal, M. J.(2003) Variational algorithms for approximate Bayesian inference. University of London
- Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
- Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278. DOI.
- Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
- Chen, B., & Pearl, J. (2012) Regression and causation: A critical examination of econometric textbooks.
- Christakis, N. A., & Fowler, J. H.(2007) The Spread of Obesity in a Large Social Network over 32 Years. New England Journal of Medicine, 357(4), 370–379. DOI.
- Colombo, D., Maathuis, M. H., Kalisch, M., & Richardson, T. S.(2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1), 294–321.
- De Luna, X., Waernbaum, I., & Richardson, T. S.(2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041. DOI.
- Edwards, D., & Ankinakatte, S. (2015) Context-specific graphical models for discrete longitudinal data. Statistical Modelling, 15(4), 301–325. DOI.
- Fixx, J. F.(1977) Games for the superintelligent. London: Muller
- Frey, B. J., & Jojic, N. (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9), 1392–1416. DOI.
- Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. arXiv:1403.2310 [Stat].
- Jordan, Michael I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K.(1999) An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183–233. DOI.
- Jordan, Michael I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
- Jordan, Michael I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
- Jordan, Michael Irwin. (1999) Learning in graphical models. Cambridge, Mass.: MIT Press
- Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
- Koller, D., & Friedman, N. (2009) Probabilistic graphical models: principles and techniques. Cambridge, MA: MIT Press
- Krause, A., & Guestrin, C. (2009) Optimal value of information in graphical models. J. Artif. Int. Res., 35(1), 557–591.
- Lauritzen, S. L., & Spiegelhalter, D. J.(1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
- Lauritzen, Steffen L. (1996) Graphical Models. Clarendon Press
- Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. arXiv Preprint arXiv:1307.5636.
- Malioutov, D. M., Johnson, J. K., & Willsky, A. S.(2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031–2064.
- Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291. DOI.
- Mihalkova, L., & Mooney, R. J.(2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
- Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
- Murphy, K. P.(2012) Machine Learning: A Probabilistic Perspective. (1st ed.). Cambridge, MA: The MIT Press
- Neapolitan, R. E., & others. (2004) Learning Bayesian networks. (Vol. 38). Upper Saddle River: Prentice Hall
- Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
- Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
- Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference. (Rev. 2nd printing.). San Francisco, Calif.: Kaufmann
- Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005) Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77(1–2), 1–37.
- Pollard, D. (2004) Hammersley-Clifford theorem for Markov random fields.
- Rabbat, M. G., Figueiredo, Má. A. T., & Nowak, R. D.(2008) Network Inference from Co-Occurrences. IEEE Transactions on Information Theory, 54(9), 4053–4068. DOI.
- Schmidt, M. (2010) Graphical model structure learning with l1-regularization. University of British Columbia
- Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. arXiv:1607.06565 [Physics, Stat].
- Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
- Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search. (Second Edition.). The MIT Press
- Studený, M., & Vejnarová, J. (1998) On multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Cambridge, Mass.: MIT Press
- Su, R.-Q., Wang, W.-X., & Lai, Y.-C. (2012) Detecting hidden nodes in complex networks from time series. Phys. Rev. E, 85(6), 065201. DOI.
- Textor, J., Idelberger, A., & Liśkiewicz, M. (2015) Learning from Pairwise Marginal Independencies. arXiv:1508.00280 [Cs].
- Visweswaran, S., & Cooper, G. F.(2014) Counting Markov Blanket Structures. arXiv:1407.2483 [Cs, Stat].
- Wainwright, M. J., & Jordan, M. I.(2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2), 1–305. DOI.
- Weiss, Y. (2000) Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1), 1–41. DOI.
- Weiss, Y., & Freeman, W. T.(2001) Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology. Neural Computation, 13(10), 2173–2200. DOI.
- Winn, J. M., & Bishop, C. M.(2005) Variational message passing. In Journal of Machine Learning Research (pp. 661–694).
- Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215. DOI.
- Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers
- Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].