The Living Thing / Notebooks : Directed graphical models

a.k.a. Bayesian belief networks, directed graphical models, feedforward neural networks, or hierarchical structural models.

Graphs of directed conditional-independence relations are a convenient formalism for many statistical models.
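
Concretely, a DAG $G$ over variables $X_1, \dots, X_n$ encodes the assertion that the joint density factorises over parent sets,

$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\bigl(x_i \mid x_{\mathrm{pa}_G(i)}\bigr),$$

where $\mathrm{pa}_G(i)$ denotes the parents of node $i$ in $G$; everything below trades on that factorisation.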

What’s special here is how we handle independence relations and reason about them. In one sense there is nothing special about graphical models: the graph simply records which variables are conditionally independent of which others. On the other hand, that graph is a powerful analytic tool, telling you what is confounded with what, and when. Moreover, you can use conditional independence tests to construct the graph without necessarily fitting the whole model (e.g. ZPJS12).
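
To make that last point concrete, here is a minimal sketch of a partial-correlation (Fisher-z) conditional independence test, the workhorse of constraint-based structure learning under a joint-Gaussian assumption. It is a stand-in for fancier tests such as the kernel-based one in ZPJS12; the function name, toy data and threshold are mine, for illustration only.

```python
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, i, j, cond=(), alpha=0.05):
    """Test X_i ⊥ X_j | X_cond via partial correlation (joint-Gaussian assumption).

    data: (n_samples, n_vars) array; i, j: column indices; cond: conditioning columns.
    Returns (independent_at_level_alpha, p_value).
    """
    sub = data[:, [i, j, *cond]]
    prec = np.linalg.pinv(np.corrcoef(sub, rowvar=False))   # precision of the submatrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])      # partial correlation of i, j given cond
    n = data.shape[0]
    z = np.arctanh(r)                                       # Fisher z-transform
    stat = np.sqrt(n - len(cond) - 3) * abs(z)              # ~ N(0, 1) under independence
    p_value = 2 * stats.norm.sf(stat)
    return p_value > alpha, p_value

# Toy check: X and Y are dependent marginally, but independent given their common cause Z.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.3 * rng.normal(size=2000)
y = z + 0.3 * rng.normal(size=2000)
data = np.column_stack([x, y, z])
print(fisher_z_ci_test(data, 0, 1))        # (False, tiny p): dependent
print(fisher_z_ci_test(data, 0, 1, (2,)))  # (True, large p): independent given Z
```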

Once you have the graph, you can infer more detailed relations than mere conditional (in)dependence; this is precisely what hierarchical models emphasise.
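
For instance, d-separation lets you read conditional independencies (and hence potential confounding) straight off the graph. Here is a small sketch of the standard check via the moralised ancestral graph, with networkx doing the graph bookkeeping; the example DAG and variable names are made up:

```python
import networkx as nx

def d_separated(G, xs, ys, zs):
    """True iff node sets xs and ys are d-separated by zs in the DAG G.

    Classic reduction: keep ancestors of xs ∪ ys ∪ zs, moralise (marry co-parents,
    drop edge directions), delete zs, then check whether xs and ys are disconnected.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    relevant = set().union(*({v} | nx.ancestors(G, v) for v in xs | ys | zs))
    H = G.subgraph(relevant)
    M = nx.Graph(H.to_undirected())                  # skeleton of the ancestral subgraph
    for v in H.nodes:                                # moralise: connect each node's parents
        parents = list(H.predecessors(v))
        M.add_edges_from((p, q) for k, p in enumerate(parents) for q in parents[k + 1:])
    M.remove_nodes_from(zs)
    return not any(nx.has_path(M, x, y) for x in xs - zs for y in ys - zs)

# Toy DAG: genotype -> smoking -> tar -> cancer, genotype -> cancer.
G = nx.DiGraph([("genotype", "smoking"), ("smoking", "tar"),
                ("tar", "cancer"), ("genotype", "cancer")])
print(d_separated(G, {"smoking"}, {"cancer"}, set()))                # False: open paths
print(d_separated(G, {"smoking"}, {"cancer"}, {"tar", "genotype"}))  # True: both paths blocked
```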

These can even be causal graphical models, and when we can infer those we are extracting Science (ONO) from observational data. This is really interesting; see causal graphical models.

To do: spell out the distinction between DAGs and undirected Markov graphs.

Introductory reading

People recommend Koller and Friedman (KoFr09) to me; it covers many flavours of DAG model and many different methods, but I personally didn’t like it. It drowned me in details without motivation, and left me feeling drained yet uninformed. YMMV.

Spirtes et al. (SpGS01) and Pearl (Pear08) are readable. I’ve had Lauritzen (Laur96) recommended too, but haven’t looked at it yet. Murphy’s textbook (Murp12) has a minimal introduction intermixed with some related models, in a more ML-flavoured, more Bayesian formalism.

Graph inference from data

Inferring the graph itself from data is much more work than reasoning over a given one.

Learning the structure of these models turns out to require a conditional independence test, an awareness of multiple-testing issues, and a little graph algorithmics.
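
For a flavour of how those ingredients combine, here is a heavily simplified sketch of the skeleton-recovery phase of the PC algorithm (KaBü07), reusing the fisher_z_ci_test sketch from above; real implementations take the multiple-testing problem, order-dependence and edge-orientation rules far more seriously.

```python
from itertools import combinations

def pc_skeleton(data, ci_test, alpha=0.05, max_cond=3):
    """Crude PC-style skeleton search over the columns of `data`.

    Start from the complete undirected graph and delete edge (i, j) as soon as
    some conditioning set S drawn from i's other neighbours gives X_i ⊥ X_j | X_S.
    """
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepsets = {}
    for size in range(max_cond + 1):                 # grow conditioning sets gradually
        for i in range(p):
            for j in sorted(adj[i]):
                if j < i:
                    continue
                candidates = sorted(adj[i] - {j})
                for S in combinations(candidates, size):
                    independent, _ = ci_test(data, i, j, S, alpha=alpha)
                    if independent:                  # remove the edge, remember why
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[(i, j)] = S
                        break
    return adj, sepsets

# With `data` and `fisher_z_ci_test` from the earlier sketch, columns 0 and 1 should
# end up non-adjacent, separated by column 2:
adj, sepsets = pc_skeleton(data, fisher_z_ci_test)
print(adj, sepsets)
```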

Implementation

Oooh! look! Software!
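
For example, something like the following does score-based structure learning on discrete data with pgmpy (https://pgmpy.org). Treat the exact class names and call signatures as an assumption to check against the current pgmpy docs; the toy data and variable names are mine.

```python
# Assumes pgmpy's score-based structure-learning API (HillClimbSearch + BicScore);
# check the installed version's documentation, since this interface has shifted over time.
import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=2000)
b = np.where(rng.random(2000) < 0.8, a, 1 - a)   # B copies A 80% of the time
c = np.where(rng.random(2000) < 0.8, b, 1 - b)   # C copies B 80% of the time
data = pd.DataFrame({"A": a, "B": b, "C": c})

best_dag = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
print(best_dag.edges())   # ideally recovers A–B–C up to Markov equivalence
```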

Refs

ArAZ15
Aragam, B., Amini, A. A., & Zhou, Q. (2015) Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression. arXiv:1511.08963 [Cs, Math, Stat].
ArGZ17
Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. arXiv:1703.04025 [Cs, Stat].
ArZh15
Aragam, B., & Zhou, Q. (2015) Concave Penalized Estimation of Sparse Gaussian Bayesian Networks. Journal of Machine Learning Research, 16, 2273–2328.
ArMS09
Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549. DOI.
ArCS99
Arnold, B. C., Castillo, E., & Sarabia, J. M. (1999) Conditional specification of statistical models. Springer Science & Business Media
BaTP14
Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
Beal03
Beal, M. J. (2003) Variational algorithms for approximate Bayesian inference. University of London
BLZS15
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
BüKM14
Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278. DOI.
BüRK13
Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
ChPe12
Chen, B., & Pearl, J. (2012) Regression and causation: A critical examination of econometric textbooks.
ChFo07
Christakis, N. A., & Fowler, J. H. (2007) The Spread of Obesity in a Large Social Network over 32 Years. New England Journal of Medicine, 357(4), 370–379. DOI.
CMKR12
Colombo, D., Maathuis, M. H., Kalisch, M., & Richardson, T. S. (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1), 294–321.
DeWR11
De Luna, X., Waernbaum, I., & Richardson, T. S. (2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041. DOI.
EdAn15
Edwards, D., & Ankinakatte, S. (2015) Context-specific graphical models for discrete longitudinal data. Statistical Modelling, 15(4), 301–325. DOI.
Fixx77
Fixx, J. F. (1977) Games for the superintelligent. London: Muller
FrJo05
Frey, B. J., & Jojic, N. (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9), 1392–1416. DOI.
GuFZ14
Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. arXiv:1403.2310 [Stat].
JGJS99
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999) An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183–233. DOI.
JoWe02a
Jordan, M. I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
JoWe02b
Jordan, M. I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
Jord99
Jordan, M. I. (1999) Learning in graphical models. Cambridge, Mass.: MIT Press
KaBü07
Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
KoFr09
Koller, D., & Friedman, N. (2009) Probabilistic graphical models: principles and techniques. Cambridge, MA: MIT Press
KrGu09
Krause, A., & Guestrin, C. (2009) Optimal value of information in graphical models. J. Artif. Int. Res., 35(1), 557–591.
LaSp88
Lauritzen, S. L., & Spiegelhalter, D. J. (1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
Laur96
Lauritzen, S. L. (1996) Graphical Models. Clarendon Press
MaCo13
Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. arXiv Preprint arXiv:1307.5636.
MaJW06
Malioutov, D. M., Johnson, J. K., & Willsky, A. S. (2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031–2064.
MPSM10
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291. DOI.
MiMo07
Mihalkova, L., & Mooney, R. J. (2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
Mont11
Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
Murp12
Murphy, K. P. (2012) Machine Learning: A Probabilistic Perspective. (1st ed.). Cambridge, MA: The MIT Press
NeOt04
Neapolitan, R. E., & others. (2004) Learning Bayesian networks. (Vol. 38). Upper Saddle River: Prentice Hall
Pear82
Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
Pear86
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
Pear08
Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference. (Rev. 2nd printing). San Francisco, Calif.: Morgan Kaufmann
PeQB05
Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005) Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77(1–2), 1–37.
Poll04
Pollard, D. (2004) Hammersley-Clifford theorem for Markov random fields.
RaFN08
Rabbat, M. G., Figueiredo, M. A. T., & Nowak, R. D. (2008) Network Inference from Co-Occurrences. IEEE Transactions on Information Theory, 54(9), 4053–4068. DOI.
Schm10
Schmidt, M. (2010) Graphical model structure learning with l1-regularization. University of British Columbia
ShMc16
Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. arXiv:1607.06565 [Physics, Stat].
SmEi08
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
SpGS01
Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search. (2nd ed.). The MIT Press
StVe98
Studený, M., & Vejnarová, J. (1998) On multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models (pp. 261–297). Cambridge, Mass.: MIT Press
SuWL12
Su, R.-Q., Wang, W.-X., & Lai, Y.-C. (2012) Detecting hidden nodes in complex networks from time series. Phys. Rev. E, 85(6), 065201. DOI.
TeIL15
Textor, J., Idelberger, A., & Liśkiewicz, M. (2015) Learning from Pairwise Marginal Independencies. arXiv:1508.00280 [Cs].
ViCo14
Visweswaran, S., & Cooper, G. F. (2014) Counting Markov Blanket Structures. arXiv:1407.2483 [Cs, Stat].
WaJo08
Wainwright, M. J., & Jordan, M. I. (2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2), 1–305. DOI.
Weis00
Weiss, Y. (2000) Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1), 1–41. DOI.
WeFr01
Weiss, Y., & Freeman, W. T. (2001) Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology. Neural Computation, 13(10), 2173–2200. DOI.
WiBi05
Winn, J. M., & Bishop, C. M. (2005) Variational message passing. Journal of Machine Learning Research, 6, 661–694.
Wrig34
Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215. DOI.
YeFW03
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers
ZPJS12
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].