Causal graphical models

when graphs are handy

Reproduced from James F Fixx’s puzzle book, found in a recycling bin (Fixx77)


Directed graphical models with the additional assumption that \(A\rightarrow B\) may be read as “A causes B”.

Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions…
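Of the ideas in that list, d-separation is the most mechanical, so here is a minimal sketch of it. It assumes the graph is given as a dict mapping each node to its parents, and it uses the standard moralised-ancestral-graph test rather than enumerating paths. The function names and the toy graphs are mine, made up for illustration.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated by zs (three disjoint node sets)."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Restrict to the ancestral graph of everything mentioned.
    keep = ancestors(parents, xs | ys | zs)
    # 2. Moralise: link each child to its parents, marry co-parents,
    #    and forget edge directions.
    neighbours = {node: set() for node in keep}
    for child in keep:
        pars = [p for p in parents.get(child, ()) if p in keep]
        for p in pars:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for a, b in combinations(pars, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    # 3. Delete zs and check whether any x can still reach any y.
    stack = list(xs - zs)
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in ys:
            return False
        stack.extend(n for n in neighbours[node] if n not in zs and n not in seen)
    return True


# Hypothetical toy graphs: a fork X <- Z -> Y and a collider X -> C <- Y -> ... D.
fork = {"Z": [], "X": ["Z"], "Y": ["Z"]}
collider = {"X": [], "Y": [], "C": ["X", "Y"], "D": ["C"]}

print(d_separated(fork, {"X"}, {"Y"}, {"Z"}))      # True: conditioning on Z blocks the fork
print(d_separated(collider, {"X"}, {"Y"}, set()))  # True: an unobserved collider blocks
print(d_separated(collider, {"X"}, {"Y"}, {"D"}))  # False: a collider's descendant unblocks
```

The collider cases are the ones worth staring at: conditioning on a common effect, or on anything downstream of it, opens a path that was closed.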

When can I use my crappy observational data, collected without a good experimental design for whatever reason, to do interventional inference? There is a lot of research on this. I should summarise the salient bits for myself. In fact I did; I just ran a reading group on this. See also quantum causal graphical models.
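To make the question concrete, here is a toy sketch of the simplest answer, back-door adjustment. It assumes a single observed binary confounder `z` that drives both a binary treatment `x` and the outcome `y`; the simulated model, its coefficients and the function names are all invented for illustration. By construction the true effect of `x` on `y` is 2.0, while the naive contrast is about 3.8 in expectation.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Draw (z, x, y) from a made-up model with z -> x, z -> y and x -> y."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)              # observed confounder
    x = rng.binomial(1, 0.2 + 0.6 * z)            # treatment is likelier when z = 1
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)    # true causal effect of x is 2.0
    return z, x, y


def naive_effect(x, y):
    """Difference in mean outcome between treated and untreated units."""
    return y[x == 1].mean() - y[x == 0].mean()


def backdoor_effect(z, x, y):
    """Average the within-stratum contrasts, weighted by P(z)."""
    effect = 0.0
    for value in np.unique(z):
        stratum = z == value
        contrast = y[stratum & (x == 1)].mean() - y[stratum & (x == 0)].mean()
        effect += contrast * stratum.mean()
    return effect


z, x, y = simulate()
print("naive:   ", naive_effect(x, y))
print("adjusted:", backdoor_effect(z, x, y))
```

The adjustment only works here because `z` is observed and blocks every back-door path from `x` to `y`; with an unobserved confounder the same code would happily return a biased number.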

Tutorials online

Tutorial: David Sontag and Uri Shalit, Causal inference from observational studies.

Felix Elwert’s summary is punchy. (Elwe13)

Chapter 3 of (some edition of) Pearl’s book is available as an author’s preprint: Parts 1, 2, 3, 4, 5, 6.


Propensity scores

RuWa06 comes recommended by Shalizi as:

A good description of Rubin et al.’s methods for causal inference, adapted to the meanest understanding. […] Rubin and Waterman do a very good job of explaining, in a clear and concrete problem, just how and why the newer techniques of causal inference are valuable, with just enough technical detail that it doesn’t seem like magic.
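For my own reference, a minimal sketch of the basic propensity-score idea, inverse probability weighting. This is not Rubin and Waterman's procedure, just the textbook estimator on invented data: a hypothetical customer `segment` drives both who gets treated and the outcome, the propensity score is estimated as the treated fraction within each segment, and the true effect is 1.5 by construction.

```python
import numpy as np


def simulate_campaign(n=30000, seed=1):
    """Made-up data: segment influences both treatment and outcome."""
    rng = np.random.default_rng(seed)
    segment = rng.integers(0, 3, size=n)
    p_treat = np.array([0.2, 0.5, 0.8])[segment]   # true propensity per segment
    treated = rng.binomial(1, p_treat)
    outcome = 1.5 * treated + 2.0 * segment + rng.normal(size=n)
    return segment, treated, outcome


def estimate_propensity(segment, treated):
    """Estimate P(treated | segment) as the treated fraction in each segment."""
    scores = np.empty(len(segment), dtype=float)
    for value in np.unique(segment):
        mask = segment == value
        scores[mask] = treated[mask].mean()
    return scores


def ipw_effect(treated, outcome, scores):
    """Inverse-probability-weighted estimate of the average treatment effect."""
    treated_mean = np.mean(treated * outcome / scores)
    control_mean = np.mean((1 - treated) * outcome / (1 - scores))
    return treated_mean - control_mean


segment, treated, outcome = simulate_campaign()
scores = estimate_propensity(segment, treated)
print("naive:", outcome[treated == 1].mean() - outcome[treated == 0].mean())
print("IPW:  ", ipw_effect(treated, outcome, scores))
```

With a single discrete covariate this is just stratification in disguise. The weights blow up when an estimated score is near 0 or 1, which this toy example avoids by design.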

Causal Graph inference from data

Uh oh. You don’t know what causes what? Or specifically, you can’t eliminate a whole bunch of potential causal arrows a priori? Much more work.
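A minimal sketch of the basic move in constraint-based structure learning: test a conditional independence and drop the arrow if it holds. It assumes linear-Gaussian data, so partial correlation stands in for a conditional independence test, and the chain `a -> b -> c` is invented for illustration. Nothing here recovers edge directions.

```python
import numpy as np


def partial_correlation(u, v, given):
    """Correlation between u and v after regressing both on `given`."""
    design = np.column_stack([np.ones(len(u)), given])
    resid_u = u - design @ np.linalg.lstsq(design, u, rcond=None)[0]
    resid_v = v - design @ np.linalg.lstsq(design, v, rcond=None)[0]
    return np.corrcoef(resid_u, resid_v)[0, 1]


def simulate_chain(n=5000, seed=2):
    """Made-up linear chain a -> b -> c with unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)
    b = 0.8 * a + rng.normal(size=n)
    c = 0.8 * b + rng.normal(size=n)
    return a, b, c


a, b, c = simulate_chain()
print("corr(a, c):    ", np.corrcoef(a, c)[0, 1])        # clearly non-zero
print("corr(a, c | b):", partial_correlation(a, c, b))   # close to zero
```

A near-zero partial correlation given `b` is the evidence for deleting a direct `a`–`c` edge. The same pattern would appear for `a <- b <- c` or `a <- b -> c`, which is one reason this problem is much more work.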

Causal time series DAGs

As with other time series methods, this has its own issues.


How does Granger causality relate?
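As a reminder of what the Granger test actually computes, here is a minimal sketch: regress a series on its own lags, then add lags of the candidate cause and ask whether the fit improves, via an F statistic. The simulated pair, in which `x` drives `y` with a one-step delay, and all the function names are invented for illustration. Note this is a statement about prediction, not about interventions.

```python
import numpy as np


def lagged_design(series_list, n_lags, n_obs):
    """Stack an intercept and lags 1..n_lags of each series as columns."""
    columns = [np.ones(n_obs - n_lags)]
    for series in series_list:
        for lag in range(1, n_lags + 1):
            columns.append(series[n_lags - lag : n_obs - lag])
    return np.column_stack(columns)


def residual_sum_of_squares(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_statistic(cause, effect, n_lags=1):
    """F statistic for whether lags of `cause` help predict `effect`."""
    n_obs = len(effect)
    target = effect[n_lags:]
    restricted = lagged_design([effect], n_lags, n_obs)
    unrestricted = lagged_design([effect, cause], n_lags, n_obs)
    rss_r = residual_sum_of_squares(restricted, target)
    rss_u = residual_sum_of_squares(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / n_lags) / (rss_u / dof)


def simulate_pair(n_obs=2000, seed=3):
    """Made-up pair of series where x drives y with a one-step delay."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_obs)
    y = np.zeros(n_obs)
    for t in range(1, n_obs):
        x[t] = 0.5 * x[t - 1] + rng.normal()
        y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()
    return x, y


x, y = simulate_pair()
print("F for x -> y:", granger_f_statistic(x, y))
print("F for y -> x:", granger_f_statistic(y, x))
```

In this toy case the asymmetry matches the simulated mechanism, but only because there is no hidden common driver; a latent series feeding both `x` and `y` at different delays would produce the same kind of asymmetry.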


Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. arXiv:1703.04025 [Cs, Stat].
Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549.
Arnold, B. C., Castillo, E., & Sarabia, J. M. (1999) Conditional specification of statistical models. Springer Science & Business Media
Ay, N., & Polani, D. (2008) Information flows in causal networks. Advances in Complex Systems (ACS), 11(01), 17–41.
Bahadori, M. T., Chalupka, K., Choi, E., Chen, R., Stewart, W. F., & Sun, J. (2017) Neural Causal Regularization under the Independence of Mechanisms Assumption. arXiv:1702.02604 [Cs, Stat].
Bareinboim, E., & Pearl, J. (2016) Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), 7345–7352.
Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
Beal, M. J. (2003) Variational algorithms for approximate Bayesian inference. University of London
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
Bongers, S., Peters, J., Schölkopf, B., & Mooij, J. M. (2016) Structural Causal Models: Cycles, Marginalizations, Exogenous Reparametrizations and Reductions. arXiv:1611.06221 [Cs, Stat].
Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015) Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274.
Bühlmann, P. (2013) Causal statistical inference in high dimensions. Mathematical Methods of Operations Research, 77(3), 357–370.
Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278.
Bühlmann, P., Peters, J., Ernest, J., & Maathuis, M. (2014) Predicting causal effects in high-dimensional settings.
Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
Chen, B., & Pearl, J. (2012) Regression and causation: A critical examination of econometric textbooks.
Claassen, T., Mooij, J. M., & Heskes, T. (2014) Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013). arXiv:1411.1557 [Stat].
Colombo, D., Maathuis, M. H., Kalisch, M., & Richardson, T. S. (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1), 294–321.
De Luna, X., Waernbaum, I., & Richardson, T. S. (2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041.
Didelez, V. (n.d.) Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach.
Duvenaud, D. K., Eaton, D., Murphy, K. P., & Schmidt, M. W. (2010) Causal learning without DAGs. In NIPS Causality: Objectives and Assessment (pp. 177–190).
Eichler, M. (2001) Granger-causality graphs for multivariate time series.
Elwert, F. (2013) Graphical causal models. In Handbook of causal analysis for social research (pp. 245–273). Springer
Entner, D., Hoyer, P., & Spirtes, P. (2013) Data-driven covariate selection for nonparametric estimation of causal effects. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (pp. 256–264).
Ernest, J., & Bühlmann, P. (2014) Marginal integration for fully robust causal inference. arXiv:1405.1868 [Stat].
Fixx, J. F. (1977) Games for the superintelligent. London: Muller
Fu, F., & Zhou, Q. (2013) Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent. Journal of the American Statistical Association, 108(501), 288–300.
Gelman, A. (2010) Causality and statistical learning. American Journal of Sociology, 117(3), 955–966.
Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. arXiv:1403.2310 [Stat].
Hinton, G. E., Osindero, S., & Bao, K. (2005) Learning causally linked markov random fields. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (pp. 128–135). Citeseer
Jordan, Michael Irwin. (1999) Learning in graphical models. Cambridge, Mass.: MIT Press
Jordan, Michael I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
Jordan, Michael I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
Kennedy, E. H. (2015) Semiparametric theory and empirical processes in causal inference. arXiv Preprint arXiv:1510.04740.
Kim, J. H., & Pearl, J. (1983) A computational model for causal and diagnostic reasoning in inference systems. In IJCAI (Vol. 83, pp. 190–193). Citeseer
Koller, D., & Friedman, N. (2009) Probabilistic graphical models: principles and techniques. Cambridge, MA: MIT Press
Lauritzen, S. L., & Spiegelhalter, D. J. (1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
Lauritzen, Steffen L. (1996) Graphical Models. Clarendon Press
Lauritzen, Steffen L. (2000) Causal inference from graphical models. In Complex stochastic systems (pp. 63–107). CRC Press
Lopez-Paz, D., Nishihara, R., Chintala, S., Schölkopf, B., & Bottou, L. (2016) Discovering Causal Signals in Images. arXiv:1605.08179 [Cs, Stat].
Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. arXiv Preprint arXiv:1307.5636.
Maathuis, M. H., Colombo, D., Kalisch, M., & Bühlmann, P. (2010) Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–248.
Maathuis, M. H., Kalisch, M., & Bühlmann, P. (2009) Estimating high-dimensional intervention effects from observational data. The Annals of Statistics, 37(6A), 3133–3164.
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291.
Messerli, F. H. (2012) Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 367(16), 1562–1564.
Mihalkova, L., & Mooney, R. J. (2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
Murphy, K. P. (2012) Machine Learning: A Probabilistic Perspective. (1 edition.). Cambridge, MA: The MIT Press
Neapolitan, R. E., & others. (2004) Learning bayesian networks. (Vol. 38). Prentice Hall Upper Saddle River
Noel, H., & Nyhan, B. (2011) The “unfriending” problem: The consequences of homophily in friendship retention for causal estimates of social influence. Social Networks, 33(3), 211–218.
Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288.
Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference. (Rev. 2. print., 12. [Dr.].). San Francisco, Calif: Kaufmann
Pearl, J. (2009a) Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.
Pearl, J. (2009b) Causality: Models, Reasoning and Inference. Cambridge University Press
Pearl, J., & Bareinboim, E. (2014) External Validity: From Do-Calculus to Transportability Across Populations. Statistical Science, 29(4), 579–595.
Peters, J., Bühlmann, P., & Meinshausen, N. (2015) Causal inference using invariant prediction: identification and confidence intervals. arXiv:1501.01332 [Stat].
Raginsky, M. (2011) Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 958–965).
Rubin, D. B., & Waterman, R. P. (2006) Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology. Statistical Science, 21(2), 206–222.
Sauer, B., & VanderWeele, T. J. (2013) Use of Directed Acyclic Graphs. Agency for Healthcare Research and Quality (US)
Schmidt, M. (2010) Graphical model structure learning with l1-regularization. University of British Columbia
Schölkopf, B., Muandet, K., Fukumizu, K., & Peters, J. (2015) Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations. arXiv:1501.06794 [Cs, Stat].
Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. arXiv:1607.06565 [Physics, Stat].
Shalizi, C. R., & Thomas, A. C. (2011) Homophily and Contagion Are Generically Confounded in Observational Social Network Studies. Sociological Methods & Research, 40(2), 211–239.
Shpitser, I., & Pearl, J. (2008) Complete identification methods for the causal hierarchy. The Journal of Machine Learning Research, 9, 1941–1979.
Shpitser, I., & Tchetgen, E. T. (2014) Causal Inference with a Graphical Hierarchy of Interventions. arXiv:1411.2127 [Stat].
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search. (Second Edition.). The MIT Press
Textor, J., Idelberger, A., & Liśkiewicz, M. (2015) Learning from Pairwise Marginal Independencies. arXiv:1508.00280 [Cs].
Vansteelandt, S., Bekaert, M., & Claeskens, G. (2012) On model selection and model misspecification in causal inference. Statistical Methods in Medical Research, 21(1), 7–30.
Visweswaran, S., & Cooper, G. F. (2014) Counting Markov Blanket Structures. arXiv:1407.2483 [Cs, Stat].
Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215.
Yadav, P., Prunelli, L., Hoff, A., Steinbach, M., Westra, B., Kumar, V., & Simon, G. (2016) Causal Inference in Observational Data. arXiv:1611.04660 [Cs, Stat].
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–236). Morgan Kaufmann Publishers
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].