# Causal graphical models

The danger of folk statistics. The problems of excluded variables.

Directed graphical models with the additional assumption that $A\rightarrow B$ may be read as “A causes B”.
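To make the causal reading concrete, a standard way to state it is via the usual DAG factorisation and its truncated version under an intervention. This is a sketch in notation assumed here rather than taken from the text above: $\mathrm{pa}_i$ denotes the parents of $X_i$ in the graph, and $do(\cdot)$ is Pearl's intervention operator.

$$
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)
$$

$$
p(x_1, \dots, x_n \mid do(X_j = x_j^{*})) = \prod_{i \neq j} p(x_i \mid \mathrm{pa}_i) \quad \text{if } x_j = x_j^{*}, \text{ and } 0 \text{ otherwise.}
$$

Intervening on $X_j$ deletes the factor for $X_j$ (the arrows into it) and leaves every other conditional unchanged; that invariance is what reading arrows as causes buys.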

Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions, moral equivalence, identification of hidden variables.

When can I use my crappy observational data, collected without a good experimental design for whatever reason, to do interventional inference? There is a lot of research on this. I should summarise the salient bits for myself. In fact I did; I ran a reading group on this. See also quantum causal graphical models, and the use of classical causal graphical models to eliminate hidden quantum causes. “With great spreadsheets comes great responsibility.”
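As a toy illustration of when observational data suffice: if observed covariates $Z$ satisfy the back-door criterion for the effect of $X$ on $Y$, then $P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$. The sketch below assumes a three-node graph $Z \rightarrow X$, $Z \rightarrow Y$, $X \rightarrow Y$ with made-up probability tables (every name and number is hypothetical) and compares the naive conditional contrast with the adjusted one.

```python
from itertools import product

# Hypothetical model, all variables binary: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}                        # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}               # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,    # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.5, (1, 1): 0.7}


def joint(x, y, z):
    """Observational joint P(X = x, Y = y, Z = z) implied by the tables."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1 - P_Y1_GIVEN_XZ[(x, z)]
    return P_Z[z] * px * py


def p_y1_given_x(x):
    """Plain conditioning: P(Y = 1 | X = x), confounded by Z."""
    numerator = sum(joint(x, 1, z) for z in (0, 1))
    denominator = sum(joint(x, y, z) for y, z in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_do_x(x):
    """Back-door adjustment: sum_z P(Y = 1 | x, z) P(z)."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


naive_effect = p_y1_given_x(1) - p_y1_given_x(0)
adjusted_effect = p_y1_do_x(1) - p_y1_do_x(0)
print(f"naive contrast:    {naive_effect:.2f}")
print(f"adjusted contrast: {adjusted_effect:.2f}")
```

With these tables the naive contrast overstates the effect because $Z$ pushes both treatment and outcome in the same direction. The adjustment only works here because $Z$ is observed and the graph is assumed known.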

Avoidance of the ecological fallacy in mean-field approximation. Simpson’s paradox.
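A small worked example of Simpson’s paradox may help fix ideas. The counts below are purely illustrative (they follow the pattern of the familiar kidney-stone teaching example, and the names `counts`, `per_stratum` and `pooled` are hypothetical): treatment A does better within each stratum, yet B looks better once the strata are pooled.

```python
# Illustrative (successes, trials) per stratum and treatment arm.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

# Success rate within each stratum.
per_stratum = {
    stratum: {arm: s / n for arm, (s, n) in arms.items()}
    for stratum, arms in counts.items()
}

# Success rate after pooling the strata, ignoring stratum membership.
pooled = {}
for arm in ("A", "B"):
    successes = sum(counts[stratum][arm][0] for stratum in counts)
    trials = sum(counts[stratum][arm][1] for stratum in counts)
    pooled[arm] = successes / trials

a_wins_in_every_stratum = all(r["A"] > r["B"] for r in per_stratum.values())
b_wins_pooled = pooled["B"] > pooled["A"]

for stratum, rates in per_stratum.items():
    print(stratum, {arm: round(rate, 3) for arm, rate in rates.items()})
print("pooled", {arm: round(rate, 3) for arm, rate in pooled.items()})
```

The reversal arises because arm A is mostly given in the harder stratum. Which comparison is the causally relevant one depends on the assumed graph, not on the table alone.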

Spurious correlation induced by sampling bias.
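A quick simulation shows how selection alone can manufacture a correlation. This is a minimal sketch under assumed settings: two independent standard normal variables, and a hypothetical selection rule that keeps only samples with `x + y > 1`, so that selection acts as a collider.

```python
import math
import random

random.seed(0)


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# X and Y are generated independently: no causal or statistical link.
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]

# Hypothetical sampling rule: a unit is observed only if x + y exceeds 1.
selected = [(x, y) for x, y in population if x + y > 1.0]

r_population = pearson(population)
r_selected = pearson(selected)
print(f"correlation in full population: {r_population:+.3f}")
print(f"correlation in selected sample: {r_selected:+.3f}")
```

In the full population the correlation is near zero; among the selected units it is clearly negative, even though neither variable influences the other.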

## Tutorials online

Tutorial: David Sontag and Uri Shalit, Causal inference from observational studies.

Chapter 3 of (some edition of) Pearl’s book is available as an author’s preprint: Parts 1, 2, 3, 4, 5, 6.

Hmm.

## Propensity scores

RuWa06 comes recommended by Shalizi as:

A good description of Rubin et al.’s methods for causal inference, adapted to the meanest understanding. […] Rubin and Waterman do a very good job of explaining, in a clear and concrete problem, just how and why the newer techniques of causal inference are valuable, with just enough technical detail that it doesn’t seem like magic.
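To see what a propensity score does, here is a minimal sketch of inverse-propensity weighting on simulated data. It is one common use of propensity scores and is not claimed to be the procedure in RuWa06. The data-generating process, the true effect of 2.0, and all variable names are assumptions made up for the illustration. The propensity is estimated by the treated fraction within each level of a single binary covariate.

```python
import random

random.seed(0)
n = 20000
true_effect = 2.0

# Simulated observational data: z confounds treatment t and outcome y.
rows = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if z else 0.2) else 0
    y = true_effect * t + 3.0 * z + random.gauss(0.0, 1.0)
    rows.append((z, t, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def weighted_mean(pairs):
    """Mean of y weighted by w, for an iterable of (w, y) pairs."""
    pairs = list(pairs)
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


# Naive contrast: ignores that treated units tend to have z = 1.
naive = (mean(y for _, t, y in rows if t == 1)
         - mean(y for _, t, y in rows if t == 0))

# Estimated propensity score e(z) = P(T = 1 | Z = z).
propensity = {z0: mean(t for z, t, _ in rows if z == z0) for z0 in (0, 1)}

# Normalised inverse-propensity-weighted estimate of the average effect.
treated = weighted_mean((1 / propensity[z], y) for z, t, y in rows if t == 1)
control = weighted_mean((1 / (1 - propensity[z]), y) for z, t, y in rows if t == 0)
ipw_ate = treated - control

print(f"naive: {naive:.2f}  IPW: {ipw_ate:.2f}  truth: {true_effect:.2f}")
```

The weighting rebalances the covariate across arms, so the estimate lands near the simulated truth while the naive contrast does not. With real data the propensity model is itself an assumption, and unobserved confounders are not repaired by it.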

## Causal Graph inference from data

Uh oh. You don’t know what causes what? Or specifically, you can’t eliminate a whole bunch of potential causal arrows a priori? Much more work.

Here is a seminar I noticed on this theme:

Guido Consonni, Università Cattolica del Sacro Cuore, Milano

Objective Bayes Model Selection of Gaussian Essential Graphs with Observational and Interventional Data

Graphical models based on Directed Acyclic Graphs (DAGs) represent a powerful tool for investigating dependencies among variables. It is well known that one cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs) using only observational data. However, the space of all DAGs can be partitioned into Markov equivalence classes, each being represented by a unique Essential Graph (EG), also called Completed Partially Directed Graph (CPDAG). In some fields, in particular genomics, one can have both observational and interventional data, the latter being produced after an exogenous perturbation of some variables in the system, or from randomized intervention experiments. Interventions destroy the original causal structure, and modify the Markov property of the underlying DAG, leading to a finer partition of DAGs into equivalence classes, each one being represented by an Interventional Essential Graph (I-EG) (Hauser and Buehlmann). In this talk we consider Bayesian model selection of EGs under the assumption that the variables are jointly Gaussian. In particular, we adopt an objective Bayes approach, based on the notion of fractional Bayes factor, and obtain a closed form expression for the marginal likelihood of an EG. Next we construct a Markov chain to explore the EG space under a sparsity constraint, and propose an MCMC algorithm to approximate the posterior distribution over the space of EGs. Our methodology, which we name Objective Bayes Essential graph Search (OBES), allows to evaluate the inferential uncertainty associated to any features of interest, for instance the posterior probability of edge inclusion. An extension of OBES to deal simultaneously with observational and interventional data is also presented: this involves suitable modifications of the likelihood and prior, as well as of the MCMC algorithm. We conclude by presenting results for simulated and real experiments (protein-signaling data).
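The Markov equivalence mentioned in the abstract has a simple graphical characterisation (due to Verma and Pearl): two DAGs on the same nodes are Markov equivalent exactly when they share a skeleton and the same v-structures. The sketch below checks that criterion for tiny hand-written graphs. The edge-list representation and the function names are hypothetical choices for illustration, and every node is assumed to appear in at least one edge.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """Same skeleton and same v-structures (assumes identical node sets)."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


# Edges are (parent, child) pairs.
chain = [("X", "Y"), ("Y", "Z")]
reversed_chain = [("Z", "Y"), ("Y", "X")]
fork = [("Y", "X"), ("Y", "Z")]
collider = [("X", "Y"), ("Z", "Y")]

print(markov_equivalent(chain, reversed_chain))
print(markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The chain, reversed chain and fork all encode only "X independent of Z given Y", so observational data cannot separate them; the collider encodes a different independence and sits in its own class.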

This is joint work with Federico Castelletti, Stefano Peluso and Marco Della Vedova (Universita’ Cattolica del Sacro Cuore).

## Causal time series DAGs

As with other time series methods, causal inference on time series has its own issues.

Does CausalImpact do it? Find out. (Based on BGKR15.)

The CausalImpact R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred.
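The counterfactual idea in that description can be sketched with something far cruder than a Bayesian structural time-series model. The example below is a deliberately simplified stand-in and not the CausalImpact method or API: it fits an ordinary least-squares line from a control series to the response on the pre-intervention period, extrapolates it over the post-intervention period, and reads the effect off the gap. The series, the lift of 10, and all names are simulated assumptions.

```python
import random

random.seed(1)
n_pre, n_post, true_lift = 100, 30, 10.0

# Simulated control series (unaffected by the intervention) and response.
control = [50 + 10 * random.random() for _ in range(n_pre + n_post)]
response = [5.0 + 1.5 * x + random.gauss(0, 1) for x in control]
for t in range(n_pre, n_pre + n_post):
    response[t] += true_lift          # the intervention shifts the response

# Ordinary least squares of response on control, pre-period only.
xs, ys = control[:n_pre], response[:n_pre]
mean_x, mean_y = sum(xs) / n_pre, sum(ys) / n_pre
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Counterfactual: what the fitted relationship predicts after the intervention.
counterfactual = [intercept + slope * x for x in control[n_pre:]]
pointwise_effect = [y - c for y, c in zip(response[n_pre:], counterfactual)]
estimated_lift = sum(pointwise_effect) / n_post

print(f"estimated average lift: {estimated_lift:.2f} (simulated truth {true_lift})")
```

This only works to the extent that the control series is itself untouched by the intervention and that the pre-period relationship would have persisted. The structural time-series approach additionally models trend and seasonality and reports uncertainty.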

## Questions

How does Granger causality relate?

## Refs

ArGZ17
Aragam, B., Gu, J., & Zhou, Q. (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. arXiv:1703.04025 [Cs, Stat].
ArMS09
Aral, S., Muchnik, L., & Sundararajan, A. (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549. DOI.
ArCS99
Arnold, B. C., Castillo, E., & Sarabia, J. M. (1999) Conditional specification of statistical models. Springer Science & Business Media
AyPo08
Ay, N., & Polani, D. (2008) Information flows in causal networks. Advances in Complex Systems (ACS), 11(01), 17–41. DOI.
BCCC17
Bahadori, M. T., Chalupka, K., Choi, E., Chen, R., Stewart, W. F., & Sun, J. (2017) Neural Causal Regularization under the Independence of Mechanisms Assumption. arXiv:1702.02604 [Cs, Stat].
BaPe16
Bareinboim, E., & Pearl, J. (2016) Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), 7345–7352. DOI.
BaTP14
Bareinboim, E., Tian, J., & Pearl, J. (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
Beal03
Beal, M. J. (2003) Variational algorithms for approximate Bayesian inference. University of London
BLZS15
Bloniarz, A., Liu, H., Zhang, C.-H., Sekhon, J., & Yu, B. (2015) Lasso adjustments of treatment effect estimates in randomized experiments. arXiv:1507.03652 [Math, Stat].
BPSM16
Bongers, S., Peters, J., Schölkopf, B., & Mooij, J. M. (2016) Structural Causal Models: Cycles, Marginalizations, Exogenous Reparametrizations and Reductions. arXiv:1611.06221 [Cs, Stat].
BGKR15
Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015) Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274. DOI.
Bühl13
Bühlmann, P. (2013) Causal statistical inference in high dimensions. Mathematical Methods of Operations Research, 77(3), 357–370.
BüKM14
Bühlmann, P., Kalisch, M., & Meier, L. (2014) High-Dimensional Statistics with a View Toward Applications in Biology. Annual Review of Statistics and Its Application, 1(1), 255–278. DOI.
BPEM14
Bühlmann, P., Peters, J., Ernest, J., & Maathuis, M. (2014) Predicting causal effects in high-dimensional settings.
BüRK13
Bühlmann, P., Rütimann, P., & Kalisch, M. (2013) Controlling false positive selections in high-dimensional regression and causal inference. Statistical Methods in Medical Research, 22(5), 466–492.
ChPe12
Chen, B., & Pearl, J. (2012) Regression and causation: A critical examination of econometric textbooks.
ClMH14
Claassen, T., Mooij, J. M., & Heskes, T. (2014) Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013). arXiv:1411.1557 [Stat].
CMKR12
Colombo, D., Maathuis, M. H., Kalisch, M., & Richardson, T. S. (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1), 294–321.
DeWR11
De Luna, X., Waernbaum, I., & Richardson, T. S. (2011) Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, asr041. DOI.
Dide00
Didelez, V. (n.d.) Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach.
DEMS10
Duvenaud, D. K., Eaton, D., Murphy, K. P., & Schmidt, M. W. (2010) Causal learning without DAGs. In NIPS Causality: Objectives and Assessment (pp. 177–190).
Eich01
Eichler, M. (2001) Granger-causality graphs for multivariate time series.
Elwe13
Elwert, F. (2013) Graphical causal models. In Handbook of causal analysis for social research (pp. 245–273). Springer
EnHS13
Entner, D., Hoyer, P., & Spirtes, P. (2013) Data-driven covariate selection for nonparametric estimation of causal effects. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (pp. 256–264).
ErBü14
Ernest, J., & Bühlmann, P. (2014) Marginal integration for fully robust causal inference. arXiv:1405.1868 [Stat].
Fixx77
Fixx, J. F. (1977) Games for the superintelligent. London: Muller
FuZh13
Fu, F., & Zhou, Q. (2013) Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent. Journal of the American Statistical Association, 108(501), 288–300. DOI.
Gelm10
Gelman, A. (2010) Causality and statistical learning. American Journal of Sociology, 117(3), 955–966. DOI.
GuFZ14
Gu, J., Fu, F., & Zhou, Q. (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. arXiv:1403.2310 [Stat].
HiOB05
Hinton, G. E., Osindero, S., & Bao, K. (2005) Learning causally linked markov random fields. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (pp. 128–135). Citeseer
Jord99
Jordan, Michael Irwin. (1999) Learning in graphical models. Cambridge, Mass.: MIT Press
JoWe02a
Jordan, Michael I., & Weiss, Y. (2002a) Graphical models: Probabilistic inference. The Handbook of Brain Theory and Neural Networks, 490–496.
JoWe02b
Jordan, Michael I., & Weiss, Y. (2002b) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
KaBü07
Kalisch, M., & Bühlmann, P. (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
Kenn15
Kennedy, E. H. (2015) Semiparametric theory and empirical processes in causal inference. arXiv Preprint arXiv:1510.04740.
KiPe83
Kim, J. H., & Pearl, J. (1983) A computational model for causal and diagnostic reasoning in inference systems. In IJCAI (Vol. 83, pp. 190–193). Citeseer
KoFr09
Koller, D., & Friedman, N. (2009) Probabilistic graphical models: principles and techniques. Cambridge, MA: MIT Press
LaSp88
Lauritzen, S. L., & Spiegelhalter, D. J. (1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B (Methodological), 50(2), 157–224.
Laur96
Lauritzen, Steffen L. (1996) Graphical Models. Clarendon Press
Laur00
Lauritzen, Steffen L. (2000) Causal inference from graphical models. In Complex stochastic systems (pp. 63–107). CRC Press
LNCS16
Lopez-Paz, D., Nishihara, R., Chintala, S., Schölkopf, B., & Bottou, L. (2016) Discovering Causal Signals in Images. arXiv:1605.08179 [Cs, Stat].
MaCo13
Maathuis, M. H., & Colombo, D. (2013) A generalized backdoor criterion. arXiv Preprint arXiv:1307.5636.
MCKB10
Maathuis, M. H., Colombo, D., Kalisch, M., & Bühlmann, P. (2010) Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–248. DOI.
MaKB09
Maathuis, M. H., Kalisch, M., & Bühlmann, P. (2009) Estimating high-dimensional intervention effects from observational data. The Annals of Statistics, 37(6A), 3133–3164. DOI.
MPSM10
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., & Stolovitzky, G. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14), 6286–6291. DOI.
Mess12
Messerli, F. H. (2012) Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 367(16), 1562–1564. DOI.
MiMo07
Mihalkova, L., & Mooney, R. J. (2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM
Mont11
Montanari, A. (2011) Lecture Notes for Stat 375 Inference in Graphical Models.
Murp12
Murphy, K. P. (2012) Machine Learning: A Probabilistic Perspective (1st ed.). Cambridge, MA: The MIT Press
NeOt04
Neapolitan, R. E., & others. (2004) Learning Bayesian networks (Vol. 38). Upper Saddle River: Prentice Hall
NoNy11
Noel, H., & Nyhan, B. (2011) The “unfriending” problem: The consequences of homophily in friendship retention for causal estimates of social influence. Social Networks, 33(3), 211–218. DOI.
Pear82
Pearl, J. (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
Pear86
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
Pear08
Pearl, J. (2008) Probabilistic reasoning in intelligent systems: networks of plausible inference (Rev. 2nd printing, 12th impression). San Francisco, Calif: Kaufmann
Pear09a
Pearl, J. (2009a) Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146. DOI.
Pear09b
Pearl, J. (2009b) Causality: Models, Reasoning and Inference. Cambridge University Press
PeBa14
Pearl, J., & Bareinboim, E. (2014) External Validity: From Do-Calculus to Transportability Across Populations. Statistical Science, 29(4), 579–595. DOI.
PeBM15
Peters, J., Bühlmann, P., & Meinshausen, N. (2015) Causal inference using invariant prediction: identification and confidence intervals. arXiv:1501.01332 [Stat].
Ragi11
Raginsky, M. (2011) Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 958–965). DOI.
RuWa06
Rubin, D. B., & Waterman, R. P. (2006) Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology. Statistical Science, 21(2), 206–222. DOI.
SaVa13
Sauer, B., & VanderWeele, T. J. (2013) Use of Directed Acyclic Graphs. Agency for Healthcare Research and Quality (US)
Schm10
Schmidt, M. (2010) Graphical model structure learning with l1-regularization. University of British Columbia
SMFP15
Schölkopf, B., Muandet, K., Fukumizu, K., & Peters, J. (2015) Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations. arXiv:1501.06794 [Cs, Stat].
ShMc16
Shalizi, C. R., & McFowland III, E. (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. arXiv:1607.06565 [Physics, Stat].
ShTh11
Shalizi, C. R., & Thomas, A. C. (2011) Homophily and Contagion Are Generically Confounded in Observational Social Network Studies. Sociological Methods & Research, 40(2), 211–239. DOI.
ShPe08
Shpitser, I., & Pearl, J. (2008) Complete identification methods for the causal hierarchy. The Journal of Machine Learning Research, 9, 1941–1979.
ShTc14
Shpitser, I., & Tchetgen, E. T. (2014) Causal Inference with a Graphical Hierarchy of Interventions. arXiv:1411.2127 [Stat].
SmEi08
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
SpGS01
Spirtes, P., Glymour, C., & Scheines, R. (2001) Causation, Prediction, and Search (2nd ed.). The MIT Press
TeIL15
Textor, J., Idelberger, A., & Liśkiewicz, M. (2015) Learning from Pairwise Marginal Independencies. arXiv:1508.00280 [Cs].
VaBC12
Vansteelandt, S., Bekaert, M., & Claeskens, G. (2012) On model selection and model misspecification in causal inference. Statistical Methods in Medical Research, 21(1), 7–30. DOI.
ViCo14
Visweswaran, S., & Cooper, G. F. (2014) Counting Markov Blanket Structures. arXiv:1407.2483 [Cs, Stat].
Wrig34
Wright, S. (1934) The Method of Path Coefficients. The Annals of Mathematical Statistics, 5(3), 161–215. DOI.
YPHS16
Yadav, P., Prunelli, L., Hoff, A., Steinbach, M., Westra, B., Kumar, V., & Simon, G. (2016) Causal Inference in Observational Data. arXiv:1611.04660 [Cs, Stat].
YeFW03
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers
ZPJS12
Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv:1202.3775 [Cs, Stat].