Reproduced from James F Fixx’s puzzle book (Fixx77), found in a recycling bin:

Inferring cause and effect from nature. Graphical models and related techniques for doing it. Avoiding the danger of folk statistics. Observational studies, confounding, adjustment criteria, *d*-separation, identifiability, interventions, moral equivalence…

The most well-trodden path here is using directed graphical models with the additional assumption that \(A\rightarrow B\) may be read as “A causes a change in B”. Compare and contrast with instrumental variables and propensity score matching. When you are talking about structural equation models, this boils down, more or less, to some extra interpretation imposed on hierarchical models. One payoff is avoiding the ecological fallacy and Simpson’s paradox.
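To make the Simpson’s paradox point concrete, here is a minimal simulation sketch. The variable names (`z`, `x`, `y`), the probabilities and the helper functions are all invented for illustration and come from no particular dataset. A binary confounder `z` raises both the chance of treatment `x` and the chance of outcome `y`, while the treatment itself lowers the outcome probability by 0.1. Comparing treated and untreated units naively gets the sign wrong; contrasting within strata of `z` and reweighting by its marginal distribution (backdoor adjustment) recovers it.

```python
import random


def simulate(n=200_000, seed=0):
    """Draw (z, x, y) triples from a toy model with DAG z -> x, z -> y, x -> y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                   # confounder
        x = rng.random() < (0.8 if z else 0.2)   # treatment is likelier when z is true
        p_y = 0.3 - 0.1 * x + 0.5 * z            # treatment effect is -0.1 by construction
        y = rng.random() < p_y
        rows.append((z, x, y))
    return rows


def mean_y(rows, x, z=None):
    """Mean outcome among rows with treatment x (and confounder z, if given)."""
    sel = [yy for zz, xx, yy in rows if xx == x and (z is None or zz == z)]
    return sum(sel) / len(sel)


rows = simulate()

# Naive contrast: ignores z, so it mixes the effect of x with that of z.
naive = mean_y(rows, True) - mean_y(rows, False)

# Backdoor adjustment: contrast within each stratum of z, weighted by P(z).
p_z = sum(z for z, _, _ in rows) / len(rows)
adjusted = sum(
    weight * (mean_y(rows, True, z) - mean_y(rows, False, z))
    for z, weight in ((True, p_z), (False, 1.0 - p_z))
)

print(f"naive difference:    {naive:+.3f}")
print(f"adjusted difference: {adjusted:+.3f}")
```

In the population this toy model gives a naive difference of +0.2 and an adjusted difference of -0.1, so the simulated numbers should land near those values. The adjustment only works here because `z` is observed and is the only confounder, which is exactly the assumption that is hard to defend with real observational data.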

When can I use my crappy observational data, collected without a good experimental design for whatever reason, to do interventional inference? There is a lot of research in this. I should summarise the salient bits for myself. In fact I did; I did a reading group on this.

See also quantum causal graphical models, and the use of classical causal graphical models to eliminate hidden quantum causes.

Spurious correlation induced by sampling bias.
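A small sketch of how sampling bias manufactures correlation, under invented assumptions: two independent standard normal variables, and a hypothetical selection rule that keeps only units whose sum exceeds 1. Selection here acts like conditioning on a common effect (a collider), so the kept sample shows a dependence that the full population does not have.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)

# Two causally unrelated, independent traits.
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

# Hypothetical selection rule: a unit is only observed if the traits sum past 1.
selected = [(a, b) for a, b in population if a + b > 1.0]

r_all = pearson(*zip(*population))
r_selected = pearson(*zip(*selected))

print(f"correlation in full population: {r_all:+.3f}")
print(f"correlation in selected sample: {r_selected:+.3f}")
```

The full-population correlation should be close to zero, while the selected sample should show a clearly negative one, even though neither variable influences the other.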

I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.
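A rough way to poke at this speculation, as a sketch rather than evidence: draw sparse random DAGs and compare the number of direct causal edges with the number of variable pairs that are marginally d-connected, and hence generically correlated under faithfulness. The random graph model (each forward edge present independently, expected degree held roughly constant) and all function names are my assumptions. The sketch uses the fact that two nodes are d-connected given the empty set exactly when they share a common ancestor, counting a node as its own ancestor.

```python
import random
from itertools import combinations


def random_dag(n, p, rng):
    """Edges i -> j only for i < j, so the graph is acyclic by construction."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def ancestors_with_self(n, edges):
    """For each node, the set containing the node itself and all its ancestors."""
    anc = [{v} for v in range(n)]
    for j in range(n):          # 0..n-1 is already a topological order
        for i in range(j):
            if (i, j) in edges:
                anc[j] |= anc[i]
    return anc


def count_dependent_pairs(n, edges):
    """Pairs sharing a common ancestor, i.e. marginally d-connected pairs."""
    anc = ancestors_with_self(n, edges)
    return sum(1 for a, b in combinations(range(n), 2) if anc[a] & anc[b])


rng = random.Random(2)
results = []
for n in (10, 20, 40, 80):
    edges = random_dag(n, p=3.0 / n, rng=rng)
    results.append((n, len(edges), count_dependent_pairs(n, edges)))

for n, n_edges, n_pairs in results:
    print(f"n={n:3d}  causal edges={n_edges:4d}  dependent pairs={n_pairs:5d}")
```

Every edge is itself a dependent pair, so the second count can never be smaller than the first; how much larger it gets depends entirely on the assumed graph model.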

## Learning materials

Miguel Hernán and Jamie Robins’ new causal inference book has a free draft online. See Yanir Seroussi’s review.

Samantha Kleinberg has a book notable for its handling of time-dependent causality.

Tutorial: David Sontag and Uri Shalit, Causal inference from observational studies.

Felix Elwert’s summary is punchy. (Elwe13)

Chapter 3 of (some edition of) Pearl’s book is available as an author’s preprint:

## Counterfactuals

TBD.

## Propensity scores

RuWa06 comes recommended by Shalizi as:

A good description of Rubin et al.’s methods for causal inference, adapted to the meanest understanding. […] Rubin and Waterman do a very good job of explaining, in a clear and concrete problem, just how and why the newer techniques of causal inference are valuable, with just enough technical detail that it doesn’t seem like magic.
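For my own reference, a minimal sketch of one way a propensity score gets used: inverse-propensity weighting. This is not necessarily the variant Rubin and Waterman describe, and the data-generating model, the single discrete covariate `c` and all names are invented. The propensity \(e(c)=P(T=1\mid C=c)\) is estimated as the treated fraction within each covariate stratum; with realistic covariates you would fit a model such as logistic regression instead.

```python
import random


def simulate(n=100_000, seed=3):
    """Toy data: covariate c drives both treatment t and outcome y; true effect is 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.choice((0, 1, 2))
        t = 1 if rng.random() < (0.2, 0.5, 0.8)[c] else 0
        y = 1.0 * t + 2.0 * c + rng.gauss(0, 1)
        rows.append((c, t, y))
    return rows


rows = simulate()

# Naive contrast of group means, confounded by c.
treated = [y for _, t, y in rows if t == 1]
control = [y for _, t, y in rows if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# Estimated propensity score: fraction treated among units sharing covariate c.
counts = {}
for c, t, _ in rows:
    n_c, n_t = counts.get(c, (0, 0))
    counts[c] = (n_c + 1, n_t + t)
propensity = {c: n_t / n_c for c, (n_c, n_t) in counts.items()}

# Inverse-propensity-weighted estimate of the average treatment effect.
ipw = sum(
    t * y / propensity[c] - (1 - t) * y / (1 - propensity[c])
    for c, t, y in rows
) / len(rows)

print(f"naive difference in means: {naive:.3f}")
print(f"IPW estimate of the ATE:   {ipw:.3f}")
```

The naive contrast should overshoot the true effect of 1 substantially, while the weighted estimate should sit close to it. That only holds because every confounder is observed and every stratum contains both treated and untreated units.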

## Causal Graph inference from data

Uh oh. You don’t know what causes what? Or specifically, you can’t eliminate a whole bunch of potential causal arrows *a priori*? Much more work.

Here is a seminar I noticed on this theme, which is also a lightspeed introduction to some difficulties.

Guido Consonni, *Objective Bayes Model Selection of Gaussian Essential Graphs with Observational and Interventional Data*.

Graphical models based on Directed Acyclic Graphs (DAGs) represent a powerful tool for investigating dependencies among variables. It is well known that one cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs) using only observational data. However, the space of all DAGs can be partitioned into Markov equivalence classes, each being represented by a unique Essential Graph (EG), also called Completed Partially Directed Graph (CPDAG). In some fields, in particular genomics, one can have both observational and interventional data, the latter being produced after an exogenous perturbation of some variables in the system, or from randomized intervention experiments. Interventions destroy the original causal structure, and modify the Markov property of the underlying DAG, leading to a finer partition of DAGs into equivalence classes, each one being represented by an Interventional Essential Graph (I-EG) (Hauser and Buehlmann). In this talk we consider Bayesian model selection of EGs under the assumption that the variables are jointly Gaussian. In particular, we adopt an objective Bayes approach, based on the notion of fractional Bayes factor, and obtain a closed form expression for the marginal likelihood of an EG. Next we construct a Markov chain to explore the EG space under a sparsity constraint, and propose an MCMC algorithm to approximate the posterior distribution over the space of EGs. Our methodology, which we name Objective Bayes Essential graph Search (OBES), allows to evaluate the inferential uncertainty associated to any features of interest, for instance the posterior probability of edge inclusion. An extension of OBES to deal simultaneously with observational and interventional data is also presented: this involves suitable modifications of the likelihood and prior, as well as of the MCMC algorithm. We conclude by presenting results for simulated and real experiments (protein-signaling data).

This is joint work with Federico Castelletti, Stefano Peluso and Marco Della Vedova (Universita’ Cattolica del Sacro Cuore).
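The Markov equivalence idea in that abstract can be sketched in a few lines. The check below relies on the standard characterisation that two DAGs on the same node set are Markov equivalent exactly when they share a skeleton and the same v-structures (colliders whose parents are not adjacent). The edge-set representation and the function names are my own choices, and the sketch assumes every node appears in at least one edge.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All triples a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Directed edges are (parent, child) pairs.
chain = {("X", "Y"), ("Y", "Z")}
reverse_chain = {("Z", "Y"), ("Y", "X")}
fork = {("Y", "X"), ("Y", "Z")}
collider = {("X", "Y"), ("Z", "Y")}

print(markov_equivalent(chain, reverse_chain))
print(markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The chain, the reversed chain and the fork all encode the single independence of X and Z given Y, so observational data cannot tell them apart; the collider encodes a different independence and sits in its own class.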

## Causal time series DAGs

As with other time series methods, causal inference on time series has its own issues.

TODO: find out how this works: Causal impact. (Based on BGKR15.)

The CausalImpact R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred.
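Until I work out the details, here is a deliberately crude sketch of the counterfactual-forecast idea in that description. It is not the package’s model: instead of a Bayesian structural time-series model it fits an ordinary least-squares line from a single control series to the response on the pre-intervention period, then uses that line to predict what the response would have been afterwards. The simulated series, the lift of 3.0 and all names are assumptions of mine, and the sketch yields a point estimate with no uncertainty interval.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(4)
n_pre, n_post, true_lift = 200, 50, 3.0

# A control series unaffected by the intervention, and a response that tracks it.
control = [10.0 + rng.gauss(0, 2) for _ in range(n_pre + n_post)]
response = [2.0 + 1.5 * x + rng.gauss(0, 0.5) for x in control]
for t in range(n_pre, n_pre + n_post):
    response[t] += true_lift      # the intervention shifts the response upward

# Learn the control -> response relationship on pre-intervention data only.
intercept, slope = fit_line(control[:n_pre], response[:n_pre])

# Counterfactual: what the response would have been without the intervention.
counterfactual = [intercept + slope * x for x in control[n_pre:]]
pointwise = [obs - cf for obs, cf in zip(response[n_pre:], counterfactual)]
avg_effect = sum(pointwise) / n_post

print(f"estimated average effect: {avg_effect:.2f} (simulated lift: {true_lift})")
```

The estimate is only as good as the assumption that the control series is untouched by the intervention and that its relationship to the response stays stable across the two periods.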

## Refs

- KiPe83: Jin H. Kim, Judea Pearl (1983) A computational model for causal and diagnostic reasoning in inference systems. In IJCAI (Vol. 83, pp. 190–193). Citeseer.
- MaCo13: Marloes H. Maathuis, Diego Colombo (2013) A generalized backdoor criterion. *ArXiv Preprint ArXiv:1307.5636*.
- GuFZ14: Jiaying Gu, Fei Fu, Qing Zhou (2014) Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data. *ArXiv:1403.2310 [Stat]*.
- KRPH17: Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, Bernhard Schölkopf (2017) Avoiding Discrimination through Causal Reasoning. *ArXiv:1706.02744 [Cs, Stat]*.
- MiMo07: Lilyana Mihalkova, Raymond J. Mooney (2007) Bottom-up learning of Markov logic network structure. In Proceedings of the 24th international conference on Machine learning (pp. 625–632). ACM.
- TZAK18: Ruibo Tu, Cheng Zhang, Paul Ackermann, Hedvig Kjellström, Kun Zhang (2018) Causal discovery in the presence of missing data. *ArXiv:1807.04010 [Cs, Stat]*.
- LSMS17: Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, Max Welling (2017) Causal Effect Inference with Deep Latent-Variable Models. In Advances in Neural Information Processing Systems 30 (pp. 6446–6456). Curran Associates, Inc.
- BaPe16: Elias Bareinboim, Judea Pearl (2016) Causal inference and the data-fusion problem. *Proceedings of the National Academy of Sciences*, 113(27), 7345–7352.
- Laur00: Steffen L. Lauritzen (2000) Causal inference from graphical models. In Complex stochastic systems (pp. 63–107). CRC Press.
- YPHS16: Pranjul Yadav, Lisiane Prunelli, Alexander Hoff, Michael Steinbach, Bonnie Westra, Vipin Kumar, Gyorgy Simon (2016) Causal Inference in Observational Data. *ArXiv:1611.04660 [Cs, Stat]*.
- Pear09a: Judea Pearl (2009a) Causal inference in statistics: An overview. *Statistics Surveys*, 3, 96–146.
- PeBM15: Jonas Peters, Peter Bühlmann, Nicolai Meinshausen (2015) Causal inference using invariant prediction: identification and confidence intervals. *ArXiv:1501.01332 [Stat]*.
- ShTc14: Ilya Shpitser, Eric Tchetgen Tchetgen (2014) Causal Inference with a Graphical Hierarchy of Interventions. *ArXiv:1411.2127 [Stat]*.
- DEMS10: David K. Duvenaud, Daniel Eaton, Kevin P. Murphy, Mark W. Schmidt (2010) Causal learning without DAGs. In NIPS Causality: Objectives and Assessment (pp. 177–190).
- ChLP18: Rafael Chaves, Gabriela Barreto Lemos, Jacques Pienaar (2018) Causal Modeling the Delayed-Choice Experiment. *Physical Review Letters*, 120(19), 190401.
- Dide00: Vanessa Didelez (n.d.) Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach.
- Bühl13: Peter Bühlmann (2013) Causal statistical inference in high dimensions. *Mathematical Methods of Operations Research*, 77(3), 357–370.
- Gelm10: Andrew Gelman (2010) Causality and statistical learning. *American Journal of Sociology*, 117(3), 955–966.
- ChWh12: Karim Chalak, Halbert White (2012) Causality, conditional independence, and graphical separation in settable systems. *Neural Computation*, 24(7), 1611–1668.
- Pear09b: Judea Pearl (2009b) *Causality: Models, Reasoning and Inference*. Cambridge University Press.
- SpGS01: Peter Spirtes, Clark Glymour, Richard Scheines (2001) *Causation, Prediction, and Search*. The MIT Press.
- Mess12: Franz H. Messerli (2012) Chocolate Consumption, Cognitive Function, and Nobel Laureates. *New England Journal of Medicine*, 367(16), 1562–1564.
- ShPe08: Ilya Shpitser, Judea Pearl (2008) Complete identification methods for the causal hierarchy. *The Journal of Machine Learning Research*, 9, 1941–1979.
- SMFP15: Bernhard Schölkopf, Krikamol Muandet, Kenji Fukumizu, Jonas Peters (2015) Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations. *ArXiv:1501.06794 [Cs, Stat]*.
- ArCS99: Barry C. Arnold, Enrique Castillo, Jose M. Sarabia (1999) *Conditional specification of statistical models*. Springer Science & Business Media.
- BüRK13: Peter Bühlmann, Philipp Rütimann, Markus Kalisch (2013) Controlling false positive selections in high-dimensional regression and causal inference. *Statistical Methods in Medical Research*, 22(5), 466–492.
- ShMc16: Cosma Rohilla Shalizi, Edward McFowland III (2016) Controlling for Latent Homophily in Social Networks through Inferring Latent Locations. *ArXiv:1607.06565 [Physics, Stat]*.
- ViCo14: Shyam Visweswaran, Gregory F. Cooper (2014) Counting Markov Blanket Structures. *ArXiv:1407.2483 [Cs, Stat]*.
- DeWR11: Xavier De Luna, Ingeborg Waernbaum, Thomas S. Richardson (2011) Covariate selection for the nonparametric estimation of an average treatment effect. *Biometrika*, asr041.
- EnHS13: Doris Entner, Patrik Hoyer, Peter Spirtes (2013) Data-driven covariate selection for nonparametric estimation of causal effects. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (pp. 256–264).
- SmEi08: David A. Smith, Jason Eisner (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics.
- Ragi11: M. Raginsky (2011) Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 958–965).
- LNCS16: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou (2016) Discovering Causal Signals in Images. *ArXiv:1605.08179 [Cs, Stat]*.
- ArMS09: Sinan Aral, Lev Muchnik, Arun Sundararajan (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. *Proceedings of the National Academy of Sciences*, 106(51), 21544–21549.
- KaBü07: Markus Kalisch, Peter Bühlmann (2007) Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. *Journal of Machine Learning Research*, 8, 613–636.
- MaKB09: Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann (2009) Estimating high-dimensional intervention effects from observational data. *The Annals of Statistics*, 37(6A), 3133–3164.
- RuWa06: Donald B Rubin, Richard P Waterman (2006) Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology. *Statistical Science*, 21(2), 206–222.
- GLLM19: Zhi Geng, Yue Liu, Chunchen Liu, Wang Miao (2019) Evaluation of Causal Effects and Local Structure Learning of Causal Networks. *Annual Review of Statistics and Its Application*, 6(1), 103–124.
- PeBa14: Judea Pearl, Elias Bareinboim (2014) External Validity: From Do-Calculus to Transportability Across Populations. *Statistical Science*, 29(4), 579–595.
- Pear86: Judea Pearl (1986) Fusion, propagation, and structuring in belief networks. *Artificial Intelligence*, 29(3), 241–288.
- Fixx77: James F Fixx (1977) *Games for the superintelligent*. London: Muller.
- Eich01: Michael Eichler (2001) Granger-causality graphs for multivariate time series.
- Elwe13: Felix Elwert (2013) Graphical causal models. In Handbook of causal analysis for social research (pp. 245–273). Springer.
- Laur96: Steffen L. Lauritzen (1996) *Graphical Models*. Clarendon Press.
- JoWe02a: Michael I. Jordan, Yair Weiss (2002a) Graphical models: Probabilistic inference. *The Handbook of Brain Theory and Neural Networks*, 490–496.
- BüKM14: Peter Bühlmann, Markus Kalisch, Lukas Meier (2014) High-Dimensional Statistics with a View Toward Applications in Biology. *Annual Review of Statistics and Its Application*, 1(1), 255–278.
- ShTh11: Cosma Rohilla Shalizi, Andrew C. Thomas (2011) Homophily and Contagion Are Generically Confounded in Observational Social Network Studies. *Sociological Methods & Research*, 40(2), 211–239.
- BGKR15: Kay H. Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, Steven L. Scott (2015) Inferring causal impact using Bayesian structural time-series models. *The Annals of Applied Statistics*, 9(1), 247–274.
- ZPJS12: Kun Zhang, Jonas Peters, Dominik Janzing, Bernhard Schölkopf (2012) Kernel-based Conditional Independence Test and Application in Causal Discovery. *ArXiv:1202.3775 [Cs, Stat]*.
- BLZS15: Adam Bloniarz, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, Bin Yu (2015) Lasso adjustments of treatment effect estimates in randomized experiments. *ArXiv:1507.03652 [Math, Stat]*.
- NeOt04: Richard E. Neapolitan, others (2004) *Learning bayesian networks* (Vol. 38). Prentice Hall Upper Saddle River.
- HiOB05: Geoffrey E. Hinton, Simon Osindero, Kejie Bao (2005) Learning causally linked markov random fields. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (pp. 128–135). Citeseer.
- TeIL15: Johannes Textor, Alexander Idelberger, Maciej Liśkiewicz (2015) Learning from Pairwise Marginal Independencies. *ArXiv:1508.00280 [Cs]*.
- CMKR12: Diego Colombo, Marloes H. Maathuis, Markus Kalisch, Thomas S. Richardson (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. *The Annals of Statistics*, 40(1), 294–321.
- Jord99: Michael Irwin Jordan (1999) *Learning in graphical models*. Cambridge, Mass.: MIT Press.
- ArGZ17: Bryon Aragam, Jiaying Gu, Qing Zhou (2017) Learning Large-Scale Bayesian Networks with the sparsebn Package. *ArXiv:1703.04025 [Cs, Stat]*.
- FuZh13: Fei Fu, Qing Zhou (2013) Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent. *Journal of the American Statistical Association*, 108(501), 288–300.
- Mont11: Andrea Montanari (2011) *Lecture Notes for Stat 375 Inference in Graphical Models*.
- LaSp88: S. L. Lauritzen, D. J. Spiegelhalter (1988) Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. *Journal of the Royal Statistical Society. Series B (Methodological)*, 50(2), 157–224.
- Murp12: Kevin P. Murphy (2012) *Machine Learning: A Probabilistic Perspective*. Cambridge, MA: The MIT Press.
- ErBü14: Jan Ernest, Peter Bühlmann (2014) Marginal integration for fully robust causal inference. *ArXiv:1405.1868 [Stat]*.
- BCCC17: Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, Jimeng Sun (2017) Neural Causal Regularization under the Independence of Mechanisms Assumption. *ArXiv:1702.02604 [Cs, Stat]*.
- KoKS19: Ulrich Kohler, Frauke Kreuter, Elizabeth A. Stuart (2019) Nonprobability Sampling and Causal Analysis. *Annual Review of Statistics and Its Application*, 6(1), 149–172.
- VaBC12: Stijn Vansteelandt, Maarten Bekaert, Gerda Claeskens (2012) On model selection and model misspecification in causal inference. *Statistical Methods in Medical Research*, 21(1), 7–30.
- BPEM14: Peter Bühlmann, Jonas Peters, Jan Ernest, Marloes Maathuis (2014) *Predicting causal effects in high-dimensional settings*.
- MCKB10: Marloes H. Maathuis, Diego Colombo, Markus Kalisch, Peter Bühlmann (2010) Predicting causal effects in large-scale systems from observational data. *Nature Methods*, 7(4), 247–248.
- KoFr09: Daphne Koller, Nir Friedman (2009) *Probabilistic graphical models: principles and techniques*. Cambridge, MA: MIT Press.
- JoWe02b: Michael I. Jordan, Yair Weiss (2002b) Probabilistic inference in graphical models. *Handbook of Neural Networks and Brain Theory*.
- Pear08: Judea Pearl (2008) *Probabilistic reasoning in intelligent systems: networks of plausible inference*. San Francisco, Calif: Kaufmann.
- ClMH14: Tom Claassen, Joris M. Mooij, Tom Heskes (2014) Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013). *ArXiv:1411.1557 [Stat]*.
- ABHL17: John-Mark A. Allen, Jonathan Barrett, Dominic C. Horsman, Ciarán M. Lee, Robert W. Spekkens (2017) Quantum Common Causes and Quantum Causal Models. *Physical Review X*, 7(3), 031021.
- BaTP14: Elias Bareinboim, Jin Tian, Judea Pearl (2014) Recovering from Selection Bias in Causal and Statistical Inference. In AAAI (pp. 2410–2416).
- ChPe12: B Chen, J Pearl (2012) Regression and causation: A critical examination of econometric textbooks.
- MPSM10: Daniel Marbach, Robert J. Prill, Thomas Schaffter, Claudio Mattiussi, Dario Floreano, Gustavo Stolovitzky (2010) Revealing strengths and weaknesses of methods for gene network inference. *Proceedings of the National Academy of Sciences*, 107(14), 6286–6291.
- Pear82: Judea Pearl (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence (pp. 133–136).
- Kenn15: Edward H. Kennedy (2015) Semiparametric theory and empirical processes in causal inference. *ArXiv Preprint ArXiv:1510.04740*.
- BPSM16: Stephan Bongers, Jonas Peters, Bernhard Schölkopf, Joris M. Mooij (2016) Structural Causal Models: Cycles, Marginalizations, Exogenous Reparametrizations and Reductions. *ArXiv:1611.06221 [Cs, Stat]*.
- Wrig34: Sewall Wright (1934) The Method of Path Coefficients. *The Annals of Mathematical Statistics*, 5(3), 161–215.
- NoNy11: Hans Noel, Brendan Nyhan (2011) The “unfriending” problem: The consequences of homophily in friendship retention for causal estimates of social influence. *Social Networks*, 33(3), 211–218.
- ABGM17: Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-Francois Muzy (2017) Uncovering Causality from Multivariate Hawkes Integrated Cumulants. In PMLR.
- YeFW03: J.S. Yedidia, W.T. Freeman, Y. Weiss (2003) Understanding Belief Propagation and Its Generalizations. In Exploring Artificial Intelligence in the New Millennium (pp. 239–236). Morgan Kaufmann Publishers.
- SaVa13: Brian Sauer, Tyler J. VanderWeele (2013) *Use of Directed Acyclic Graphs*. Agency for Healthcare Research and Quality (US).