Learning graphical models from data

Also, causal discovery, structure discovery

September 20, 2017 — October 1, 2022

algebra

graphical models

machine learning

networks

probability

statistics

Learning the independence graph structure from data in a graphical model. A particular sparse model selection problem where the model is hierarchical, or possibly even non-directional. For bonus sophistication we might even try to learn causal effects from raw data.

There are a few ways we can learn graphical models. The obvious one, to my mind, is to combine conditional independence tests with an awareness of multiple testing and some graph theory. But Bayesian sampling from possible graph structures is also a thing, apparently. There are other approaches too.

1 Bayesian learning of Bayesian networks

I am indebted to Dario Draca for introducing this field to me. To quote Dario:

I think the introduction to this paper (Scutari, Graafland, and Gutiérrez 2019) gives an accessible description of the Bayesian approach to Bayesian network structure learning, including how to define parameter priors for discrete and Gaussian networks (the former case being potentially relevant to your signal detection/classification problem).

If you are particularly interested, the introduction to this paper (Scutari 2018) also gives some more information on computing marginal likelihoods (i.e. likelihood of data conditional only on the graph structure) for discrete Bayesian networks.

This one (Kuipers, Moffa, and Heckerman 2014a) gives a rigorous derivation of the marginal graph likelihood for Gaussian networks.
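To make "likelihood of data conditional only on the graph structure" concrete for the discrete case, here is a minimal sketch of the Dirichlet-multinomial (BDeu) marginal likelihood, computed one node at a time. The function name `bdeu_family_score`, the `ess` (equivalent sample size) argument, and the toy data are my own illustrative choices and are not taken from the papers above.

```python
import numpy as np
from scipy.special import gammaln

def bdeu_family_score(child, parents, data, arities, ess=1.0):
    """Log marginal likelihood of one node given its parents (BDeu prior)."""
    r = arities[child]                                   # number of child states
    q = int(np.prod([arities[p] for p in parents])) if parents else 1
    # Encode each row's joint parent configuration as a single integer.
    cfg = np.zeros(len(data), dtype=int)
    for p in parents:
        cfg = cfg * arities[p] + data[:, p]
    # counts[j, k] = number of rows with parent config j and child state k.
    counts = np.zeros((q, r))
    np.add.at(counts, (cfg, data[:, child]), 1)
    a_jk = ess / (q * r)                                 # Dirichlet pseudo-count per cell
    a_j = ess / q                                        # pseudo-count per parent config
    score = np.sum(gammaln(a_j) - gammaln(a_j + counts.sum(axis=1)))
    score += np.sum(gammaln(a_jk + counts) - gammaln(a_jk))
    return float(score)

# Toy binary data: y copies x, with every tenth entry flipped.
x = np.repeat([0, 1], 50)
y = x.copy()
y[::10] = 1 - y[::10]
data = np.column_stack([x, y])
arities = [2, 2]

# A graph's score is the sum of its family scores.
score_empty = (bdeu_family_score(0, [], data, arities)
               + bdeu_family_score(1, [], data, arities))
score_x_to_y = (bdeu_family_score(0, [], data, arities)
                + bdeu_family_score(1, [0], data, arities))
score_y_to_x = (bdeu_family_score(1, [], data, arities)
                + bdeu_family_score(0, [1], data, arities))
print(score_empty, score_x_to_y, score_y_to_x)
```

The graph with an edge should beat the empty graph on these data, while the two orientations of the edge tie: that is the Markov equivalence problem in miniature, since observational data cannot separate X → Y from Y → X under this score.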

I also wonder about this seminar:

Guido Consonni, Objective Bayes Model Selection of Gaussian Essential Graphs with Observational and Interventional Data.

Graphical models based on Directed Acyclic Graphs (DAGs) represent a powerful tool for investigating dependencies among variables. It is well known that one cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs) using only observational data. However, the space of all DAGs can be partitioned into Markov equivalence classes, each being represented by a unique Essential Graph (EG), also called Completed Partially Directed Graph (CPDAG). In some fields, in particular genomics, one can have both observational and interventional data, the latter being produced after an exogenous perturbation of some variables in the system, or from randomized intervention experiments. Interventions destroy the original causal structure, and modify the Markov property of the underlying DAG, leading to a finer partition of DAGs into equivalence classes, each one being represented by an Interventional Essential Graph (I-EG) (Hauser and Buehlmann). In this talk we consider Bayesian model selection of EGs under the assumption that the variables are jointly Gaussian. In particular, we adopt an objective Bayes approach, based on the notion of fractional Bayes factor, and obtain a closed form expression for the marginal likelihood of an EG. Next we construct a Markov chain to explore the EG space under a sparsity constraint, and propose an MCMC algorithm to approximate the posterior distribution over the space of EGs. Our methodology, which we name Objective Bayes Essential graph Search (OBES), allows to evaluate the inferential uncertainty associated to any features of interest, for instance the posterior probability of edge inclusion. An extension of OBES to deal simultaneously with observational and interventional data is also presented: this involves suitable modifications of the likelihood and prior, as well as of the MCMC algorithm.

2 Classic methods using independence tests on graphs

Many. In my time in the lectures of Marloes Maathuis I learnt some of the theory, but TBH everything has fallen out of my head now, since I have not used them in practice. Notable works are Colombo and Maathuis (2014); Drton and Maathuis (2017); Heinze-Deml, Maathuis, and Meinshausen (2018); Maathuis, Kalisch, and Bühlmann (2009); Maathuis et al. (2010).

Most of these seem to boil down to the cases where belief propagation is well-behaved, to wit, linear-Gaussian and discrete RVs. In these, dependence is simple (in the Gaussian case, essentially, two variables are conditionally independent given all the others if their shared entry in the precision matrix is zero). If we can find a way of estimating actual zeros in that matrix, then what? Often we want causal distributions, so the question remains: can we convert the implied undirected graph into a directed one? tl;dr: sometimes. This is one of those cases where the Bayesian method comes out much cleaner; averaging over possible graphs is a natural way of thinking about this.
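A minimal sketch of the Gaussian case, assuming a three-variable linear chain X0 → X1 → X2 with unit-variance noise (the edge weights, sample size and helper names are made up for illustration). The population precision matrix has an exact zero for the pair (X0, X2), even though the two are marginally correlated, and the Fisher z-transform turns a sample partial correlation into the sort of test that constraint-based methods use.

```python
import math
import numpy as np

def partial_correlations(cov):
    """Partial correlation of each pair given all other variables."""
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: partial correlation is zero (Fisher z)."""
    z = math.sqrt(n - n_cond - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))

# B[i, j] is the weight of edge i -> j in the chain X0 -> X1 -> X2.
B = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
# X = B^T X + eps  implies  X = A eps  with  A = (I - B^T)^{-1}.
A = np.linalg.inv(np.eye(3) - B.T)
cov = A @ A.T                      # population covariance
prec = np.linalg.inv(cov)          # population precision

print("cov[0, 2]  =", cov[0, 2])   # nonzero: marginally dependent
print("prec[0, 2] =", prec[0, 2])  # zero up to rounding: independent given X1

# The same question asked of a finite sample.
n = 2000
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3)) @ A.T
sample_pcorr = partial_correlations(np.cov(X, rowvar=False))
p_01 = fisher_z_pvalue(sample_pcorr[0, 1], n, n_cond=1)  # true edge
p_02 = fisher_z_pvalue(sample_pcorr[0, 2], n, n_cond=1)  # no edge
print("p-value 0-1:", p_01, " p-value 0-2:", p_02)
```

In a real structure search there are many such tests, one per pair and conditioning set, which is where the multiple-testing worries come from.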

For a less causal/intervention focused method, see Nonparanormal skeptic (🏗) (H. Liu et al. 2012) which combines semiparametric regression with non-parametric graph inference.

Figure 2: Researcher inferring optimal graph surgery

3 Learning by continuous optimization

A new trick in the arsenal from those neural network nerds. Xun Zheng, Bryon Aragam and Chen Dan in their blog post Learning DAGs with Continuous Optimization introduce NO-TEARS. This is an interesting bit of work AFAICT. Download from xunzheng/notears, and read the papers (Zheng et al. 2018; Zheng et al. 2020):

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.

Key insight:

…that the k-th power of the adjacency matrix of a graph counts the number of k-step paths from one node to another. In other words, if the diagonal of the matrix power turns out to be all zeros, there [are] no k-step cycles in the graph. Then to characterize acyclicity, we just need to set this constraint for all k=1,2,…,d, eliminating cycles of all possible length.
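The counting argument in the quote can be checked numerically. The sketch below computes tr(A^k) for k = 1, …, d on a small DAG and on the same graph with one back-edge added, then evaluates the matrix-exponential form h(W) = tr(exp(W ∘ W)) − d from Zheng et al. (2018), which packs all those traces into one smooth function. The helper names and example graphs are mine.

```python
import numpy as np
from scipy.linalg import expm

def closed_walk_counts(A):
    """tr(A^k) for k = 1..d: the number of closed k-step walks."""
    d = A.shape[0]
    counts, P = [], np.eye(d)
    for _ in range(d):
        P = P @ A
        counts.append(float(np.trace(P)))
    return counts

def h_notears(W):
    """tr(exp(W * W)) - d, where * is elementwise; zero iff W is acyclic."""
    return float(np.trace(expm(W * W)) - W.shape[0])

# A DAG on 3 nodes: 0 -> 1, 0 -> 2, 1 -> 2.
dag = np.array([[0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
# Add the back-edge 2 -> 0, creating a 2-cycle and a 3-cycle.
cyc = dag.copy()
cyc[2, 0] = 1.0

print(closed_walk_counts(dag))  # [0.0, 0.0, 0.0]
print(closed_walk_counts(cyc))  # [0.0, 2.0, 3.0]
print(h_notears(dag), h_notears(cyc))
```

Because tr(exp(A)) is the sum of tr(A^k)/k! over k, a nonnegative weighted adjacency matrix gives h = 0 exactly when every closed-walk count vanishes.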

Has this gone anywhere?

A related approach is Lorch et al. (2021):

Continuous characterization of acyclic graphs. Orthogonal to the work on Bayesian inference, Zheng et al. (2018) have recently proposed a differentiable characterization of acyclic graphs for structure learning. In this work, we adopt the formulation of Yu et al. (2019), who show that a graph with adjacency matrix $\mathbf{G} \in\{0,1\}^{d \times d}$ does not have any cycles if and only if $h(\mathbf{G})=0$, where \[ h(\mathbf{G}):=\operatorname{tr}\left[\left(\mathbf{I}+\frac{1}{d} \mathbf{G}\right)^{d}\right]-d . \]
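A small sketch of this polynomial constraint and its gradient, which is what makes it usable inside a gradient-based sampler or optimizer. Function names and test graphs are illustrative assumptions. The gradient formula follows from differentiating the trace of a matrix power, not from the DiBS code.

```python
import numpy as np

def h_poly(G):
    """h(G) = tr[(I + G/d)^d] - d, as in Yu et al. (2019)."""
    d = G.shape[0]
    M = np.eye(d) + G / d
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)

def h_poly_grad(G):
    """Gradient of h with respect to G: the transpose of (I + G/d)^(d-1)."""
    d = G.shape[0]
    M = np.eye(d) + G / d
    return np.linalg.matrix_power(M, d - 1).T

dag = np.array([[0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
cyc = dag.copy()
cyc[2, 0] = 1.0   # back-edge 2 -> 0 closes a cycle

print(h_poly(dag))   # zero up to rounding
print(h_poly(cyc))   # strictly positive
print(h_poly_grad(cyc))
```

For a relaxed, real-valued G with nonnegative entries the same function acts as a penalty that grows with the weight on cycles, so it can be added to a log-posterior or loss and pushed toward zero.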

NO-BEARS (H.-C. Lee et al. 2019; Zhu et al. 2020) does more tricks and approximations to make NO-TEARS more scalable, by upper-bounding the spectral radius. (HT Dario Draca for mentioning this.) Connection to orthonormal matrix wrangling.

4 Causal discovery from time series

A particular sub-flavour. Very popular, and weird. Accordingly see the graphical models from time series page.

5 Tools

5.1 DiBS

larslorch/dibs: Joint Bayesian inference of graph and parameters of general Bayes nets

This is the Python JAX implementation for DiBS (Lorch et al. 2021), a fully differentiable method for joint Bayesian inference of the DAG and parameters of general, causal Bayesian networks.

In this implementation, DiBS inference is performed with the particle variational inference method SVGD (Q. Liu and Wang 2019). Since DiBS and SVGD operate on continuous tensors and solely rely on Monte Carlo estimation and gradient ascent-like updates, the inference code leverages efficient vectorized operations, automatic differentiation, just-in-time compilation, and hardware acceleration, fully implemented with JAX.

Documentation.

Their code example is impressive:

In this example, we use DiBS to generate 10 DAG and parameter samples from the joint posterior over Gaussian Bayes nets with means modeled by neural networks.

from dibs.inference import JointDiBS
from dibs.target import make_nonlinear_gaussian_model
import jax.random as random
key = random.PRNGKey(0)

# simulate some data
key, subk = random.split(key)
data, model = make_nonlinear_gaussian_model(key=subk, n_vars=20)

# sample 10 DAG and parameter particles from the joint posterior
dibs = JointDiBS(x=data.x, inference_model=model)
key, subk = random.split(key)
gs, thetas = dibs.sample(key=subk, n_particles=10, steps=1000)

In the above, the keyword argument x for JointDiBS is a matrix of shape [N, d] and could be any real-world data set.

Cripes.
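What to do with the samples afterwards? One natural summary is the posterior probability of each edge. The sketch below assumes the sampled graphs arrive as a stack of binary adjacency matrices of shape [n_particles, d, d] and weights the particles equally. Both are my assumptions, not claims about the DiBS API, and the `samples` array is a hand-written stand-in, not real DiBS output.

```python
import numpy as np

def edge_marginals(graph_samples):
    """Average sampled adjacency matrices into per-edge inclusion frequencies."""
    g = np.asarray(graph_samples, dtype=float)
    return g.mean(axis=0)

# Hypothetical stand-in for a stack of sampled graphs over 3 nodes.
samples = np.array([
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
])
probs = edge_marginals(samples)
print(probs)
```

Entry [i, j] of `probs` is the fraction of sampled graphs containing the edge i → j.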

5.2 bnlearn

bnlearn learns belief networks, i.e. directed graphical models. It is restricted to the classic exact belief propagation case, i.e. multinomial or Gaussian models.

bnlearn implements the following constraint-based structure learning algorithms:

PC (the stable version);

Grow-Shrink (GS);

Incremental Association Markov Blanket (IAMB);

Fast Incremental Association (Fast-IAMB);

Interleaved Incremental Association (Inter-IAMB);

Incremental Association with FDR Correction (IAMB-FDR);

Max-Min Parents & Children (MMPC);

Semi-Interleaved Hiton-PC (SI-HITON-PC);

Hybrid Parents & Children (HPC);

the following score-based structure learning algorithms:

Hill Climbing (HC);

Tabu Search (Tabu);

the following hybrid structure learning algorithms:

Max-Min Hill Climbing (MMHC);

Hybrid HPC (H2PC);

General 2-Phase Restricted Maximization (RSMAX2);

the following local discovery algorithms:

Chow-Liu;

ARACNE;

and the following Bayesian network classifiers:

naive Bayes;

Tree-Augmented naive Bayes (TAN).

Discrete (multinomial) and continuous (multivariate normal) data sets are supported, both for structure and parameter learning. The latter can be performed using either maximum likelihood or Bayesian estimators.

5.3 sparsebn

Compare with sparsebn:

A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:

Experimental data with interventions

Mixed observational / experimental data

High-dimensional data with p >> n

Datasets with thousands of variables (tested up to p=8000)

Continuous and discrete data

The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint.

Overview

The main methods for learning graphical models are:

estimate.dag for directed acyclic graphs (Bayesian networks).

estimate.precision for undirected graphs (Markov random fields).

estimate.covariance for covariance matrices.

Currently, estimation of precision and covariances matrices is limited to Gaussian data.

5.4 CausalNex

quantumblacklabs/causalnex: A Python library that helps data scientists to infer causation rather than observing correlation.

CausalNex is a Python library that uses Bayesian Networks to combine machine learning and domain expertise for causal reasoning. You can use CausalNex to uncover structural relationships in your data, learn complex distributions, and observe the effect of potential interventions.

5.5 cause2e

MLResearchAtOSRAM/cause2e: The cause2e package provides tools for performing an end-to-end causal analysis of your data.

The main contribution of cause2e is the integration of two established causal packages that have currently been separated and cumbersome to combine:

Causal discovery methods from the py-causal package, which is a Python wrapper around parts of the Java TETRAD software. It provides many algorithms for learning the causal graph from data and domain knowledge.

Causal reasoning methods from the DoWhy package, which is the current standard for the steps of a causal analysis starting from a known causal graph and data

5.6 TETRAD

TETRAD (source, tutorial) is a tool for discovering, visualising and calculating giant empirical DAGs, including general graphical inference and causality. It’s written by eminent causal inference people.

Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is freeware that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. …

The Tetrad programs describe causal models in three distinct parts or stages: a picture, representing a directed graph specifying hypothetical causal relations among the variables; a specification of the family of probability distributions and kinds of parameters associated with the graphical model; and a specification of the numerical values of those parameters.

py-causal is a wrapper around this for Python, and R-causal for R.

5.7 skggm

skggm (Python) does the Gaussian thing but also has a nice sparsification and good explanation.

The core estimator provided in skggm is QuicGraphLasso which is a scikit-learn compatible interface to QUIC, a proximal Newton-type algorithm that solves the graphical lasso objective.
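For reference, the graphical lasso objective is usually written as follows (Friedman, Hastie, and Tibshirani 2008). The notation is my own, not copied from the skggm docs: $S$ is the sample covariance, $\Theta$ the precision matrix being estimated, and $\lambda \geq 0$ the sparsity penalty.

```latex
\[
\hat{\Theta} = \operatorname*{arg\,min}_{\Theta \succ 0}
\; \operatorname{tr}(S \Theta) - \log \det \Theta + \lambda \|\Theta\|_{1}
\]
```

Here $\|\Theta\|_{1}$ is the elementwise $\ell_1$ norm. Some formulations penalize only the off-diagonal entries. Exact zeros in $\hat{\Theta}$ are read off as missing edges in the undirected Gaussian graph.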

6 CauseMe

CauseMe - A platform to benchmark causal discovery methods (Runge et al. 2019)

Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such systems present a number of major challenges for causal discovery techniques and it is largely unknown which methods perform best for which challenge.

The CauseMe platform provides ground truth benchmark datasets featuring different real data challenges to assess and compare the performance of causal discovery methods. The available benchmark datasets are either generated from synthetic models mimicking real challenges, or are real world data sets where the causal structure is known with high confidence. The datasets vary in dimensionality, complexity and sophistication.

7 Incoming

FenTechSolutions/CausalDiscoveryToolbox: Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.

8 References

Azadkia, and Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat].

Bayati, and Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory.

Besserve, Mehrjou, Sun, et al. 2019. “Counterfactuals Uncover the Modular Structure of Deep Generative Models.” arXiv:1812.03253 [Cs, Stat].

Bühlmann, and van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications.

Buntine. 1996. “A Guide to the Literature on Learning Probabilistic Networks from Data.” IEEE Transactions on Knowledge and Data Engineering.

Cai. 2017. “Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures.” Annual Review of Statistics and Its Application.

Chau, Ton, González, et al. 2021. “BayesIMP: Uncertainty Quantification for Causal Data Fusion.”

Colombo, and Maathuis. 2014. “Order-Independent Constraint-Based Causal Structure Learning.” J. Mach. Learn. Res.

Colombo, Maathuis, Kalisch, et al. 2012. “Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables.” The Annals of Statistics.

Cox, and Battey. 2017. “Large Numbers of Explanatory Variables, a Semi-Descriptive Analysis.” Proceedings of the National Academy of Sciences.

Dezfouli, Bonilla, and Nock. 2018. “Variational Network Inference: Strong and Stable with Concrete Support.”

Drton, and Maathuis. 2017. “Structure Learning in Graphical Modeling.” Annual Review of Statistics and Its Application.

Foygel, and Drton. 2010. “Extended Bayesian Information Criteria for Gaussian Graphical Models.” In Advances in Neural Information Processing Systems 23.

Friedman, Hastie, and Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics.

Fu, and Zhou. 2013. “Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association.

Gao, Ding, and Aragam. 2020. “A Polynomial-Time Algorithm for Learning Nonparametric Causal Graphs.” arXiv:2006.11970 [Cs, Math, Stat].

Gendron, Witbrock, and Dobbie. 2023. “A Survey of Methods, Challenges and Perspectives in Causality.”

Geng, Liu, Liu, et al. 2019. “Evaluation of Causal Effects and Local Structure Learning of Causal Networks.” Annual Review of Statistics and Its Application.

Gnecco, Meinshausen, Peters, et al. 2021. “Causal Discovery in Heavy-Tailed Models.” The Annals of Statistics.

Gogate, Webb, and Domingos. 2010. “Learning Efficient Markov Networks.” In Advances in Neural Information Processing Systems.

Gu, and Zhou. 2020. “Learning Big Gaussian Bayesian Networks: Partition, Estimation and Fusion.” Journal of Machine Learning Research.

Hallac, Leskovec, and Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat].

Hanly, Brew, Austin, et al. 2023. “Software Application Profile: The Daggle App—a Tool to Support Learning and Teaching the Graphical Rules of Selecting Adjustment Variables Using Directed Acyclic Graphs.” International Journal of Epidemiology.

Harris, and Drton. 2013. “PC Algorithm for Nonparanormal Graphical Models.” Journal of Machine Learning Research.

Heinze-Deml, Maathuis, and Meinshausen. 2018. “Causal Structure Learning.” Annual Review of Statistics and Its Application.

Hinton, Osindero, and Bao. 2005. “Learning Causally Linked Markov Random Fields.” In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics.

Hsieh, Sustik, Dhillon, et al. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” Journal of Machine Learning Research.

Huang, Zhang, Zhang, et al. 2020. “Causal Discovery from Heterogeneous/Nonstationary Data.” Journal of Machine Learning Research.

Janzing, Mooij, Zhang, et al. 2012. “Information-Geometric Approach to Inferring Causal Directions.” Artificial Intelligence.

Jin, Fu, Kang, et al. 2020. “Bayesian Symbolic Regression.” arXiv:1910.08892 [Stat].

Jung, Heckel, Bölcskei, et al. 2013. “Compressive Nonparametric Graphical Model Selection For Time Series.” arXiv:1311.3257 [Stat].

Jung, Quang, and Mara. 2017. “When Is Network Lasso Accurate?” arXiv:1704.02107 [Stat].

Kaddour, Lynch, Liu, et al. 2022. “Causal Machine Learning: A Survey and Open Problems.”

Karimi, Muandet, Kornblith, et al. 2022. “On the Relationship Between Explanation and Prediction: A Causal View.”

Khoshgnauz. 2012. “Learning Markov Network Structure Using Brownian Distance Covariance.” arXiv:1206.6361 [Cs, Stat].

Knowles, Gael, and Ghahramani. n.d. “Message Passing Algorithms for Dirichlet Diffusion Trees.”

Kocaoglu, Dimakis, and Vishwanath. 2017. “Cost-Optimal Learning of Causal Graphs.” In PMLR.

Kocaoglu, Snyder, Dimakis, et al. 2017. “CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training.” arXiv:1709.02023 [Cs, Math, Stat].

Krämer, Schäfer, and Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics.

Kuipers, Moffa, and Heckerman. 2014a. “Supplement to ‘Addendum on the Scoring of Gaussian Directed Acyclic Graphical Models’: Deriving and Simplifying the BGe Score.” The Annals of Statistics.

———. 2014b. “Addendum on the Scoring of Gaussian Directed Acyclic Graphical Models.” The Annals of Statistics.

Lederer. 2016. “Graphical Models for Discrete and Continuous Data.” arXiv:1609.05551 [Math, Stat].

Leeb, Lanzillotta, Annadani, et al. 2021. “Structure by Architecture: Disentangled Representations Without Regularization.” arXiv:2006.07796 [Cs, Stat].

Lee, Hao-Chih, Danieletto, Miotto, et al. 2019. “Scaling Structural Learning with NO-BEARS to Infer Causal Transcriptome Networks.” In Pacific Symposium on Biocomputing 2020.

Lee, Su-In, Ganapathi, and Koller. 2006. “Efficient Structure Learning of Markov Networks Using $L_1$-Regularization.” In Advances in Neural Information Processing Systems.

Li, Torralba, Anandkumar, et al. 2020. “Causal Discovery in Physical Systems from Videos.” arXiv:2007.00631 [Cs, Stat].

Liu, Han, Han, Yuan, et al. 2012. “The Nonparanormal SKEPTIC.” arXiv:1206.6488 [Cs, Stat].

Liu, Han, Lafferty, and Wasserman. 2009. “The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs.” Journal of Machine Learning Research.

Liu, Qiang, and Wang. 2019. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In Advances In Neural Information Processing Systems.

Locatello, Bauer, Lucic, et al. 2019. “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.” arXiv:1811.12359 [Cs, Stat].

Locatello, Poole, Raetsch, et al. 2020. “Weakly-Supervised Disentanglement Without Compromises.” In Proceedings of the 37th International Conference on Machine Learning.

Lorch, Rothfuss, Schölkopf, et al. 2021. “DiBS: Differentiable Bayesian Structure Learning.”

Lu, Wu, Hernández-Lobato, et al. 2021. “Nonlinear Invariant Risk Minimization: A Causal Approach.” arXiv:2102.12353 [Cs, Stat].

Maathuis, Colombo, Kalisch, et al. 2010. “Predicting Causal Effects in Large-Scale Systems from Observational Data.” Nature Methods.

Maathuis, Kalisch, and Bühlmann. 2009. “Estimating High-Dimensional Intervention Effects from Observational Data.” The Annals of Statistics.

Mansinghka, Kemp, Griffiths, et al. 2012. “Structured Priors for Structure Learning.” arXiv:1206.6852.

Mazumder, and Hastie. 2012. “The Graphical Lasso: New Insights and Alternatives.” Electronic Journal of Statistics.

Montanari. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications.

Mooij, Peters, Janzing, et al. 2016. “Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks.” Journal of Machine Learning Research.

Nair, Zhu, Savarese, et al. 2019. “Causal Induction from Visual Observations for Goal Directed Tasks.” arXiv:1910.01751 [Cs, Stat].

Narendra, Sankaran, Vijaykeerthy, et al. 2018. “Explaining Deep Learning Models Using Causal Inference.” arXiv:1811.04376 [Cs, Stat].

Nauta, Bucur, and Seifert. 2019. “Causal Discovery with Attention-Based Convolutional Neural Networks.” Machine Learning and Knowledge Extraction.

Neapolitan. 2003. Learning Bayesian Networks.

Ng, Fang, Zhu, et al. 2020. “Masked Gradient-Based Causal Structure Learning.” arXiv:1910.08527 [Cs, Stat].

Ng, Zhu, Chen, et al. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.

Obermeyer, Bingham, Jankowiak, et al. 2020. “Functional Tensors for Probabilistic Programming.” arXiv:1910.10775 [Cs, Stat].

Peters, Mooij, Janzing, et al. 2012. “Identifiability of Causal Graphs Using Functional Models.” arXiv:1202.3757 [Cs, Stat].

Ramsey, Glymour, Sanchez-Romero, et al. 2017. “A Million Variables and More: The Fast Greedy Equivalence Search Algorithm for Learning High-Dimensional Graphical Causal Models, with an Application to Functional Magnetic Resonance Images.” International Journal of Data Science and Analytics.

Roscher, Bohn, Duarte, et al. 2020. “Explainable Machine Learning for Scientific Insights and Discoveries.” IEEE Access.

Runge, Bathiany, Bollt, et al. 2019. “Inferring Causation from Time Series in Earth System Sciences.” Nature Communications.

Schelldorfer, Meier, and Bühlmann. 2014. “GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using ℓ1-Penalization.” Journal of Computational and Graphical Statistics.

Schölkopf. 2022. “Causality for Machine Learning.” In Probabilistic and Causal Inference: The Works of Judea Pearl.

Schölkopf, Locatello, Bauer, et al. 2021. “Toward Causal Representation Learning.” Proceedings of the IEEE.

Scutari. 2018. “Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle.” arXiv:1708.00689 [Math, Stat].

Scutari, Graafland, and Gutiérrez. 2019. “Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms.” arXiv:1805.11908 [Stat].

Sharma, and Kiciman. 2020. “DoWhy: An End-to-End Library for Causal Inference.”

Textor, Idelberger, and Liśkiewicz. 2015. “Learning from Pairwise Marginal Independencies.” arXiv:1508.00280 [Cs].

Tigas, Annadani, Jesson, et al. 2022. “Interventions, Where and How? Experimental Design for Causal Models at Scale.” Advances in Neural Information Processing Systems.

van de Geer. 2014. “Worst Possible Sub-Directions in High-Dimensional Models.” arXiv:1403.7023 [Math, Stat].

Vowels, Camgoz, and Bowden. 2022. “D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery.” ACM Computing Surveys.

Wang, and Jordan. 2021. “Desiderata for Representation Learning: A Causal Perspective.” arXiv:2109.03795 [Cs, Stat].

Wu, Srikant, and Ni. 2012. “Learning Graph Structures in Discrete Markov Random Fields.” In INFOCOM Workshops.

Yang, Liu, Chen, et al. 2020. “CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models.” arXiv:2004.08697 [Cs, Stat].

Yu, Chen, Gao, et al. 2019. “DAG-GNN: DAG Structure Learning with Graph Neural Networks.” Proceedings of the 36th International Conference on Machine Learning.

Zhang, Ng, Gong, et al. 2022. “Truncated Matrix Power Iteration for Differentiable DAG Learning.”

Zhao, Liu, Roeder, et al. 2012. “The Huge Package for High-Dimensional Undirected Graph Estimation in R.” Journal of Machine Learning Research.

Zheng, Aragam, Ravikumar, et al. 2018. “DAGs with NO TEARS: Continuous Optimization for Structure Learning.” In Advances in Neural Information Processing Systems 31.

Zheng, Dan, Aragam, et al. 2020. “Learning Sparse Nonparametric DAGs.” In International Conference on Artificial Intelligence and Statistics.

Zhou, Cong, and Chen. 2017. “Augmentable Gamma Belief Networks.”

Zhu, Pfadler, Wu, et al. 2020. “Efficient and Scalable Structure Learning for Bayesian Networks: Algorithms and Applications.”