Learning the graph structure, not just the clique potentials.
Much more work.
Learning these models turns out to need a conditional independence test, an awareness of multiple testing, and some familiarity with graphs.
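To make the two statistical ingredients concrete, here is a minimal sketch, not any package's actual method. It assumes jointly Gaussian data, so that conditional independence can be tested with a partial correlation and a Fisher z-transform, and it uses Holm's step-down correction as one possible answer to the multiple-testing problem. The toy chain and all function names are illustrative assumptions.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting for one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation is zero (Gaussian data)."""
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure: one reject flag per hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break  # every larger p-value is retained as well
        reject[i] = True
    return reject


# Toy chain x0 -> x1 -> x2 (an assumption for illustration):
# x0 and x2 are dependent, but independent given x1.
rng = random.Random(0)
n = 2000
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a + rng.gauss(0, 1) for a in x0]
x2 = [0.8 * b + rng.gauss(0, 1) for b in x1]
data = [x0, x1, x2]

# Test each pair given the one remaining variable, then correct for 3 tests.
pairs = list(combinations(range(3), 2))
pvals = []
for i, j in pairs:
    k = ({0, 1, 2} - {i, j}).pop()
    r = partial_corr(data[i], data[j], data[k])
    pvals.append(fisher_z_pvalue(r, n, n_cond=1))

# Pairs whose conditional independence is rejected become edges.
edges = [pair for pair, rej in zip(pairs, holm_reject(pvals)) if rej]
print(edges)
```

With p variables there are p(p-1)/2 such pairwise tests even before varying the conditioning set, which is why the correction step matters more and more as the graph grows.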
 bnlearn (R) learns belief networks, i.e. Bayesian networks.

 sparsebn (R), in its own words:
A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:
 Experimental data with interventions
 Mixed observational / experimental data
 High-dimensional data with p >> n
 Datasets with thousands of variables (tested up to p=8000)
 Continuous and discrete data
The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint.
Overview
The main methods for learning graphical models are:
 estimate.dag for directed acyclic graphs (Bayesian networks).
 estimate.precision for undirected graphs (Markov random fields).
 estimate.covariance for covariance matrices.
Currently, estimation of precision and covariance matrices is limited to Gaussian data.
 Nonparanormal skeptic (TBD.)
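Pending a proper write-up, here is a rough sketch of the idea usually associated with that name: replace the sample correlation with a rank-based estimate, here sin(π/2 · τ) with τ Kendall's tau, so that strictly increasing transformations of each margin do not change the answer. The function names and toy data are illustrative assumptions, and the tau implementation assumes no ties.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau for sequences without ties (O(n^2) pair count)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2 * s / (n * (n - 1))


def rank_based_corr(x, y):
    """Rank-based estimate of the latent Gaussian correlation."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


# Toy data (made up for illustration).
x = [0.1, -1.2, 0.7, 2.3, -0.4, 1.5]
y = [0.3, -0.9, 0.2, 1.8, -0.1, 1.1]

# A strictly increasing warp of one margin leaves the estimate unchanged,
# whereas the ordinary Pearson correlation would move.
y_warped = [math.exp(3 * v) for v in y]
print(rank_based_corr(x, y), rank_based_corr(x, y_warped))
```

The resulting correlation matrix can then stand in for the sample covariance in a Gaussian-style sparse precision estimator.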
skggm (Python) also handles the Gaussian case, and has a nice sparsification step and a good explanation.
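Why the precision matrix is the object of interest in the Gaussian case can be seen in a small worked example. This is a sketch under stated assumptions, not code from any of the packages above: it builds the population covariance of a three-variable chain, inverts it with a hand-rolled Gauss-Jordan routine, and reads the undirected graph off the nonzero off-diagonal entries. The coefficients `a`, `b` and the helper names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges_from_precision(theta, tol=1e-8):
    """Undirected edges = off-diagonal precision entries that are not ~0."""
    n = len(theta)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(theta[i][j]) > tol]


# Population covariance of the chain x0 -> x1 -> x2 with unit-variance noise:
# x1 = a*x0 + e1, x2 = b*x1 + e2 (a and b are illustrative values).
a, b = 0.8, 0.5
v0 = 1.0
v1 = a * a * v0 + 1.0
v2 = b * b * v1 + 1.0
cov = [
    [v0,         a * v0, a * b * v0],
    [a * v0,     v1,     b * v1],
    [a * b * v0, b * v1, v2],
]

precision = invert(cov)
# cov[0][2] is nonzero (x0 and x2 are marginally dependent), but the
# matching precision entry vanishes: x0 and x2 are independent given x1.
print(edges_from_precision(precision))
```

With a finite sample the inverted sample covariance is never exactly zero anywhere, so some form of sparsification, such as an ℓ1 penalty on the precision matrix, is needed to recover the missing edges.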