Learning the graph structure, not just the clique potentials.
Much more work.
Learning these models turns out to need a conditional independence test, an awareness of multiple testing and graphs.
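To make those ingredients concrete, here is a minimal sketch, not any particular package's algorithm. It assumes Gaussian data, tests conditional independence with a partial-correlation Fisher z-test, applies a Bonferroni correction across all the tests, and keeps an edge only if no test declares the pair independent. The function names, the three-variable restriction, and the sample size of 200 are illustrative assumptions. For simplicity it plugs in the population correlations of a chain X → Y → Z (unit-variance noise) as if they had been estimated from data.

```python
import math
from itertools import combinations


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b given c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation is zero, Gaussian data."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def skeleton(corr, n, alpha=0.05):
    """Undirected skeleton for exactly three variables.

    corr maps frozenset({a, b}) to the correlation between a and b.
    """
    variables = sorted({v for pair in corr for v in pair})
    if len(variables) != 3:
        raise ValueError("this sketch only handles three variables")
    pairs = list(combinations(variables, 2))
    # Two tests per pair (marginal and conditional): Bonferroni over all of them.
    threshold = alpha / (2 * len(pairs))
    edges = set()
    for a, b in pairs:
        (c,) = [v for v in variables if v not in (a, b)]
        r_ab = corr[frozenset((a, b))]
        r_ac = corr[frozenset((a, c))]
        r_bc = corr[frozenset((b, c))]
        p_marginal = fisher_z_pvalue(r_ab, n, 0)
        p_conditional = fisher_z_pvalue(partial_corr(r_ab, r_ac, r_bc), n, 1)
        # Keep the edge only if independence is rejected for every conditioning set.
        if max(p_marginal, p_conditional) < threshold:
            edges.add((a, b))
    return edges


# Chain X -> Y -> Z with unit-variance noise: Var = (1, 2, 3), Cov(X, Z) = 1.
corr = {
    frozenset(("X", "Y")): 1 / math.sqrt(2),
    frozenset(("Y", "Z")): 2 / math.sqrt(6),
    frozenset(("X", "Z")): 1 / math.sqrt(3),
}
edges = skeleton(corr, n=200)
print(sorted(edges))
```

X and Z are marginally correlated, but their partial correlation given Y vanishes, so the X, Z edge is dropped while the other two survive. With more variables the number of conditioning sets, and hence of tests, grows quickly, which is where the multiple-testing concern starts to bite.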
bnlearn (R) learns belief networks.

sparsebn: A new R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. Designed from the ground up to handle:
 Experimental data with interventions
 Mixed observational / experimental data
 High-dimensional data with p >> n
 Datasets with thousands of variables (tested up to p=8000)
 Continuous and discrete data
The emphasis of this package is scalability and statistical consistency on high-dimensional datasets. […] For more details on this package, including worked examples and the methodological background, please see our new preprint [1].
Overview
The main methods for learning graphical models are:
 estimate.dag for directed acyclic graphs (Bayesian networks).
 estimate.precision for undirected graphs (Markov random fields).
 estimate.covariance for covariance matrices.
Currently, estimation of precision and covariance matrices is limited to Gaussian data.
Nonparanormal SKEPTIC (TBD).
skggm (Python) handles the Gaussian case, and also has nice sparsification and a very good explanation.
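Why the Gaussian case reduces to precision-matrix estimation: in a Gaussian Markov random field, a zero off-diagonal entry of the precision (inverse covariance) matrix means the two variables are conditionally independent given the rest, so the nonzero pattern is the graph. The following is a minimal sketch of that fact only, not of skggm or any other package's API. It inverts the exact population covariance of a toy chain X → Y → Z with unit-variance noise (an assumed example) using exact rational arithmetic; the helper name `invert` is hypothetical.

```python
from fractions import Fraction


def invert(matrix):
    """Exact Gauss-Jordan inverse of a small square matrix."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Population covariance of X, Y = X + e1, Z = Y + e2 (all noise variances 1).
covariance = [
    [1, 1, 1],
    [1, 2, 2],
    [1, 2, 3],
]
precision = invert(covariance)

# Edges of the undirected graph are the nonzero off-diagonal precision entries.
mrf_edges = [
    (i, j)
    for i in range(3)
    for j in range(i + 1, 3)
    if precision[i][j] != 0
]
print([[int(x) for x in row] for row in precision])
print(mrf_edges)
```

The (X, Z) entry of the precision matrix is exactly zero even though the covariance is not, so the recovered graph is the chain. With finite samples the inverse of the sample covariance is essentially never exactly zero (and does not exist when p > n), which is why estimators such as the graphical lasso add an L1 penalty to produce sparsity.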
For machine vision: mrfregistration:
Drop is a software for deformable image registration using discrete optimization. Its purpose is to provide an easy-to-use graphical user interface for dense image and volume registration. The application of this software is intended to be in the field of medical imaging but not restricted to this domain.