# Message passing algorithms

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

I see these popping up in graphical model inference, in time-series, and in variational approximation and compressed sensing. What are they?

The grandparent idea seems to be “Belief propagation”, a.k.a. “sum-product message-passing”, credited to (Pearl, 1982) for DAGs and then generalised to MRFs, PGMs, factor graphs etc. Although I gather from passing reference that many popoular algorithms also happen to be message-passing-type ones.

Now there are many flavours, such as “Gaussian” and “approximate”. Apparently this definition subsumes such diverse models as the Viterbi and Baum-Welch algorithms, among others. See WaJo08 for an overview.

Anyway, what the hell are these things?

The recipe to make a message-passing algorithm has four steps:

• Pick an approximating family for q to be chosen from. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.

• Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence $$KL(q \| p)$$, expectation propagation minimizes $$KL(p \| q)$$, and power EP minimizes α-divergence, $$D\alpha(p \| q)$$.

• Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.

• Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor.

There is an overview lecture by Thomas Orton, which connects this with statistical mechanics of statistics.

Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behavior, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimization for physical systems. With these two perspectives in mind, we’ll then try to use belief propagation to do inference on the the stochastic block model. We’ll see some heuristic techniques for determining when BP succeeds and fails in inference, as well as some numerical simulation results of belief propagation for this problem. Lastly, we’ll talk about where this all fits into what is currently known about efficient algorithms and information theoretic barriers for the stochastic block model.

GAMP:

Generalized Approximate Message Passing (GAMP) is an approximate, but computationally efficient method for estimation problems with linear mixing. In the linear mixing problem an unknown vector, $$\mathbf{x}$$, with independent components, is first passed through linear transform $$\mathbf{z}=\mathbf{A}\mathbf{x}$$ and then observed through a general probabilistic, componentwise measurement channel to yield a measurement vector $$\mathbf{y}$$. The problem is to estimate $$\mathbf{x}$$ and $$\mathbf{z}$$ from $$\mathbf{y}$$ and $$\mathbf{A}$$. This problem arises in a range of applications including compressed sensing.

Optimal solutions to linear mixing estimation problems are, in general, computationally intractable as the complexity of most brute force algorithms grows exponentially in the dimension of the vector $$\mathbf{x}$$. GAMP approximately performs the estimation through a Gaussian approximation of loopy belief propagation that reduces the vector-valued estimation problem to a sequence of scalar estimation problems on the components of the vectors $$\mathbf{x}$$ and $$\mathbf{z}$$. The GAMP methodology may also have applications to problems with structured sparsity and low-rank matrix factorization

# Refs

Bayati, Mohsen, and Andrea Montanari. 2011. “The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing.” IEEE Transactions on Information Theory 57 (2): 764–85. https://doi.org/10.1109/TIT.2010.2094817.

Blake, Andrew, Pushmeet Kohli, and Carsten Rother, eds. 2011. Markov Random Fields for Vision and Image Processing. Cambridge, Mass: MIT Press. https://mitpress.mit.edu/books/markov-random-fields-vision-and-image-processing.

Borgerding, Mark, and Philip Schniter. 2016. “Onsager-Corrected Deep Networks for Sparse Linear Inverse Problems,” December. http://arxiv.org/abs/1612.01183.

Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc. http://papers.nips.cc/paper/3487-sparse-signal-recovery-using-markov-random-fields.

Dehaene, Guillaume P. 2016. “Expectation Propagation Performs a Smoothed Gradient Descent,” December. http://arxiv.org/abs/1612.05053.

Donoho, David L., A. Maleki, and A. Montanari. 2010. “Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction.” In 2010 IEEE Information Theory Workshop (ITW), 1–5. https://doi.org/10.1109/ITWKSPS.2010.5503193.

Donoho, David L., Arian Maleki, and Andrea Montanari. 2009a. “Message-Passing Algorithms for Compressed Sensing.” Proceedings of the National Academy of Sciences 106 (45): 18914–9. https://doi.org/10.1073/pnas.0909892106.

———. 2009b. “Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation.” In 2010 IEEE Information Theory Workshop (ITW), 1–5. https://doi.org/10.1109/ITWKSPS.2010.5503228.

Jaggi, Martin, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I Jordan. 2014. “Communication-Efficient Distributed Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3068–76. Curran Associates, Inc. http://papers.nips.cc/paper/5599-communication-efficient-distributed-dual-coordinate-ascent.pdf.

Ma, Chenxin, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, and Martin Takáč. 2015. “Adding Vs. Averaging in Distributed Primal-Dual Optimization,” February. http://arxiv.org/abs/1502.03508.

Malioutov, Dmitry M., Jason K. Johnson, and Alan S. Willsky. 2006. “Walk-Sums and Belief Propagation in Gaussian Graphical Models.” Journal of Machine Learning Research 7 (October): 2031–64. http://jmlr.csail.mit.edu/papers/v7/malioutov06a.html.

Minka, Tom P. 2005. “Divergence Measures and Message Passing.” Technical report, Microsoft Research. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.314.3445&rep=rep1&type=pdf.

———. 2008. “EP: A Quick Reference.” Techincal Report. http://research-srv.microsoft.com/en-us/um/people/minka/papers/ep/minka-ep-quickref.pdf.

Pearl, Judea. 1986. “Fusion, Propagation, and Structuring in Belief Networks.” Artificial Intelligence 29 (3): 241–88. https://doi.org/10.1016/0004-3702(86)90072-X.

Schniter, P., and S. Rangan. 2012. “Compressive Phase Retrieval via Generalized Approximate Message Passing.” In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 815–22. https://doi.org/10.1109/Allerton.2012.6483302.

Smith, David A., and Jason Eisner. 2008. “Dependency Parsing by Belief Propagation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 145–56. Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1613737.

Sutton, Charles, and Tom P Minka. 2006. “Local Training and Belief Propagation.” Technical Report TR-2006-121, Microsoft Research. http://research.microsoft.com/pubs/70335/tr-2006-121.pdf.

Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. http://www.cs.berkeley.edu/~jordan/papers/wainwright-jordan-fnt.pdf.

Wand, M. P. 2016. “Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing.” arXiv Preprint arXiv:1602.07412. http://arxiv.org/abs/1602.07412.

Welling, Max, Tom P Minka, and Yee Whye Teh. 2012. “Structured Region Graphs: Morphing EP into GBP,” July. http://arxiv.org/abs/1207.1426.

Winn, John M., and Christopher M. Bishop. 2005. “Variational Message Passing.” In Journal of Machine Learning Research, 661–94. http://johnwinn.org/Publications/papers/VMP2005.pdf.

Xing, Eric P., Michael I. Jordan, and Stuart Russell. 2003. “A Generalized Mean Field Algorithm for Variational Inference in Exponential Families.” In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 583–91. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. http://arxiv.org/abs/1212.2512.

Yedidia, J. S., W. T. Freeman, and Y. Weiss. 2003. “Understanding Belief Propagation and Its Generalizations.” In Exploring Artificial Intelligence in the New Millennium, edited by G. Lakemeyer and B. Nebel, 239–36. Morgan Kaufmann Publishers. http://www.merl.com/publications/TR2001-22.

Yuille, Alan. 2011. “Loopy Belief Propagation, Mean Field Theory and Bethe Approximations.” In Markov Random Fields for Vision and Image Processing, edited by Andrew Blake, Pushmeet Kohli, and Carsten Rother. Cambridge, Mass: MIT Press.