I see these popping up in graphical model inference, in time-series, and in variational approximation and compressed sensing. What are they?

The grandparent idea seems to be “belief propagation”, a.k.a. “sum-product message-passing”, credited to (Pearl, 1982) for DAGs and then generalised to MRFs, PGMs, factor graphs etc. I also gather from passing references that many popular algorithms happen to be message-passing algorithms in disguise.

Now there are many flavours, such as “Gaussian” and “approximate”. Apparently this definition subsumes such diverse algorithms as Viterbi and Baum-Welch, among others. See WaJo08 for an overview.
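To make the sum-product idea concrete, here is a minimal sketch on the simplest possible graph, a chain of discrete variables. The function names and the toy potentials are mine, not from any of the references. Each node sends its neighbour a message that summarises everything on its own side of the chain, and the marginal at a node is the product of its incoming messages and its local potential. On a chain this is the forward-backward recursion that Baum-Welch uses in its E-step, and replacing the sums with maxima gives the max-product recursion behind Viterbi.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][a]       : non-negative potential for x_i = a
    pairwise[i][a][b] : potential for (x_i = a, x_{i+1} = b)
    Returns the normalised single-node marginals.
    """
    n, k = len(unary), len(unary[0])
    # fwd[i][b]: message arriving at node i from its left neighbour.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                for a in range(k)
            )
    # bwd[i][a]: message arriving at node i from its right neighbour.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(
                pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b]
                for b in range(k)
            )
    marginals = []
    for i in range(n):
        belief = [fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(k)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Same marginals by enumerating every joint state (exponential cost)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        w = 1.0
        for i, a in enumerate(xs):
            w *= unary[i][a]
        for i in range(n - 1):
            w *= pairwise[i][xs[i]][xs[i + 1]]
        for i, a in enumerate(xs):
            totals[i][a] += w
    return [[v / sum(row) for v in row] for row in totals]


# Toy 3-node binary chain with "sticky" pairwise potentials (made-up numbers).
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
pairwise = [[[0.9, 0.1], [0.1, 0.9]]] * 2
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

On a tree-structured graph such as this chain the messages give exact marginals, so `bp` and `exact` agree up to floating-point error. The cost is linear in the chain length rather than exponential.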

Anyway, what the hell are these things?

Advice from Mink05:

The recipe to make a message-passing algorithm has four steps:

1. Pick an approximating family for q to be chosen from. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.
2. Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence \(KL(q \| p)\), expectation propagation minimizes \(KL(p \| q)\), and power EP minimizes α-divergence, \(D_\alpha(p \| q)\).
3. Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.
4. Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor.
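The choice of divergence in the second step matters more than it might look. The sketch below is my own toy example, not taken from Mink05: it fits a single Gaussian q to a bimodal target p in both directions. It covers only the global choice of family and divergence and ignores the distribution across factors in the last step. For \(KL(p \| q)\) with a Gaussian family the minimiser is known in closed form (match the mean and variance of p). For \(KL(q \| p)\) I fall back on a crude grid search with numerical integration, which is only an illustration device.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def target_pdf(x):
    # Hypothetical bimodal target: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
    return 0.5 * normal_pdf(x, -2.0, 0.5) + 0.5 * normal_pdf(x, 2.0, 0.5)


STEP = 0.05
GRID = [-8.0 + STEP * i for i in range(321)]  # integration grid on [-8, 8]
P = [target_pdf(x) for x in GRID]


def kl_q_p(mu, sigma):
    """Riemann-sum estimate of KL(q || p) for q = N(mu, sigma^2)."""
    total = 0.0
    for x, px in zip(GRID, P):
        qx = normal_pdf(x, mu, sigma)
        if qx > 0.0:  # skip points where q underflows to zero
            total += qx * (math.log(qx) - math.log(px)) * STEP
    return total


# KL(q || p), the mean-field direction: brute-force search over (mu, sigma).
candidates = [(-3.0 + 0.25 * i, 0.25 * j) for i in range(25) for j in range(1, 13)]
mu_qp, sigma_qp = min(candidates, key=lambda c: kl_q_p(*c))

# KL(p || q), the EP direction: for a Gaussian q the minimiser matches moments.
mu_pq = sum(x * px for x, px in zip(GRID, P)) * STEP
var_pq = sum((x - mu_pq) ** 2 * px for x, px in zip(GRID, P)) * STEP
sigma_pq = math.sqrt(var_pq)
```

The \(KL(q \| p)\) fit locks onto one of the two modes, because q is penalised heavily for putting mass where p has almost none. The \(KL(p \| q)\) fit sits between the modes with a large variance so that it covers both.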

There is an overview lecture by Thomas Orton, which connects this with the statistical mechanics of statistics:

Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behavior, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimization for physical systems. With these two perspectives in mind, we’ll then try to use belief propagation to do inference on the stochastic block model. We’ll see some heuristic techniques for determining when BP succeeds and fails in inference, as well as some numerical simulation results of belief propagation for this problem. Lastly, we’ll talk about where this all fits into what is currently known about efficient algorithms and information theoretic barriers for the stochastic block model.

GAMP:

Generalized Approximate Message Passing (GAMP) is an approximate, but computationally efficient method for estimation problems with linear mixing. In the linear mixing problem an unknown vector, \(\mathbf{x}\), with independent components, is first passed through a linear transform \(\mathbf{z}=\mathbf{A}\mathbf{x}\) and then observed through a general probabilistic, componentwise measurement channel to yield a measurement vector \(\mathbf{y}\). The problem is to estimate \(\mathbf{x}\) and \(\mathbf{z}\) from \(\mathbf{y}\) and \(\mathbf{A}\). This problem arises in a range of applications including compressed sensing.

Optimal solutions to linear mixing estimation problems are, in general, computationally intractable as the complexity of most brute force algorithms grows exponentially in the dimension of the vector \(\mathbf{x}\). GAMP approximately performs the estimation through a Gaussian approximation of loopy belief propagation that reduces the vector-valued estimation problem to a sequence of scalar estimation problems on the components of the vectors \(\mathbf{x}\) and \(\mathbf{z}\). The GAMP methodology may also have applications to problems with structured sparsity and low-rank matrix factorization.
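To get a feel for the “sequence of scalar estimation problems”, here is a rough sketch of the simpler AMP iteration for noiseless compressed sensing in the style of DoMM09b. It is not GAMP itself: the measurement channel is just \(\mathbf{y}=\mathbf{A}\mathbf{x}\) and the scalar estimator is a soft threshold. The problem sizes, the threshold rule `alpha * ||z|| / sqrt(m)` and the value `alpha=1.5` are my assumptions for the demo, not prescriptions from the papers. Plain Python lists keep it dependency-free; real code would use a linear algebra library.

```python
import math
import random


def soft_threshold(v, theta):
    """Scalar denoiser: shrink v towards zero by theta."""
    return math.copysign(max(abs(v) - theta, 0.0), v)


def matvec(rows, v):
    return [sum(a * b for a, b in zip(row, v)) for row in rows]


def amp(A, y, n_iter=30, alpha=1.5):
    """AMP with soft thresholding for y = A x with sparse x (a sketch)."""
    m, n = len(A), len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * n
    z = list(y)  # residual, initialised to y because x starts at 0
    for _ in range(n_iter):
        # Threshold proportional to the estimated effective noise level.
        theta = alpha * math.sqrt(sum(r * r for r in z) / m)
        # Pseudo-data: behaves roughly like x_true plus Gaussian noise.
        pseudo = [xi + ci for xi, ci in zip(x, matvec(At, z))]
        # Componentwise (scalar) estimation step.
        x = [soft_threshold(p, theta) for p in pseudo]
        # Onsager correction: (number of nonzeros in x) / m times old residual.
        onsager = sum(1 for xi in x if xi != 0.0) / m
        Ax = matvec(A, x)
        z = [yi - axi + onsager * zi for yi, axi, zi in zip(y, Ax, z)]
    return x


# Synthetic problem (all sizes are arbitrary choices for the demo).
rng = random.Random(0)
n, m, k = 400, 200, 20
A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
for j in rng.sample(range(n), k):
    x_true[j] = rng.gauss(0.0, 1.0)
y = matvec(A, x_true)

x_hat = amp(A, y)
rel_err = math.sqrt(
    sum((a - b) ** 2 for a, b in zip(x_hat, x_true)) / sum(b * b for b in x_true)
)
```

Without the `onsager` term this would be plain iterative soft thresholding. The correction term is what the dense-graph approximation of belief propagation contributes, and it is why the pseudo-data can be treated as the true signal plus roughly Gaussian noise at each step.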

## Refs

- XiJR03: (2003) A Generalized Mean Field Algorithm for Variational Inference in Exponential Families. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 583–591). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- MSJJ15: (2015) Adding vs Averaging in Distributed Primal-Dual Optimization. *ArXiv:1502.03508 [Cs]*.
- JSTT14: (2014) Communication-Efficient Distributed Dual Coordinate Ascent. In Advances in Neural Information Processing Systems 27 (pp. 3068–3076). Curran Associates, Inc.
- ScRa12: (2012) Compressive phase retrieval via generalized approximate message passing. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 815–822).
- SmEi08: (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics.
- Mink05: (2005) Divergence measures and message passing. Technical report, Microsoft Research.
- Mink08: (2008) EP: A quick reference. *Technical Report*.
- Deha16: (2016) Expectation Propagation performs a smoothed gradient descent. *ArXiv:1612.05053 [Stat]*.
- Wand16: (2016) Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing. *ArXiv Preprint ArXiv:1602.07412*.
- Pear86: (1986) Fusion, propagation, and structuring in belief networks. *Artificial Intelligence*, 29(3), 241–288.
- WaJo08: (2008) *Graphical models, exponential families, and variational inference* (Vol. 1).
- SuMi06: (2006) Local training and belief propagation. Technical Report TR-2006-121, Microsoft Research.
- Yuil11: (2011) Loopy Belief Propagation, Mean field theory and Bethe approximations. In Markov random fields for vision and image processing. Cambridge, Mass: MIT Press.
- BlKR11: (2011) *Markov Random Fields for Vision and Image Processing*. Cambridge, Mass: MIT Press.
- DoMM10: (2010) Message passing algorithms for compressed sensing: I motivation and construction. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5).
- DoMM09a: (2009a) Message passing algorithms for compressed sensing: II analysis and validation. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5).
- DoMM09b: (2009b) Message-passing algorithms for compressed sensing. *Proceedings of the National Academy of Sciences*, 106(45), 18914–18919.
- BoSc16: (2016) Onsager-Corrected Deep Networks for Sparse Linear Inverse Problems. *ArXiv:1612.01183 [Cs, Math]*.
- CDHB09: (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
- WeMT12: (2012) Structured Region Graphs: Morphing EP into GBP. *ArXiv:1207.1426 [Cs]*.
- BaMo11: (2011) The dynamics of message passing on dense graphs, with applications to compressed sensing. *IEEE Transactions on Information Theory*, 57(2), 764–785.
- YeFW03: (2003) Understanding Belief Propagation and Its Generalizations. In Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers.
- WiBi05: (2005) Variational message passing. In Journal of Machine Learning Research (pp. 661–694).
- MaJW06: (2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. *Journal of Machine Learning Research*, 7, 2031–2064.