# Message passing algorithms

I see these popping up in graphical model inference, in time-series, and in variational approximation and compressed sensing. What are they?

The grandparent idea seems to be “Belief propagation”, a.k.a. “sum-product message-passing”, credited to (Pearl, 1982) for DAGs and then generalised to MRFs, PGMs, factor graphs etc. Although I gather from passing reference that many popoular algorithms also happen to be message-passing-type ones.

Now there are many flavours, such as “Gaussian” and “approximate”. Apparently this definition subsumes such diverse models as the Viterbi and Baum-Welch algorithms, among others. See WaJo08 for an overview.

Anyway, what the hell are these things?

The recipe to make a message-passing algorithm has four steps:

• Pick an approximating family for q to be chosen from. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.

• Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence $$KL(q \| p)$$, expectation propagation minimizes $$KL(p \| q)$$, and power EP minimizes α-divergence, $$D\alpha(p \| q)$$.

• Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.

• Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor.

There is an overview lecture by Thomas Orton, which connects this with statistical mechanics of statistics.

Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behavior, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimization for physical systems. With these two perspectives in mind, we’ll then try to use belief propagation to do inference on the the stochastic block model. We’ll see some heuristic techniques for determining when BP succeeds and fails in inference, as well as some numerical simulation results of belief propagation for this problem. Lastly, we’ll talk about where this all fits into what is currently known about efficient algorithms and information theoretic barriers for the stochastic block model.

GAMP:

Generalized Approximate Message Passing (GAMP) is an approximate, but computationally efficient method for estimation problems with linear mixing. In the linear mixing problem an unknown vector, $$\mathbf{x}$$, with independent components, is first passed through linear transform $$\mathbf{z}=\mathbf{A}\mathbf{x}$$ and then observed through a general probabilistic, componentwise measurement channel to yield a measurement vector $$\mathbf{y}$$. The problem is to estimate $$\mathbf{x}$$ and $$\mathbf{z}$$ from $$\mathbf{y}$$ and $$\mathbf{A}$$. This problem arises in a range of applications including compressed sensing.

Optimal solutions to linear mixing estimation problems are, in general, computationally intractable as the complexity of most brute force algorithms grows exponentially in the dimension of the vector $$\mathbf{x}$$. GAMP approximately performs the estimation through a Gaussian approximation of loopy belief propagation that reduces the vector-valued estimation problem to a sequence of scalar estimation problems on the components of the vectors $$\mathbf{x}$$ and $$\mathbf{z}$$. The GAMP methodology may also have applications to problems with structured sparsity and low-rank matrix factorization