
Message passing algorithms

I see these popping up in graphical model inference, in time series analysis, and in variational approximation and compressed sensing. What are they?

The grandparent idea seems to be “belief propagation”, a.k.a. “sum-product message passing”, credited to Pearl (Pear86) for DAGs and then generalised to MRFs, factor graphs and other PGMs. I gather from passing references that many popular algorithms also happen to be message-passing-type ones.
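For concreteness, here are the standard sum-product updates on a factor graph, where \(N(\cdot)\) denotes the neighbours of a node:

\[
\begin{aligned}
m_{x \to f}(x) &= \prod_{g \in N(x) \setminus \{f\}} m_{g \to x}(x) && \text{(variable to factor)} \\
m_{f \to x}(x) &= \sum_{\mathbf{x}_{N(f) \setminus \{x\}}} f\big(\mathbf{x}_{N(f)}\big) \prod_{y \in N(f) \setminus \{x\}} m_{y \to f}(y) && \text{(factor to variable)} \\
p(x) &\propto \prod_{f \in N(x)} m_{f \to x}(x) && \text{(marginal)}
\end{aligned}
\]

On trees these updates give exact marginals; on loopy graphs you iterate them anyway and hope, which is “loopy BP” (YeFW03).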

Now there are many flavours, such as “Gaussian” and “approximate”. Apparently this framework subsumes such diverse algorithms as Viterbi and Baum-Welch, among others. See WaJo08 for an overview.
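To make the HMM connection concrete, here is a minimal numpy sketch of sum-product message passing along an HMM chain; the transition and emission numbers are arbitrary toy values of my own. The forward and backward recursions that power the Baum-Welch E-step are exactly the left-to-right and right-to-left messages, and replacing the sums with maxes gives Viterbi.

```python
import numpy as np

# Toy HMM with 2 hidden states and 2 observation symbols.
# All numbers here are illustrative, not from any reference above.
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])    # T[i, j] = p(z_t = j | z_{t-1} = i)
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])    # E[i, k] = p(x_t = k | z_t = i)
pi = np.array([0.5, 0.5])     # initial state distribution
obs = [0, 0, 1, 0]            # observed symbol sequence

n = len(obs)

# Forward messages (unnormalised), passed left to right along the chain.
alpha = np.zeros((n, 2))
alpha[0] = pi * E[:, obs[0]]
for t in range(1, n):
    alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]

# Backward messages, passed right to left.
beta = np.ones((n, 2))
for t in range(n - 2, -1, -1):
    beta[t] = T @ (E[:, obs[t + 1]] * beta[t + 1])

# Multiplying the incoming messages at each node gives the smoothed marginals.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)                  # p(z_t | x_{1:n}) for each t
```

(For long chains you would normalise the messages at each step to avoid underflow.)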

Anyway, how the hell do you construct one of these things in general?

Advice from Mink05:

The recipe to make a message-passing algorithm has four steps:

  1. Pick an approximating family from which q will be chosen. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.
  2. Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence \(KL(q \| p)\), expectation propagation minimizes \(KL(p \| q)\), and power EP minimizes the α-divergence, \(D_\alpha(p \| q)\).
  3. Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.
  4. Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor. (A toy instance of this recipe is sketched after this list.)
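As a toy instance of steps 1–3, under assumptions of my own choosing rather than anything from Mink05: take the family to be fully factorized distributions over spins \(s_i \in \{-1, +1\}\) with \(p(s) \propto \exp(\sum_{i<j} J_{ij} s_i s_j + \sum_i h_i s_i)\), and take the divergence to be \(KL(q \| p)\). The fixed-point iteration you get is the classic mean-field update \(m_i = \tanh(h_i + \sum_{j \neq i} J_{ij} m_j)\), in which each site consumes its neighbours’ current means as “messages”:

```python
import numpy as np

# Mean-field fixed point for a small random Ising-type model.
# Couplings J and fields h are arbitrary illustrative values.
rng = np.random.default_rng(0)
n = 5
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2            # symmetric couplings
np.fill_diagonal(J, 0.0)     # no self-coupling
h = rng.normal(scale=0.5, size=n)

m = np.zeros(n)              # mean parameters E_q[s_i]
for sweep in range(200):
    m_new = np.tanh(h + J @ m)   # each site aggregates "messages" J_ij * m_j
    if np.max(np.abs(m_new - m)) < 1e-10:
        break                    # fixed point reached
    m = m_new

print(m)  # approximate marginal means under the factorized q
```

For strong couplings this iteration can oscillate, in which case damping the update helps.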

Grandiose Claims

To read

BaMo11
Bayati, M., & Montanari, A. (2011) The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2), 764–785. DOI.
BlKR11
Blake, A., Kohli, P., & Rother, C. (Eds.). (2011) Markov Random Fields for Vision and Image Processing. Cambridge, Mass.: MIT Press.
CDHB09
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
DoMM09a
Donoho, D. L., Maleki, A., & Montanari, A. (2009a) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45), 18914–18919. DOI.
DoMM09b
Donoho, D. L., Maleki, A., & Montanari, A. (2009b) Message passing algorithms for compressed sensing: II. Analysis and validation. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI.
DoMM10
Donoho, D. L., Maleki, A., & Montanari, A. (2010) Message passing algorithms for compressed sensing: I. Motivation and construction. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI.
JSTT14
Jaggi, M., Smith, V., Takac, M., Terhorst, J., Krishnan, S., Hofmann, T., & Jordan, M. I. (2014) Communication-Efficient Distributed Dual Coordinate Ascent. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 3068–3076). Curran Associates, Inc.
MSJJ15
Ma, C., Smith, V., Jaggi, M., Jordan, M. I., Richtárik, P., & Takáč, M. (2015) Adding vs Averaging in Distributed Primal-Dual Optimization. arXiv:1502.03508 [cs].
MaJW06
Malioutov, D. M., Johnson, J. K., & Willsky, A. S. (2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031–2064.
Mink05
Minka, T. P. (2005) Divergence measures and message passing. Technical report, Microsoft Research.
Mink08
Minka, T. P. (2008) EP: A quick reference. Technical report.
Pear86
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
ScRa12
Schniter, P., & Rangan, S. (2012) Compressive phase retrieval via generalized approximate message passing. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 815–822). DOI.
SmEi08
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics.
SuMi06
Sutton, C., & Minka, T. P. (2006) Local training and belief propagation. Technical Report TR-2006-121, Microsoft Research.
WaJo08
Wainwright, M. J., & Jordan, M. I. (2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1-2), 1–305. DOI.
Wand16
Wand, M. P. (2016) Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing. arXiv preprint arXiv:1602.07412.
WeMT12
Welling, M., Minka, T. P., & Teh, Y. W. (2012) Structured Region Graphs: Morphing EP into GBP. arXiv:1207.1426 [cs].
WiBi05
Winn, J. M., & Bishop, C. M. (2005) Variational message passing. Journal of Machine Learning Research, 6, 661–694.
XiJR03
Xing, E. P., Jordan, M. I., & Russell, S. (2003) A Generalized Mean Field Algorithm for Variational Inference in Exponential Families. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 583–591). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
YeFW03
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers.
Yuil11
Yuille, A. (2011) Loopy Belief Propagation, Mean Field Theory and Bethe Approximations. In A. Blake, P. Kohli, & C. Rother (Eds.), Markov Random Fields for Vision and Image Processing. Cambridge, Mass.: MIT Press.