The Living Thing / Notebooks : Message passing algorithms

I see these popping up in graphical model inference, in time-series, and in variational approximation and compressed sensing. What are they?

The grandparent idea seems to be “Belief propagation”, a.k.a. “sum-product message-passing”, credited to (Pearl, 1982) for DAGs and then generalised to MRFs, PGMs, factor graphs etc. I also gather from passing references that many popular algorithms happen to be message-passing-type ones.

Now there are many flavours, such as “Gaussian” and “approximate” message passing. Apparently this framing subsumes such diverse algorithms as Viterbi and forward-backward (the inference workhorse inside Baum-Welch), among others. See WaJo08 for an overview.
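To make the “sum-product” part concrete, here is a minimal sketch of belief propagation on a toy 3-variable binary chain, using only the standard library. All potentials and their values are made up for illustration; messages are passed inward from both ends, and each node's marginal is its local potential times all incoming messages, normalised. On a tree like this, BP is exact, so we can check against brute-force enumeration.

```python
from itertools import product

# Hypothetical potentials for a binary chain x0 - x1 - x2:
# psi_node[i][x] is the unary potential, psi_edge[e][x][y] the pairwise one.
psi_node = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
psi_edge = [[[2.0, 1.0], [1.0, 2.0]],   # edge (x0, x1)
            [[2.0, 1.0], [1.0, 2.0]]]   # edge (x1, x2)

def send(src_pot, edge, prior):
    """Sum-product message: sum out the source variable."""
    return [sum(src_pot[x] * prior[x] * edge[x][y] for x in (0, 1))
            for y in (0, 1)]

def T(edge):
    """Transpose an edge potential, for right-to-left messages."""
    return [[edge[x][y] for x in (0, 1)] for y in (0, 1)]

def norm(v):
    s = sum(v)
    return [u / s for u in v]

ones = [1.0, 1.0]
m01 = send(psi_node[0], psi_edge[0], ones)     # message x0 -> x1
m12 = send(psi_node[1], psi_edge[1], m01)      # message x1 -> x2
m21 = send(psi_node[2], T(psi_edge[1]), ones)  # message x2 -> x1
m10 = send(psi_node[1], T(psi_edge[0]), m21)   # message x1 -> x0

# Beliefs: local potential times incoming messages, normalised.
b0 = norm([psi_node[0][x] * m10[x] for x in (0, 1)])
b1 = norm([psi_node[1][x] * m01[x] * m21[x] for x in (0, 1)])
b2 = norm([psi_node[2][x] * m12[x] for x in (0, 1)])

def brute(i):
    """Exact marginal of x_i by enumerating all 2^3 configurations."""
    tot = [0.0, 0.0]
    for xs in product((0, 1), repeat=3):
        w = psi_node[0][xs[0]] * psi_node[1][xs[1]] * psi_node[2][xs[2]]
        w *= psi_edge[0][xs[0]][xs[1]] * psi_edge[1][xs[1]][xs[2]]
        tot[xs[i]] += w
    return norm(tot)
```

The chain is exactly the structure where this specialises to forward-backward: `m01`, `m12` are the forward messages and `m21`, `m10` the backward ones.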

Anyway, what the hell are these things?

Advice from Mink05:

The recipe to make a message-passing algorithm has four steps:

  1. Pick an approximating family for q to be chosen from. For example, the set of fully-factorized distributions, the set of Gaussians, the set of k-component mixtures, etc.
  2. Pick a divergence measure to minimize. For example, mean-field methods minimize the Kullback-Leibler divergence \(KL(q \| p)\), expectation propagation minimizes \(KL(p \| q)\), and power EP minimizes α-divergence, \(D_\alpha(p \| q)\).
  3. Construct an optimization algorithm for the chosen divergence measure and approximating family. Usually this is a fixed-point iteration obtained by setting the gradients to zero.
  4. Distribute the optimization across the network, by dividing the network p into factors, and minimizing local divergence at each factor.

Grandiose Claims

To read

Bayati, M., & Montanari, A. (2011) The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2), 764–785. DOI.
Blake, A., Kohli, P., & Rother, C. (Eds.). (2011) Markov Random Fields for Vision and Image Processing. Cambridge, Mass: MIT Press
Cevher, V., Duarte, M. F., Hegde, C., & Baraniuk, R. (2009) Sparse Signal Recovery Using Markov Random Fields. In Advances in Neural Information Processing Systems (pp. 257–264). Curran Associates, Inc.
Donoho, D. L., Maleki, A., & Montanari, A. (2009a) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45), 18914–18919. DOI.
Donoho, D. L., Maleki, A., & Montanari, A. (2009b) Message passing algorithms for compressed sensing: II analysis and validation. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI.
Donoho, D. L., Maleki, A., & Montanari, A. (2010) Message passing algorithms for compressed sensing: I motivation and construction. In 2010 IEEE Information Theory Workshop (ITW) (pp. 1–5). DOI.
Jaggi, M., Smith, V., Takac, M., Terhorst, J., Krishnan, S., Hofmann, T., & Jordan, M. I.(2014) Communication-Efficient Distributed Dual Coordinate Ascent. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 3068–3076). Curran Associates, Inc.
Ma, C., Smith, V., Jaggi, M., Jordan, M. I., Richtárik, P., & Takáč, M. (2015) Adding vs Averaging in Distributed Primal-Dual Optimization. arXiv:1502.03508 [cs].
Malioutov, D. M., Johnson, J. K., & Willsky, A. S.(2006) Walk-Sums and Belief Propagation in Gaussian Graphical Models. Journal of Machine Learning Research, 7, 2031–2064.
Minka, T. P.(2005) Divergence measures and message passing. Technical report, Microsoft Research
Minka, T. P.(2008) EP: A quick reference. Technical Report.
Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3), 241–288. DOI.
Schniter, P., & Rangan, S. (2012) Compressive phase retrieval via generalized approximate message passing. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 815–822). DOI.
Smith, D. A., & Eisner, J. (2008) Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 145–156). Association for Computational Linguistics
Sutton, C., & Minka, T. P.(2006) Local training and belief propagation. Technical Report TR-2006-121, Microsoft Research
Wainwright, M. J., & Jordan, M. I.(2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1-2), 1–305. DOI.
Wand, M. P.(2016) Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing. arXiv Preprint arXiv:1602.07412.
Welling, M., Minka, T. P., & Teh, Y. W.(2012) Structured Region Graphs: Morphing EP into GBP. arXiv:1207.1426 [cs].
Winn, J. M., & Bishop, C. M.(2005) Variational message passing. Journal of Machine Learning Research, 6, 661–694.
Xing, E. P., Jordan, M. I., & Russell, S. (2003) A Generalized Mean Field Algorithm for Variational Inference in Exponential Families. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 583–591). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003) Understanding Belief Propagation and Its Generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring Artificial Intelligence in the New Millennium (pp. 239–269). Morgan Kaufmann Publishers
Yuille, A. (2011) Loopy Belief Propagation, Mean field theory and Bethe approximations. In A. Blake, P. Kohli, & C. Rother (Eds.), Markov random fields for vision and image processing. Cambridge, Mass: MIT Press