A trick I see variational inference for probabilistic deep learning, best summarised as “fancy change of variables”. Looks like learning of manifolds, sorta.
Ingmar Shuster summary
The paper adopts the term normalizing flow for refering to the plain old change of variables formula for integrals. With the minor change of view that one can see this as a flow and the correct but slightly alien reference to a flow defined by the Langevin SDE or Fokker-Planck, both attributed only to ML/stats literature in the paper.
The theoretical contribution feels a little like a strawman: it simply states that, as Langevin and Hamiltonian dynamics can be seen as an infinitesimal normalizing flow, and both approximate the posterior when the step size goes to zero, normalizing flows can approximate the posterior arbitrarily well. This is of course nothing that was derived in the paper, nor is it news. Nor does it say anything about the practical approach suggested.
The invertible maps suggested have practical merit however, as they allow “splitting” of a mode into two, called the planar transformation (and plotted on the right of the image), as well as “attracting/repulsing” probability mass around a point. The Jacobian correction for both invertible maps being computable in time that is linear in the number of dimensions.
Rui Shu explains change of variables in probability and shows how it induces the normalizing flow idea. PyMC3 example of a non-trivial example. Adam Kosiorek summarises some fancy variants. Eric Jang did a tutorial.
Bamler, Robert, and Stephan Mandt. 2017. “Structured Black Box Variational Inference for Latent Time Series Models,” July. http://arxiv.org/abs/1707.01069.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. “Sylvester Normalizing Flows for Variational Inference,” March. http://arxiv.org/abs/1803.05649.
Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.
Huang, Chin-Wei, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. “Neural Autoregressive Flows,” April. http://arxiv.org/abs/1804.00779.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.
Kingma, Diederik P., Tim Salimans, and Max Welling. 2015. “Variational Dropout and the Local Reparameterization Trick,” June. http://arxiv.org/abs/1506.02557.
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference. http://arxiv.org/abs/1312.6114.
Kingma, Durk P, and Prafulla Dhariwal. 2018. “Glow: Generative Flow with Invertible 1x1 Convolutions.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 10236–45. Curran Associates, Inc. http://papers.nips.cc/paper/8224-glow-generative-flow-with-invertible-1x1-convolutions.pdf.
Louizos, Christos, and Max Welling. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27. http://proceedings.mlr.press/v70/louizos17a.html.
Papamakarios, George, Iain Murray, and Theo Pavlakou. 2017. “Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc. http://papers.nips.cc/paper/6828-masked-autoregressive-flow-for-density-estimation.pdf.
Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2015. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In Proceedings of ICML. http://arxiv.org/abs/1401.4082.
Rezende, Danilo, and Shakir Mohamed. 2015. “Variational Inference with Normalizing Flows.” In International Conference on Machine Learning, 1530–8. ICML’15. Lille, France: JMLR.org. http://arxiv.org/abs/1505.05770.
Ruiz, Francisco J. R., Michalis K. Titsias, and David M. Blei. 2016. “The Generalized Reparameterization Gradient.” In Advances in Neural Information Processing Systems. http://arxiv.org/abs/1610.02287.