A trick I see mostly used in variational inference for probabilistic deep learning, but which is apparently more general.
Ingmar Shuster's summary:
The paper adopts the term normalizing flow to refer to the plain old change of variables formula for integrals, with the minor change of view that one can see this as a flow, and with the correct but slightly alien reference to a flow defined by the Langevin SDE or the Fokker-Planck equation, both attributed only to the ML/stats literature in the paper. The theoretical contribution feels a little like a strawman: it simply states that, since Langevin and Hamiltonian dynamics can be seen as infinitesimal normalizing flows, and both approximate the posterior as the step size goes to zero, normalizing flows can approximate the posterior arbitrarily well. This is of course not derived in the paper, nor is it news, nor does it say anything about the practical approach suggested. The invertible maps suggested do have practical merit, however: the planar transformation (plotted on the right of the image) allows “splitting” a mode into two, and a second map allows “attracting/repulsing” probability mass around a point. The Jacobian correction for both invertible maps is computable in time that is linear in the number of dimensions.
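To make the linear-time claim concrete, here is a minimal NumPy sketch. It assumes the two maps meant are the planar and radial flows of ReMo15, with `tanh` as the planar nonlinearity; the function names, argument names and the toy parameters are my own choices for illustration, not anything from the paper's code. In both cases the Jacobian is a scaled identity plus a rank-one term, so its determinant follows from the matrix determinant lemma without ever forming a d × d matrix. The change of variables formula then gives log q1(f(z)) = log q0(z) − log|det ∂f/∂z|.

```python
import numpy as np


def planar_flow(z, u, w, b):
    """Planar map f(z) = z + u * tanh(w.z + b) on a batch z of shape (n, d).

    Returns f(z) and log|det df/dz| per row, each costing O(d) per point.
    Assumes w.u > -1, which keeps the map invertible for tanh.
    """
    h = np.tanh(z @ w + b)                      # shape (n,)
    f = z + np.outer(h, u)                      # shape (n, d)
    # Jacobian is I + u psi^T with psi = tanh'(w.z + b) * w, so
    # det = 1 + psi.u by the matrix determinant lemma.
    log_det = np.log(np.abs(1.0 + (1.0 - h**2) * (w @ u)))
    return f, log_det


def radial_flow(z, z0, alpha, beta):
    """Radial map f(z) = z + beta * (z - z0) / (alpha + r), r = |z - z0|.

    beta > 0 pushes mass away from z0; -alpha < beta < 0 pulls it in.
    Assumes alpha > 0 and beta > -alpha, which keeps the map invertible.
    """
    diff = z - z0
    r = np.linalg.norm(diff, axis=1)            # shape (n,)
    h = 1.0 / (alpha + r)
    dh = -h**2                                  # derivative of h in r
    f = z + beta * h[:, None] * diff
    d = z.shape[1]
    # Jacobian is (1 + beta h) I + beta h' (z - z0)(z - z0)^T / r, hence
    # det = (1 + beta h)^(d-1) * (1 + beta h + beta h' r).
    log_det = (d - 1) * np.log1p(beta * h) + np.log1p(beta * h + beta * dh * r)
    return f, log_det


# Toy usage: push standard normal samples in 2-D through one planar map
# and track the log density of the transformed samples.
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))
log_q0 = -0.5 * (z**2).sum(axis=1) - np.log(2 * np.pi)

w = np.array([2.0, 0.0])
u = np.array([1.5, 0.0])                        # w.u = 3 > -1
x, log_det = planar_flow(z, u, w, b=0.0)
log_q1 = log_q0 - log_det                       # log density at x = f(z)
```

With these toy parameters the first coordinate becomes z1 + 1.5 tanh(2 z1), which moves mass away from zero in both directions, so a histogram of `x[:, 0]` should look bimodal. A practical implementation would learn `u`, `w`, `b` and stack several such maps, summing the `log_det` terms.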
How does this relate to Langevin Monte Carlo? How does it relate to nonparametric copula models?
- KiWe14: (2014) Auto-Encoding Variational Bayes. In ICLR 2014 conference.
- KSJC16: (2016) Improving Variational Inference with Inverse Autoregressive Flow. In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
- PaMP17: (2017) Masked Autoregressive Flow for Density Estimation. In Advances in Neural Information Processing Systems 30 (pp. 2338–2347). Curran Associates, Inc.
- LoWe17: (2017) Multiplicative Normalizing Flows for Variational Bayesian Neural Networks. In PMLR (pp. 2218–2227).
- HKLC18: (2018) Neural Autoregressive Flows. ArXiv:1804.00779 [Cs, Stat].
- ReMW15: (2015) Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of ICML.
- BaMa17: (2017) Structured Black Box Variational Inference for Latent Time Series Models. ArXiv:1707.01069 [Cs, Stat].
- BHTW18: (2018) Sylvester Normalizing Flows for Variational Inference. ArXiv:1803.05649 [Cs, Stat].
- RuTB16: (2016) The Generalized Reparameterization Gradient. In Advances In Neural Information Processing Systems.
- KiSW15: (2015) Variational Dropout and the Local Reparameterization Trick. ArXiv:1506.02557 [Cs, Stat].
- ReMo15: (2015) Variational Inference with Normalizing Flows. ArXiv:1505.05770 [Cs, Stat].