
Inference without KL divergence

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

Placeholder. Various links on inference by minimising divergences other than the Kullback–Leibler divergence.

From Chu, Blanchet, and Glynn (2019):

in many fields, the object of interest is a probability distribution; moreover, the learning process is guided by a probability functional to be minimized, a loss function that conceptually maps a probability distribution to a real number[…] Because the optimization now takes place in the infinite-dimensional space of probability measures, standard finite-dimensional algorithms like gradient descent are initially unavailable; even the proper notion for the derivative of these functionals is unclear. We call upon a body of literature known as von Mises calculus, originally developed in the field of asymptotic statistics, to make these functional derivatives precise. Remarkably, we find that once the connection is made, the resulting generalized descent algorithm, which we call probability functional descent, is intimately compatible with standard deep learning techniques such as stochastic gradient descent, the reparameterization trick, and adversarial training.
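To make the idea concrete, here is a minimal toy sketch (mine, not taken from any of the papers below) of inference by minimising a non-KL divergence: a reparameterised diagonal Gaussian is fitted to target samples by stochastic gradient descent on a kernel MMD estimate. PyTorch is assumed; the kernel bandwidth, sample sizes and optimiser settings are arbitrary illustrative choices.

```python
# Sketch: fit q(z) = mu + sigma * eps to target samples by minimising an
# RBF-kernel MMD^2 estimate rather than a KL divergence.
# Hypothetical illustration; bandwidth and step sizes are arbitrary.
import torch

def rbf_kernel(x, y, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)), evaluated over all pairs
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared MMD between sample sets x and y
    return (rbf_kernel(x, x, bandwidth).mean()
            - 2 * rbf_kernel(x, y, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean())

torch.manual_seed(0)
target = torch.randn(512, 2) * 0.5 + torch.tensor([2.0, -1.0])  # stand-in "data"

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    eps = torch.randn(256, 2)
    z = mu + torch.exp(log_sigma) * eps       # reparameterisation trick
    loss = mmd2(z, target, bandwidth=1.0)     # the divergence being minimised
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.detach(), torch.exp(log_sigma).detach())  # ≈ target mean and scale
```

Swapping `mmd2` for an entropic-regularised optimal-transport estimate, or for a dual (critic-based) Wasserstein objective as in the GAN papers below, gives other members of the same family.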

Refs

Ambrogioni, Luca, Umut Güçlü, Yagmur Güçlütürk, Max Hinne, Eric Maris, and Marcel A. J. van Gerven. 2018. “Wasserstein Variational Inference.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2478–87. NIPS’18. USA: Curran Associates Inc. http://arxiv.org/abs/1805.11284.

Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017. “Wasserstein Generative Adversarial Networks.” In International Conference on Machine Learning, 214–23. http://proceedings.mlr.press/v70/arjovsky17a.html.

Beran, Rudolf. 1977. “Minimum Hellinger Distance Estimates for Parametric Models.” The Annals of Statistics 5 (3): 445–63. https://doi.org/10.1214/aos/1176343842.

Bissiri, P. G., C. C. Holmes, and S. G. Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (5): 1103–30. https://doi.org/10.1111/rssb.12158.

Blanchet, Jose, Yang Kang, and Karthyek Murthy. 2016. “Robust Wasserstein Profile Inference and Applications to Machine Learning,” October. http://arxiv.org/abs/1610.05627.

Blanchet, Jose, Yang Kang, Fan Zhang, and Karthyek Murthy. 2017. “Data-Driven Optimal Cost Selection for Distributionally Robust Optimization,” May. http://arxiv.org/abs/1705.07152.

Blanchet, Jose, Karthyek Murthy, and Fan Zhang. 2018. “Optimal Transport Based Distributionally Robust Optimization: Structural Properties and Iterative Schemes,” October. http://arxiv.org/abs/1810.02403.

Campbell, Trevor, and Tamara Broderick. 2017. “Automated Scalable Bayesian Inference via Hilbert Coresets,” October. http://arxiv.org/abs/1710.05053.

Chen, Xinshi, Hanjun Dai, and Le Song. 2019. “Meta Particle Flow for Sequential Bayesian Inference,” February. http://arxiv.org/abs/1902.00640.

Chu, Casey, Jose Blanchet, and Peter Glynn. 2019. “Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning,” January. http://arxiv.org/abs/1901.10691.

Fernholz, Luisa Turrin. 1983. Von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics 19. New York: Springer.

———. 2014. “Statistical Functionals.” In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118445112.stat01843.

Frogner, Charlie, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A. Poggio. 2015. “Learning with a Wasserstein Loss.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2053–61. Curran Associates, Inc. http://papers.nips.cc/paper/5679-learning-with-a-wasserstein-loss.pdf.

Gao, Rui, and Anton J. Kleywegt. 2016. “Distributionally Robust Stochastic Optimization with Wasserstein Distance,” April. http://arxiv.org/abs/1604.02199.

Gibbs, Alison L., and Francis Edward Su. 2002. “On Choosing and Bounding Probability Metrics.” International Statistical Review 70 (3): 419–35. https://doi.org/10.1111/j.1751-5823.2002.tb00178.x.

Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017. “Improved Training of Wasserstein GANs,” March. http://arxiv.org/abs/1704.00028.

Guo, Xin, Johnny Hong, Tianyi Lin, and Nan Yang. 2017. “Relaxed Wasserstein with Applications to GANs,” May. http://arxiv.org/abs/1705.07164.

Liu, Huidong, Xianfeng Gu, and Dimitris Samaras. 2018. “A Two-Step Computation of the Exact GAN Wasserstein Distance.” In International Conference on Machine Learning, 3159–68. http://proceedings.mlr.press/v80/liu18d.html.

Mahdian, Saied, Jose Blanchet, and Peter Glynn. 2019. “Optimal Transport Relaxations with Application to Wasserstein GANs,” June. https://arxiv.org/abs/1906.03317v1.

Panaretos, Victor M., and Yoav Zemel. 2019. “Statistical Aspects of Wasserstein Distances.” Annual Review of Statistics and Its Application 6 (1): 405–31. https://doi.org/10.1146/annurev-statistics-030718-104938.

Ranganath, Rajesh, Dustin Tran, Jaan Altosaar, and David Blei. 2016. “Operator Variational Inference.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 496–504. Curran Associates, Inc. http://papers.nips.cc/paper/6091-operator-variational-inference.pdf.

Rustamov, Raif M. 2019. “Closed-Form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders,” January. http://arxiv.org/abs/1901.03227.

Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-20828-2_1.

Solomon, Justin, Fernando de Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. 2015. “Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains.” ACM Transactions on Graphics 34 (4): 66:1–66:11. https://doi.org/10.1145/2766963.

Wang, Prince Zizhuang, and William Yang Wang. 2019. “Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 284–94. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1025.