Probabilistic neural nets

MachineLearnese for "Bayesian"

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

WARNING: this page is out of date

Creating neural networks which infer whole probability densities for their predictions (usually approximately), rather than point estimates. Or at least some part of the density-estimation problem, accomplished with neural nets in a Bayesian setting. Prediction uncertainties, approximate model averaging and so on would all fit in this category.
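
As a concrete instance of a net that emits a density rather than a point estimate, here is a minimal NumPy sketch of a mixture density network head in the style of Bishop (1994): the final layer parameterises a mixture of Gaussians over the target. The layer sizes, toy data and weight initialisation are illustrative assumptions, not anything canonical.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mdn_head(h, W_pi, W_mu, W_sigma):
    """Map hidden features h (n, d) to mixture parameters, each (n, k)."""
    pi = softmax(h @ W_pi)        # mixing weights, sum to one
    mu = h @ W_mu                 # component means
    sigma = np.exp(h @ W_sigma)   # component scales, kept positive via exp
    return pi, mu, sigma

def mdn_nll(y, pi, mu, sigma):
    """Negative log-likelihood of scalar targets y under the mixture."""
    y = y[:, None]                # broadcast against the k components
    comp = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log((pi * comp).sum(axis=-1) + 1e-12).mean()

# Toy usage: 5 hidden features, 3 mixture components.
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 5))
W_pi, W_mu, W_sigma = (rng.normal(size=(5, 3)) * 0.1 for _ in range(3))
pi, mu, sigma = mdn_head(h, W_pi, W_mu, W_sigma)
print(mdn_nll(rng.normal(size=10), pi, mu, sigma))
```

In practice one would minimise `mdn_nll` with an autodiff framework rather than raw NumPy; the point is only that the loss is a log density, not a squared error.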

AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate probabilistic directed graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that. Gaussian processes can also be made to fit into this framing.
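
For reference, the variational objective in question: fit an approximate posterior $q_\phi(z)$ by maximising the evidence lower bound,

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z)}\!\left[\log p_\theta(x \mid z)\right] - \operatorname{KL}\!\left(q_\phi(z)\,\|\,p(z)\right),
$$

where the neural network typically supplies $p_\theta(x \mid z)$, $q_\phi(z)$, or both, and the expectation is estimated by Monte Carlo with reparameterised samples (see below).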

To learn:

Backgrounders

Radford Neal’s thesis (Neal96) is a foundational, asymptotically-Bayesian use of neural networks. Yarin Gal’s PhD Thesis (Gal16) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout, sketched below). Diederik P. Kingma’s thesis is the latest blockbuster in this tradition.
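
To make the dropout-as-approximate-Bayes idea concrete, here is a minimal NumPy sketch of MC dropout in the spirit of Gal’s work: keep dropout stochastic at prediction time and treat repeated forward passes as approximate posterior samples. The architecture, dropout rate and weight scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 16)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(16, 1)) * 0.5   # hidden -> output weights

def forward(x, p=0.5):
    """One stochastic forward pass; dropout stays switched on at test time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) < (1.0 - p)   # Bernoulli dropout mask
    return (h * mask / (1.0 - p)) @ W2       # inverted-dropout scaling

x = rng.normal(size=(1, 4))
samples = np.concatenate([forward(x) for _ in range(200)])
print("predictive mean:", samples.mean(), "predictive std:", samples.std())
```

The spread of the 200 samples is the (approximate) predictive uncertainty; a point-estimate net would return a single number with no such spread.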

Alex Graves did a nice poster of his paper (Grav11) on the simplest prior-uncertainty scheme for recurrent nets (diagonal Gaussian weight uncertainty). There is a half-arsed implementation. A loose sketch of the core idea follows.
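
The scheme amounts to keeping a mean and a variance per weight, sampling weights via the reparameterisation trick, and penalising divergence from the prior. In the NumPy sketch below, the standard-normal prior, softplus parameterisation and layer shape are my assumptions, not necessarily those of (Grav11).

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 4, 3
mu = np.zeros((d_in, d_out))         # per-weight posterior means
rho = np.full((d_in, d_out), -3.0)   # unconstrained; sigma = softplus(rho)

def sample_weights():
    """Draw one weight matrix from the diagonal Gaussian posterior."""
    sigma = np.log1p(np.exp(rho))    # softplus keeps sigma positive
    eps = rng.normal(size=mu.shape)  # reparameterisation: W = mu + sigma * eps
    return mu + sigma * eps

def kl_to_standard_normal():
    """KL(q(W) || N(0, I)) summed over weights: the complexity penalty."""
    sigma = np.log1p(np.exp(rho))
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

x = rng.normal(size=(5, d_in))
preds = np.stack([x @ sample_weights() for _ in range(100)])
print("mean prediction:", preds.mean(axis=0)[0],
      "KL penalty:", kl_to_standard_normal())
```

Training would minimise the data negative log-likelihood plus this KL term with respect to `mu` and `rho`, again via an autodiff framework.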

Reparameterisation

See reparameterisation.
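
The one-line version, for a diagonal Gaussian: write the random draw as a deterministic function of the parameters plus parameter-free noise,

$$
z = \mu + \sigma \odot \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I),
$$

so that gradients of $\mathbb{E}_{q_\phi}[f(z)]$ with respect to $\mu$ and $\sigma$ can be taken through the sample; the Graves-style sketch above uses exactly this trick.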

Practicalities

Blei Lab’s software tool: Edward (source). TensorFlow also comes with a contributed Bayesian library called BayesFlow (which is not the same as the cytometry library of the same name), whose documentation is by contrast so perfunctory that I can’t imagine it not being easier to reimplement it than to understand it.

Thomas Wiecki’s Bayesian Deep Learning shows how to implement some variants with PyMC3; a condensed sketch of the flavour follows.
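
A sketch in the spirit of that post, not Wiecki’s exact code: a tiny Bayesian neural network for binary classification, fitted with ADVI. The toy data, architecture and iteration count are illustrative assumptions.

```python
import numpy as np
import pymc3 as pm

# Toy data: the label is whether the two coordinates share a sign.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype("int64")

with pm.Model() as bnn:
    # Standard-normal priors over the weights of a one-hidden-layer net.
    w_in = pm.Normal("w_in", mu=0.0, sd=1.0, shape=(2, 5))
    w_out = pm.Normal("w_out", mu=0.0, sd=1.0, shape=(5,))
    act = pm.math.tanh(pm.math.dot(X, w_in))
    p = pm.math.sigmoid(pm.math.dot(act, w_out))
    pm.Bernoulli("obs", p=p, observed=y)
    # Fit a mean-field approximate posterior with ADVI, then sample from it.
    approx = pm.fit(10000, method="advi")
    trace = approx.sample(500)

# Posterior samples of the output weights, rather than a point estimate.
print(trace["w_out"].mean(axis=0))
```

Predictive uncertainty then comes from pushing inputs through the sampled weights, exactly as in the NumPy sketches above.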

Refs

Abbasnejad, Ehsan, Anthony Dick, and Anton van den Hengel. 2016. “Infinite Variational Autoencoder for Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1611.07800.

Archer, Evan, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. 2015. “Black Box Variational Inference for State Space Models,” November. http://arxiv.org/abs/1511.07367.

Bishop, Christopher. 1994. “Mixture Density Networks.” Microsoft Research, January. https://www.microsoft.com/en-us/research/publication/mixture-density-networks/.

Bora, Ashish, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. 2017. “Compressed Sensing Using Generative Models.” In International Conference on Machine Learning, 537–46. http://arxiv.org/abs/1703.03208.

Bui, Thang D., Sujith Ravi, and Vivek Ramavajjala. 2017. “Neural Graph Machines: Learning Neural Networks Using Graphs,” March. http://arxiv.org/abs/1703.04818.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR. http://proceedings.mlr.press/v70/cutajar17a.html.

Doerr, Andreas, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, and Sebastian Trimpe. 2018. “Probabilistic Recurrent State-Space Models,” January. http://arxiv.org/abs/1801.10395.

Dupont, Emilien, Arnaud Doucet, and Yee Whye Teh. 2019. “Augmented Neural ODEs,” April. http://arxiv.org/abs/1904.01681.

Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc. http://papers.nips.cc/paper/7115-identification-of-gaussian-process-state-space-models.pdf.

Fabius, Otto, and Joost R. van Amersfoort. 2014. “Variational Recurrent Auto-Encoders.” In Proceedings of ICLR. http://arxiv.org/abs/1412.6581.

Flunkert, Valentin, David Salinas, and Jan Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks,” April. http://arxiv.org/abs/1704.04110.

Gal, Yarin. 2015. “Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.

Gal, Yarin, and Zoubin Ghahramani. 2015a. “On Modern Deep Learning and Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.

———. 2015b. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16). http://arxiv.org/abs/1506.02142.

———. 2016a. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1512.05287.

———. 2016b. “Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference.” In 4th International Conference on Learning Representations (ICLR) Workshop Track. http://arxiv.org/abs/1506.02158.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. “Conditional Neural Processes,” July. https://arxiv.org/abs/1807.01613v1.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. “Neural Processes,” July. https://arxiv.org/abs/1807.01622v1.

Gholami, Amir, Kurt Keutzer, and George Biros. 2019. “ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs,” February. http://arxiv.org/abs/1902.10298.

Graves, Alex. 2011. “Practical Variational Inference for Neural Networks.” In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2348–56. NIPS’11. USA: Curran Associates Inc. https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf.

Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. “Speech Recognition with Deep Recurrent Neural Networks.” In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. https://doi.org/10.1109/ICASSP.2013.6638947.

Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. “DRAW: A Recurrent Neural Network for Image Generation,” February. http://arxiv.org/abs/1502.04623.

Gu, Shixiang, Zoubin Ghahramani, and Richard E Turner. 2015. “Neural Adaptive Sequential Monte Carlo.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2629–37. Curran Associates, Inc. http://papers.nips.cc/paper/5961-neural-adaptive-sequential-monte-carlo.pdf.

Gu, Shixiang, Sergey Levine, Ilya Sutskever, and Andriy Mnih. 2016. “MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In Proceedings of ICLR. https://arxiv.org/abs/1511.05176v3.

Hoffman, Matthew, and David Blei. 2015. “Stochastic Structured Variational Inference.” In PMLR, 361–69. http://proceedings.mlr.press/v38/hoffman15.html.

Johnson, Matthew J., David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, and Ryan P. Adams. 2016. “Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference,” March. http://arxiv.org/abs/1603.06277.

Karl, Maximilian, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt. 2016. “Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data.” In Proceedings of ICLR. http://arxiv.org/abs/1605.06432.

Kingma, Diederik P. 2017. “Variational Inference & Deep Learning: A New Synthesis.” https://www.dropbox.com/s/v6ua3d9yt44vgb3/cover_and_thesis.pdf?dl=0.

Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.

Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference. http://arxiv.org/abs/1312.6114.

Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. “AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17. http://arxiv.org/abs/1610.05392.

Krishnan, Rahul G., Uri Shalit, and David Sontag. 2015. “Deep Kalman Filters.” arXiv Preprint arXiv:1511.05121. https://arxiv.org/abs/1511.05121.

Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric,” December. http://arxiv.org/abs/1512.09300.

Le, Tuan Anh, Maximilian Igl, Tom Jin, Tom Rainforth, and Frank Wood. 2017. “Auto-Encoding Sequential Monte Carlo.” arXiv Preprint arXiv:1705.10306. https://arxiv.org/abs/1705.10306.

Lobacheva, Ekaterina, Nadezhda Chirkova, and Dmitry Vetrov. 2017. “Bayesian Sparsification of Recurrent Neural Networks.” In Workshop on Learning to Generate Natural Language. http://arxiv.org/abs/1708.00077.

Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc. http://papers.nips.cc/paper/7223-causal-effect-inference-with-deep-latent-variable-models.pdf.

Louizos, Christos, and Max Welling. 2016. “Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors.” In arXiv Preprint arXiv:1603.04733, 1708–16. http://arxiv.org/abs/1603.04733.

———. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27. http://proceedings.mlr.press/v70/louizos17a.html.

MacKay, David J. C. 2002. Information Theory, Inference & Learning Algorithms. Cambridge University Press.

Maddison, Chris J., Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. 2017. “Filtering Variational Objectives.” arXiv Preprint arXiv:1705.09279. https://arxiv.org/abs/1705.09279.

Matthews, Alexander G. de G., Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. “GPflow: A Gaussian Process Library Using TensorFlow,” October. http://arxiv.org/abs/1610.08733.

Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML. http://arxiv.org/abs/1701.05369.

Neal, Radford M. 1996. Bayesian Learning for Neural Networks. Vol. 118. Secaucus, NJ, USA: Springer-Verlag New York, Inc. http://www.csri.utoronto.ca/~radford/ftp/thesis.pdf.

Ngiam, Jiquan, Zhenghao Chen, Pang W. Koh, and Andrew Y. Ng. 2011. “Learning Deep Energy Models.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 1105–12. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Ngiam_557.pdf.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. http://www.gaussianprocess.org/gpml/.

Ravi, Sujith, and Qiming Diao. 2016. “Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation.” In PMLR, 519–28. http://proceedings.mlr.press/v51/ravi16.html.

Ryder, Thomas, Andrew Golightly, A. Stephen McGough, and Dennis Prangle. 2018. “Black-Box Variational Inference for Stochastic Differential Equations,” February. http://arxiv.org/abs/1802.03335.

Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In ICLR. http://arxiv.org/abs/1701.03757.

Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism,” October. http://arxiv.org/abs/1610.09787.

Wainwright, M., and M. Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press. http://metro-natshar-31-71.brain.net.pk/articles/new-directions-in-statistical-signal-processing-from-systems-to-brains-neural-information-processing.9780262083485.28286.pdf#page=166.