# Probabilistic programming

### a.k.a. Bayesian programming

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

This is apparently what we call Bayesian inference these days. When we say Bayesian programming, we might mean a simple hierarchical model, but the term emphasises the hope that we might even succeed in doing inference for very complicated models indeed, possibly ones without tractable likelihoods of any kind, maybe even ones specified by Turing-complete programs. Hope in this context means something like “we provide the programming primitives to express, in principle, the awful crazy likelihood structure of your complicated problem, although you are on your own in demonstrating any kind of concentration or convergence for your estimates of its posterior in the light of data.”

Mostly these tools are based on Markov chain Monte Carlo sampling, which turns out to be a startlingly general way to grind out the necessary calculations. There are other ways, such as classic conjugate priors, variational methods or reparameterisation flows, and many hybrids thereof.
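To make the “grind out the calculations” claim concrete, here is a minimal sketch of random-walk Metropolis, checked against a conjugate answer. The coin-flip counts, the uniform prior, the step size and the function names are all assumptions chosen for illustration; none of the frameworks below works exactly like this.

```python
import math
import random


def log_posterior(theta, heads, tails, a=1.0, b=1.0):
    """Unnormalised log posterior: Beta(a, b) prior times Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a - 1 + heads) * math.log(theta) + (b - 1 + tails) * math.log(1 - theta)


def metropolis(log_density, init, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis: Gaussian proposal, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    theta, logp = init, log_density(init)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # Work with log densities; proposals outside (0, 1) are always rejected.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        samples.append(theta)
    return samples


heads, tails = 14, 6  # made-up data
samples = metropolis(lambda t: log_posterior(t, heads, tails), init=0.5, n_samples=20000)
kept = samples[2000:]  # discard warm-up draws
mcmc_mean = sum(kept) / len(kept)

# Conjugacy gives the exact posterior Beta(1 + heads, 1 + tails) for comparison.
exact_mean = (1 + heads) / (2 + heads + tails)
print(f"MCMC mean {mcmc_mean:.3f} vs conjugate mean {exact_mean:.3f}")
```

The sampler only ever needs the unnormalised log density, which is why the same loop carries over to models where no conjugate shortcut exists.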

See George Ho for an in-depth introduction to what might be desirable to solve these problems in practice.

A probabilistic programming framework needs to provide six things:

1. A language or API for users to specify a model
2. A library of probability distributions and transformations to build the posterior density
3. At least one inference algorithm, which either draws samples from the posterior (in the case of Markov Chain Monte Carlo, MCMC) or computes some approximation of it (in the case of variational inference, VI)
4. At least one optimizer, which can compute the mode of the posterior density
5. An autodifferentiation library to compute gradients required by the inference algorithm and optimizer
6. A suite of diagnostics to monitor and analyze the quality of inference
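For a flavour of what the diagnostics in item 6 look like, here is a sketch of the classic Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance. It is the basic textbook version, not the rank-normalised split variant that current tools tend to report, and the synthetic “chains” are just independent Gaussian draws invented for the demonstration.

```python
import math
import random
import statistics


def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(means)                             # B
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(1)
# Four chains that all explore the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains where two are stuck around a different location.
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)] for offset in (0.0, 0.0, 3.0, 3.0)]

print(f"well-mixed R-hat: {r_hat(mixed):.3f}")
print(f"stuck R-hat:      {r_hat(stuck):.3f}")
```

Values close to 1 are consistent with the chains agreeing; values well above 1 suggest they have not converged to a common distribution.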

## Stan

Stan is the inference toolbox for broad classes of Bayesian models and the de facto reference point.

Andrew Gelman notes:

The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in the arXiv paper (by Bob Carpenter, Matt Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt). These are sort of background for what we’re trying to do.

If you haven’t read Maria Gorinova’s MS thesis and POPL paper (with Andrew Gordon and Charles Sutton), you should probably start there.

Radford Neal’s intro to HMC is nice, as is the one in David MacKay’s book. Michael Betancourt’s papers are the thing to read to understand HMC deeply; he just wrote another brain bender on geometric autodiff (all on arXiv). Starting with the one on hierarchical models would be good as it explains the necessity of reparameterizations.

Also I recommend our JEBS paper (with Daniel Lee, and Jiqiang Guo) as it presents Stan from a user’s rather than a developer’s perspective.
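As a rough companion to the HMC reading suggested above, here is a bare-bones Hamiltonian Monte Carlo sketch with a leapfrog integrator and a hand-written gradient. It is not Stan’s sampler (no NUTS, no step-size or mass-matrix adaptation, no autodiff); the two-dimensional Gaussian target, the step size and the trajectory length are assumptions picked so the example stays small.

```python
import math
import random

MEANS, SDS = (1.0, -2.0), (1.0, 0.5)  # made-up independent Gaussian target


def log_prob(q):
    return -0.5 * sum(((qi - m) / s) ** 2 for qi, m, s in zip(q, MEANS, SDS))


def grad_log_prob(q):
    return [-(qi - m) / s ** 2 for qi, m, s in zip(q, MEANS, SDS)]


def hmc(log_density, grad, init, n_samples, step=0.2, n_leapfrog=10, seed=0):
    rng = random.Random(seed)
    q = list(init)
    samples = []
    for _ in range(n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in q]  # fresh momentum each iteration
        q_new, p_new = list(q), list(p)
        for _step in range(n_leapfrog):
            # Leapfrog: half step in momentum, full step in position, half step in momentum.
            g = grad(q_new)
            p_new = [pi + 0.5 * step * gi for pi, gi in zip(p_new, g)]
            q_new = [qi + step * pi for qi, pi in zip(q_new, p_new)]
            g = grad(q_new)
            p_new = [pi + 0.5 * step * gi for pi, gi in zip(p_new, g)]
        # Hamiltonian = potential energy (-log density) + kinetic energy.
        h_old = -log_density(q) + 0.5 * sum(pi * pi for pi in p)
        h_new = -log_density(q_new) + 0.5 * sum(pi * pi for pi in p_new)
        # Metropolis correction for the integrator's discretisation error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(list(q))
    return samples


draws = hmc(log_prob, grad_log_prob, init=[0.0, 0.0], n_samples=2000)[200:]
est_means = [sum(d[i] for d in draws) / len(draws) for i in range(2)]
print("estimated means:", [round(m, 2) for m in est_means])
```

The gradient is what lets each proposal travel far while still being accepted often, which is also why an autodiff library sits at the core of tools like Stan.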

## Edward

From Blei’s lab. Leverages trendy deep learning machinery, namely TensorFlow, for variational Bayes.
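To give a sense of what “variational Bayes” via stochastic gradients means, here is a toy sketch in plain Python that fits a Gaussian approximation by ascending a reparameterised estimate of the ELBO. It does not use Edward’s API at all; the normal-normal model, the data, the learning rate and every name are assumptions, chosen so that the exact posterior is known for comparison.

```python
import math
import random

data = [1.2, 0.7, 1.9, 1.4, 0.8]  # made-up observations
n, total = len(data), sum(data)


def grad_log_joint(theta):
    """d/dtheta of log N(theta | 0, 1) + sum_i log N(y_i | theta, 1)."""
    return -theta + (total - n * theta)


def fit_gaussian_vi(n_steps=3000, n_draws=10, lr=0.02, seed=0):
    """Fit q(theta) = N(mean, sd^2) by stochastic gradient ascent on the ELBO."""
    rng = random.Random(seed)
    mean, log_sd = 0.0, 0.0
    for _ in range(n_steps):
        sd = math.exp(log_sd)
        g_mean, g_log_sd = 0.0, 0.0
        for _draw in range(n_draws):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_joint(mean + sd * eps)  # reparameterise: theta = mean + sd * eps
            g_mean += g / n_draws
            g_log_sd += g * sd * eps / n_draws
        g_log_sd += 1.0  # gradient of the Gaussian entropy, which is log_sd + const
        mean += lr * g_mean
        log_sd += lr * g_log_sd
    return mean, math.exp(log_sd)


vi_mean, vi_sd = fit_gaussian_vi()
# Exact posterior for this conjugate model: N(total / (n + 1), 1 / (n + 1)).
exact_mean, exact_sd = total / (n + 1), 1.0 / math.sqrt(n + 1)
print(f"VI    mean {vi_mean:.3f}, sd {vi_sd:.3f}")
print(f"exact mean {exact_mean:.3f}, sd {exact_sd:.3f}")
```

Here the approximating family contains the true posterior, so the fit can be essentially exact; in realistic models the Gaussian is only an approximation and the gradients come from autodiff rather than by hand.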

## Pyro

PyTorch + Bayes = Pyro. For the rationale, see the Pyro launch announcement:

We believe the critical ideas to solve AI will come from a joint effort among a worldwide community of people pursuing diverse approaches. By open sourcing Pyro, we hope to encourage the scientific world to collaborate on making AI tools more flexible, open, and easy-to-use. We expect the current (alpha!) version of Pyro will be of most interest to probabilistic modelers who want to leverage large data sets and deep networks, PyTorch users who want easy-to-use Bayesian computation, and data scientists ready to explore the ragged edge of new technology.

## Turing.jl

Turing.jl is a Julia library for (universal) probabilistic programming. Current features include:

• Universal probabilistic programming with an intuitive modelling interface
• Hamiltonian Monte Carlo (HMC) sampling for differentiable posterior distributions
• Particle MCMC sampling for complex posterior distributions involving discrete variables and stochastic control flows
• Gibbs sampling that combines particle MCMC and HMC

It is in fact one of many Julia options.

## PyMC3/PyMC4

PyMC3 is pure Python, which means you don’t need C++ to fix things like you do in Stan. It’s presumably slower than Stan if you actually do serious MCMC simulations, but I haven’t checked.

## Greta

Greta

greta models are written right in R, so there’s no need to learn another language like BUGS or Stan

greta uses Google TensorFlow so it’s fast even on massive datasets, and runs on CPU clusters and GPUs

## Soss.jl

Soss is a library for probabilistic programming.

Let’s jump right in with a simple linear model:

```julia
using Soss

m = @model X begin
    β ~ Normal() |> iid(size(X,2))
    y ~ For(eachrow(X)) do x
        Normal(x' * β, 1)
    end
end;
```

In Soss, models are first-class and function-like, and “applying” a model to its arguments gives a joint distribution.

Just a few of the things we can do in Soss:

• Sample from the (forward) model
• Condition a joint distribution on a subset of parameters
• Have arbitrary Julia values (yes, even other models) as inputs or outputs of a model
• Build a new model for the predictive distribution, for assigning parameters to particular values
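For readers who do not speak Julia, the first bullet (sampling from the forward model) amounts to something like the following plain-Python sketch of the same linear model. This is not Soss and says nothing about how Soss implements it; the function name and the design matrix are made up.

```python
import random


def sample_linear_model(X, rng):
    """One forward draw: beta_j ~ Normal(0, 1) iid, y_i ~ Normal(x_i . beta, 1)."""
    n_features = len(X[0])
    beta = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    y = [
        rng.gauss(sum(x_j * b_j for x_j, b_j in zip(x, beta)), 1.0)
        for x in X
    ]
    return beta, y


X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]  # hypothetical design matrix
beta, y = sample_linear_model(X, random.Random(42))
print("beta:", beta)
print("y:   ", y)
```

What a probabilistic programming library adds on top is the ability to take that same model description and condition on observed `y` instead of sampling it.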

## Zhusuan

ZhuSuan is a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep learning style primitives and algorithms for building probabilistic models and applying Bayesian inference. The supported inference algorithms include:

• Variational inference with programmable variational posteriors, various objectives and advanced gradient estimators (SGVB, REINFORCE, VIMCO, etc.).
• Importance sampling for learning and evaluating models, with programmable proposals.
• Hamiltonian Monte Carlo (HMC) with parallel chains, and optional automatic parameter tuning.
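Of the algorithms listed, importance sampling is the easiest to show in a few lines. Below is a sketch of self-normalised importance sampling for a posterior mean, using the prior as the proposal. It is unrelated to ZhuSuan’s actual interface; the coin-flip data and all names are assumptions for illustration.

```python
import math
import random

HEADS, TAILS = 14, 6  # made-up data


def log_weight(theta):
    """Log importance weight: with the prior as proposal, this is the log likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return HEADS * math.log(theta) + TAILS * math.log(1.0 - theta)


def importance_posterior_mean(log_w_fn, propose, n_draws, seed=0):
    rng = random.Random(seed)
    draws = [propose(rng) for _ in range(n_draws)]
    log_w = [log_w_fn(x) for x in draws]
    shift = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * xi for wi, xi in zip(w, draws)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)  # effective sample size
    return estimate, ess


# Proposal: the Uniform(0, 1) prior.
is_mean, ess = importance_posterior_mean(log_weight, lambda rng: rng.random(), 20000)
exact_mean = (1 + HEADS) / (2 + HEADS + TAILS)  # mean of Beta(1 + heads, 1 + tails)
print(f"importance-sampling mean {is_mean:.3f} (ESS {ess:.0f}) vs exact {exact_mean:.3f}")
```

The effective sample size is the thing to watch: when the proposal is a poor match for the posterior, a handful of draws carry nearly all the weight and the estimate degrades, which is the motivation for programmable proposals.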

## Church/Anglican

Level up your esotericism with Church, a general-purpose, Turing-complete Monte Carlo Lisp derivative, which is unbearably slow but reputedly does some cute tricks with modeling human problem-solving and other likelihood-free methods, according to creators Noah Goodman and Joshua Tenenbaum.

See also Anglican, which is the same but different, being built in Clojure, and hence also leveraging ClojureScript in the browser.

## WebPPL

WebPPL is a successor to Church, designed as a teaching language for probabilistic reasoning in the browser. Worth a look if you like JavaScript ML.

# Refs

Bingham, Eli, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. 2018. “Pyro: Deep Universal Probabilistic Programming,” October. http://arxiv.org/abs/1810.09538.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.

Carroll, Colin. n.d. “A Tour of Probabilistic Programming Language APIs.” https://colcarroll.github.io/ppl-api/.

Dillon, Joshua V., Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A. Saurous. 2017. “TensorFlow Distributions,” November. http://arxiv.org/abs/1711.10604.

Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43. https://doi.org/10.3102/1076998615606113.

Gorinova, Maria I., Andrew D. Gordon, and Charles Sutton. 2019. “Probabilistic Programming with Densities in SlicStan: Efficient, Flexible and Deterministic.” Proceedings of the ACM on Programming Languages 3 (POPL): 1–30. https://doi.org/10.1145/3290348.

Kochurov, Max, Colin Carroll, Thomas Wiecki, and Junpeng Lao. 2019. “PyMC4: Exploiting Coroutines for Implementing a Probabilistic Programming Framework,” September. https://openreview.net/forum?id=rkgzj5Za8H.

Lao, Junpeng. 2019. “A Hitchhiker’s Guide to Designing a Bayesian Library in Python.” Slides presented at PyData Córdoba, Córdoba, Argentina, September 29. https://docs.google.com/presentation/d/1xgNRJDwkWjTHOYMj5aGefwWiV8x-Tz55GfkBksZsN3g/edit?usp=sharing.

Moore, Dave, and Maria I. Gorinova. 2018. “Effect Handling for Composable Program Transformations in Edward2,” November. http://arxiv.org/abs/1811.06150.

PyMC Development Team. 2019. “PyMC3 Developer Guide.” https://docs.pymc.io/developer_guide.html.

Rainforth, Tom. 2017. “Automating Inference, Learning, and Design Using Probabilistic Programming.” PhD Thesis, University of Oxford. http://www.robots.ox.ac.uk/~twgr/assets/pdf/rainforth2017thesis.pdf.

Salvatier, John, Thomas V. Wiecki, and Christopher Fonnesbeck. 2016. “Probabilistic Programming in Python Using PyMC3.” PeerJ Computer Science 2 (April): e55. https://doi.org/10.7717/peerj-cs.55.

Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In ICLR. http://arxiv.org/abs/1701.03757.

Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism,” October. http://arxiv.org/abs/1610.09787.