# Variational inference

TBD.

Expectation maximisation, Bayes, graphical models, mumble mumble.

Using optimisation to approximate the posterior semi-parametrically rather than purely sampling from it. This is nice because, as a message-passing method, it scales up to large data.
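As a gloss (mine, not from any particular source here): one posits a variational family $q_\lambda(z)$ and maximises the evidence lower bound,

$$\mathrm{ELBO}(\lambda) = \mathbb{E}_{q_\lambda(z)}\bigl[\log p(x, z) - \log q_\lambda(z)\bigr] \leq \log p(x),$$

which is the same as minimising $\operatorname{KL}\bigl(q_\lambda(z)\,\|\,p(z \mid x)\bigr)$ over $\lambda$, so posterior approximation becomes an optimisation problem in $\lambda$.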

I suspect this is not intrinsically Bayesian, but most of the literature on it is from Bayesians, so I won’t look into it in a frequentist context for now.

Kingma and Welling

## Loss functions

It’s called Operator VI as a fancy way to say that one is flexible in constructing how exactly the objective function uses $\pi$, $q$ and test functions from some family $\mathcal{F}$. I completely agree with the motivation: KL divergence in the form $\int q(x) \log \frac{q(x)}{\pi(x)} \mathrm{d}x$ indeed underestimates the variance of $\pi$ and typically latches onto a single mode. Using KL the other way around, $\int \pi(x) \log \frac{\pi(x)}{q(x)} \mathrm{d}x$, takes all modes into account, but tends to produce overdispersed approximations instead.
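A minimal numerical sketch of that mode-seeking versus mass-covering behaviour (mine, not from any of the papers here; the function names and grid choices are arbitrary): fit a single Gaussian $q$ to a bimodal $\pi$ by minimising each direction of the KL on a grid.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def log_pi(x):
    # bimodal target: equal mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(np.log(0.5) + norm.logpdf(x, -3, 1),
                        np.log(0.5) + norm.logpdf(x, 3, 1))

xs = np.linspace(-15, 15, 4001)
dx = xs[1] - xs[0]
pi_vals = np.exp(log_pi(xs))

def kl_q_pi(params):
    # reverse KL: int q log(q / pi), approximated on the grid
    mu, log_sigma = params
    log_q = norm.logpdf(xs, mu, np.exp(log_sigma))
    return np.sum(np.exp(log_q) * (log_q - log_pi(xs))) * dx

def kl_pi_q(params):
    # forward KL: int pi log(pi / q), approximated on the grid
    mu, log_sigma = params
    log_q = norm.logpdf(xs, mu, np.exp(log_sigma))
    return np.sum(pi_vals * (log_pi(xs) - log_q)) * dx

rev = minimize(kl_q_pi, x0=[2.0, 0.0]).x  # expect: hugs one mode, small sigma
fwd = minimize(kl_pi_q, x0=[2.0, 0.0]).x  # expect: straddles both modes, large sigma
print("reverse KL fit: mu=%.2f, sigma=%.2f" % (rev[0], np.exp(rev[1])))
print("forward KL fit: mu=%.2f, sigma=%.2f" % (fwd[0], np.exp(fwd[1])))
```

The forward-KL optimum over a Gaussian family is the moment-matched Gaussian (here mean $0$, variance $10$), whereas the reverse-KL optimum sits on whichever mode the optimiser starts nearest, with $\sigma \approx 1$.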

[…] the authors suggest an objective using what they call the Langevin–Stein operator, which does not make use of the proposal density $q$ at all but relies exclusively on test functions; the only requirement is that we be able to draw samples from the proposal. The authors claim that assuming access to the density of $q$ limits the applicability of an objective/operator. This claim is not substantiated, however.
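For my own reference, a sketch of the construction as I understand it (notation mine; worth checking against the original paper): for vector-valued test functions $f \in \mathcal{F}$, the Langevin–Stein operator is

$$(\mathcal{O}^{\pi} f)(z) = \nabla_z \log \pi(z)^{\top} f(z) + \nabla_z^{\top} f(z),$$

which satisfies $\mathbb{E}_{z \sim \pi}\bigl[(\mathcal{O}^{\pi} f)(z)\bigr] = 0$ under mild regularity conditions, and the variational objective is

$$\mathcal{L}(q) = \sup_{f \in \mathcal{F}} \Bigl( \mathbb{E}_{z \sim q}\bigl[(\mathcal{O}^{\pi} f)(z)\bigr] \Bigr)^2.$$

Evaluating this requires samples from $q$ and the score $\nabla_z \log \pi$ (available up to the intractable normaliser), but never the density of $q$ itself.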