*TBD.*

Expectation maximisation, Bayes, graphical models, mumble mumble.

Using optimisation to approximate a posterior semi-parametrically, rather than purely sampling from it. This is nice because, as a message-passing method, it scales up to large data.
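In the simplest case the optimisation is plain stochastic gradient ascent on the evidence lower bound (ELBO). A throwaway sketch of my own (not from any of the references below): fit a Gaussian \(q_{\mu,\sigma}\) to a one-dimensional unnormalised target using Monte Carlo reparameterisation gradients; the target is itself Gaussian purely so the answer is checkable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised target: N(2, 0.5^2) up to a constant, standing in for a
# log-joint over latents and data. We only need the score function.
def dlog_p(x):
    return -(x - 2.0) / 0.25  # d/dx log p(x)

mu, log_s = 0.0, 0.0   # variational parameters of q = N(mu, exp(log_s)^2)
lr, n_mc = 0.05, 64    # step size and Monte Carlo batch size

for _ in range(3000):
    s = np.exp(log_s)
    eps = rng.standard_normal(n_mc)
    x = mu + s * eps                  # reparameterisation trick
    # ELBO = E_q[log p(x)] + entropy(q); these are unbiased gradient estimates
    g_mu = dlog_p(x).mean()
    g_log_s = (dlog_p(x) * eps).mean() * s + 1.0  # +1 from Gaussian entropy
    mu += lr * g_mu
    log_s += lr * g_log_s

print(mu, np.exp(log_s))  # should approach the target's 2.0 and 0.5
```

With a non-Gaussian target the same loop applies unchanged; only `dlog_p` changes, which is the appeal of black-box formulations.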

I suspect this is not intrinsically Bayesian, but most of the literature on it is from Bayesians, so I won’t look into it in a frequentist context for now.

See also mixture models, probabilistic deep learning, directed graphical models, and note that lots of the software to do this is filed under Bayesian Statistics HOWTO.

## Why does this always seem to be about mixture models?

idk. Easy to keep them normalised?

However, see the extension into reparameterisation.

## Loss functions

For now see probability metrics.

### Loss function aside

Ingmar Schuster’s critique of black-box loss, as seen in RTAB16:

> It’s called Operator VI as a fancy way to say that one is flexible in constructing how exactly the objective function uses \(\pi, q\) and test functions from some family \(\mathcal{F}\). I completely agree with the motivation: KL-divergence in the form \(\int q(x) \log \frac{q(x)}{\pi(x)} \mathrm{d}x\) indeed underestimates the variance of \(\pi\) and approximates only one mode. Using KL the other way around, \(\int \pi(x) \log \frac{\pi(x)}{q(x)} \mathrm{d}x\), takes all modes into account, but still tends to underestimate variance.
>
> \[…\] the authors suggest an objective using what they call the Langevin-Stein operator, which does not make use of the proposal density \(q\) at all but uses test functions exclusively. The only requirement is that we be able to draw samples from the proposal. The authors claim that assuming access to \(q\) limits applicability of an objective/operator. This claim is not substantiated, however.
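The mode-seeking versus mass-covering behaviour of the two KL directions is easy to check numerically. A quick sketch (mine, not from the paper): evaluate both directions on a grid for a bimodal target \(\pi\) and two candidate Gaussians \(q\), one hugging a single mode, one spread over both.

```python
import numpy as np

xs = np.linspace(-10, 10, 4001)
dx = xs[1] - xs[0]

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

# Bimodal target: equal-weight mixture of N(-3, 1) and N(3, 1)
pi = 0.5 * gauss(xs, -3, 1) + 0.5 * gauss(xs, 3, 1)

def kl(p, q):
    """Grid approximation of KL(p || q)."""
    return np.sum(p * np.log(p / q)) * dx

q_mode = gauss(xs, 3, 1)    # hugs one mode
q_wide = gauss(xs, 0, 3.2)  # covers both modes with inflated variance

print(kl(q_mode, pi), kl(q_wide, pi))  # reverse KL favours the one-mode fit
print(kl(pi, q_mode), kl(pi, q_wide))  # forward KL favours the covering fit
```

Reverse KL, the usual VI objective, scores the single-mode \(q\) better; forward KL penalises it heavily for leaving a mode uncovered.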

## Tools

## Refs

- AbDH16: Abbasnejad, E., Dick, A., & Hengel, A. van den. (2016) Infinite Variational Autoencoder for Semi-Supervised Learning. In *Advances in Neural Information Processing Systems 29*.
- Beal03: Beal, M. J. (2003) *Variational algorithms for approximate Bayesian inference*. University of London.
- Bish94: Bishop, C. (1994) Mixture Density Networks. *Microsoft Research*.
- BlKM16: Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2016) Variational Inference: A Review for Statisticians. *arXiv:1601.00670 [Cs, Stat]*.
- FSPW16: Fraccaro, M., Sønderby, S. K., Paquet, U., & Winther, O. (2016) Sequential Neural Models with Stochastic Layers. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), *Advances in Neural Information Processing Systems 29* (pp. 2199–2207). Curran Associates, Inc.
- FrJo05: Frey, B. J., & Jojic, N. (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 27(9), 1392–1416. DOI
- GaNe06: Gagen, M. J., & Nemoto, K. (2006) Variational optimization of probability measure spaces resolves the chain store paradox.
- GaWi14: Gal, Y., & van der Wilk, M. (2014) Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial. *arXiv:1402.1412 [Stat]*.
- GDGR15: Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. *arXiv:1502.04623 [Cs]*.
- HBWP12: Hoffman, M., Blei, D. M., Wang, C., & Paisley, J. (2012) Stochastic Variational Inference. *arXiv:1206.7051 [Cs, Stat]*.
- JGJS99: Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999) An Introduction to Variational Methods for Graphical Models. *Machine Learning*, 37(2), 183–233. DOI
- KiSW16: Kingma, D. P., Salimans, T., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. *arXiv:1606.04934 [Cs, Stat]*.
- KiWe13: Kingma, D. P., & Welling, M. (2013) Auto-Encoding Variational Bayes. *arXiv:1312.6114 [Cs, Stat]*.
- LSLW15: Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. *arXiv:1512.09300 [Cs, Stat]*.
- LoWe16: Louizos, C., & Welling, M. (2016) Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. *arXiv preprint arXiv:1603.04733*.
- Luts15: Luts, J. (2015) Real-Time Semiparametric Regression for Distributed Data Sets. *IEEE Transactions on Knowledge and Data Engineering*, 27(2), 545–557. DOI
- LuBW14: Luts, J., Broderick, T., & Wand, M. P. (2014) Real-Time Semiparametric Regression. *Journal of Computational and Graphical Statistics*, 23(3), 589–615. DOI
- Mack02a: MacKay, D. J. C. (2002a) Gaussian Processes. In *Information Theory, Inference & Learning Algorithms* (Chapter 45). Cambridge University Press.
- Mack02b: MacKay, D. J. C. (2002b) *Information Theory, Inference & Learning Algorithms*. Cambridge University Press.
- Mink01: Minka, T. P. (2001) Expectation Propagation for Approximate Bayesian Inference. In *Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence* (pp. 362–369). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- MoAV17: Molchanov, D., Ashukha, A., & Vetrov, D. (2017) Variational Dropout Sparsifies Deep Neural Networks. *arXiv:1701.05369 [Cs, Stat]*.
- OrWa10: Ormerod, J. T., & Wand, M. P. (2010) Explaining Variational Approximations. *The American Statistician*, 64(2), 140–153. DOI
- PSCP16: Pereyra, M., Schniter, P., Chouzenoux, É., Pesquet, J. C., Tourneret, J. Y., Hero, A. O., & McLaughlin, S. (2016) A Survey of Stochastic Simulation and Optimization Methods in Signal Processing. *IEEE Journal of Selected Topics in Signal Processing*, 10(2), 224–241. DOI
- RTAB16: Ranganath, R., Tran, D., Altosaar, J., & Blei, D. (2016) Operator Variational Inference. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), *Advances in Neural Information Processing Systems 29* (pp. 496–504). Curran Associates, Inc.
- THSB17: Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017) Deep Probabilistic Programming. *arXiv:1701.03757 [Cs, Stat]*.
- WaJo08: Wainwright, M. J., & Jordan, M. I. (2008) Graphical models, exponential families, and variational inference. *Foundations and Trends® in Machine Learning*, 1(1–2), 1–305. DOI
- WaJo05: Wainwright, M., & Jordan, M. (2005) A variational principle for graphical models. In *New Directions in Statistical Signal Processing* (Vol. 155). MIT Press.
- Wand16: Wand, M. P. (2016) Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing. *arXiv preprint arXiv:1602.07412*.
- WiBi05: Winn, J. M., & Bishop, C. M. (2005) Variational message passing. *Journal of Machine Learning Research*, 6, 661–694.
- XiJR03: Xing, E. P., Jordan, M. I., & Russell, S. (2003) A Generalized Mean Field Algorithm for Variational Inference in Exponential Families. In *Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence* (pp. 583–591). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.