:completeness: 0 Status: draft
I’m working through a small realisation, for my own interest, which has helped my understanding of variational Bayes and, indeed, in relating it to non-Bayes variational inference and
I doubt this realisation is novel, but I will work through it as if it were, for the sake of my own education.
See also mixture models, probabilistic deep learning, directed graphical models, and note that lots of the software to do this is filed under Bayesian Statistics HOWTO.
Sufficient statistics in exponential families
Famously, maximum likelihood estimators for exponential family models are highly compressible, in that these models have sufficient statistics: low-dimensional functions of the data which carry all the information in the complete data set that is relevant to the parameter estimates. Many models, data sets and estimation methods do NOT have this feature, even parametric models with very few parameters.
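To make the compression concrete for myself, here is a minimal sketch, assuming a univariate Gaussian as the exponential family model. The function names `gaussian_suff_stats` and `gaussian_mle_from_stats` are my own hypothetical helpers, not from any library. The point is that the triple (n, Σx, Σx²) can be accumulated chunk by chunk and still recovers the same maximum likelihood estimates as the full data set.

```python
import numpy as np


def gaussian_suff_stats(x):
    """Sufficient statistics (n, sum x, sum x^2) of a 1-D sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.array([x.size, x.sum(), np.square(x).sum()])


def gaussian_mle_from_stats(stats):
    """Gaussian MLE (mean, variance) computed from the sufficient statistics alone."""
    n, s1, s2 = stats
    mean = s1 / n
    # MLE variance divides by n, not n - 1.
    # Note: this formula can lose precision when mean**2 dwarfs the variance.
    var = s2 / n - mean**2
    return mean, var


rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=100_000)

# Sufficient statistics are additive, so we can stream over chunks
# and only ever keep three numbers in memory.
total = np.zeros(3)
for chunk in np.array_split(data, 10):
    total += gaussian_suff_stats(chunk)

mean_stream, var_stream = gaussian_mle_from_stats(total)

# Reference values computed from the complete data (np.var defaults to ddof=0).
mean_full, var_full = data.mean(), data.var()
print(mean_stream, mean_full)
print(var_stream, var_full)
```

Three numbers stand in for 100,000 observations here, which is the kind of saving that most other models do not offer.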
Even for exponential families, the likelihood function is key here. There are many other probability metrics which we could use to fit the model, and which we do not expect to produce such nice wossnames.
This can be a PITA when your data is very big and you wish to benefit from that, and yet you can’t fit the data in memory. The question then arises: when can I do better? Can I find a “nearly sufficient” statistic, which is smaller than my data and yet does not worsen my error substantially? Can I quantify this nearness to the original?