Embarrassingly half-baked notes towards the point of that damn statistical sufficiency that I was made to learn, and how it might be generalised (possibly has already been generalised) for modern data-sciencey purposes.

Famously, maximum likelihood estimators of exponential family models are highly compressible, in that these models have *sufficient statistics*: low-dimensional functions of the data which capture all the information that the complete data carries about the parameters.
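A minimal sketch of what that compression buys you, taking the univariate Gaussian as the exponential family member. The triple (n, Σx, Σx²) is sufficient for the mean and variance, so we can stream through the data in chunks, keep three numbers, and still recover the full-data maximum likelihood estimate. The function names, chunk count and simulated data are all made up for illustration.

```python
import numpy as np


def gaussian_sufficient_stats(chunk):
    """Return (n, sum of x, sum of x^2) for one chunk of data."""
    chunk = np.asarray(chunk, dtype=float)
    return np.array([chunk.size, chunk.sum(), np.square(chunk).sum()])


def gaussian_mle_from_stats(stats):
    """Gaussian MLE of (mean, variance) from the accumulated statistics."""
    n, s1, s2 = stats
    mean = s1 / n
    var = s2 / n - mean**2  # MLE variance, i.e. divides by n, not n - 1
    return mean, var


# Simulated data; the parameters here are arbitrary (an assumption).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=100_000)

# Pretend the data does not fit in memory: visit it one chunk at a time
# and keep only the running three-number summary.
total = np.zeros(3)
for chunk in np.array_split(data, 100):
    total += gaussian_sufficient_stats(chunk)

mean_stream, var_stream = gaussian_mle_from_stats(total)

# Full-data MLE for comparison (np.var uses ddof=0 by default).
mean_full, var_full = data.mean(), data.var()
print(mean_stream, mean_full)
print(var_stream, var_full)
```

The two estimates agree up to floating point error. The sum-of-squares formula for the variance is numerically fragile when the mean is large relative to the spread, so this is a demonstration of the idea rather than a recommended streaming algorithm.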

(TODO - explicitness, and example, and why this makes asymptotics plausible)

Many models and data sets and estimation methods do NOT have this feature, even parametric models with very few parameters.

(TODO reprise related examples)

This can be a PITA when your data is very big and you wish to get benefit from that, and yet you can’t fit the data in memory. The question then arises: when can I do better? Can I find a “nearly sufficient” statistic, which is smaller than my data and yet does not worsen my error substantially? Can I quantify this nearness to the original?

There are a few approaches to this. I wonder if there is a good and consistent framing for them all?

Most prominent in my field at the moment is matrix sketching and its application to randomised regression.
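To make that concrete, here is a rough sketch of sketched least squares, assuming a dense Gaussian sketching matrix (one of several possible choices, and not the cheapest). We compress the n rows of the regression problem down to m rows by random projection, accumulating the compressed problem block by block so that neither the whole sketching matrix nor the whole data set needs to be in memory at once. The sizes, noise level and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 5_000, 5, 500  # data rows, predictors, sketch rows (assumed sizes)

# Simulated regression problem.
X = rng.normal(size=(n, d))
beta_true = np.arange(1.0, d + 1.0)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Accumulate S @ X and S @ y one block of rows at a time, where S is an
# m x n matrix with i.i.d. N(0, 1/m) entries that is never stored whole.
SX = np.zeros((m, d))
Sy = np.zeros(m)
for idx in np.array_split(np.arange(n), 10):
    S_block = rng.normal(scale=1.0 / np.sqrt(m), size=(m, idx.size))
    SX += S_block @ X[idx]
    Sy += S_block @ y[idx]

# Solve the small sketched problem and, for comparison, the full one.
beta_sketch, *_ = np.linalg.lstsq(SX, Sy, rcond=None)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# One crude way to quantify "nearness" to the full-data answer.
rel_err = np.linalg.norm(beta_sketch - beta_full) / np.linalg.norm(beta_full)
print(rel_err)
```

The pair (SX, Sy) plays the role of the "nearly sufficient" statistic here: it is ten times smaller than the data in this toy setup, and the relative error gives a first, crude handle on how much that compression costs.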

There are probably other data compression methods - certainly compressive sensing approaches this kind of problem if you squint at it right - it’s in the name - but it would need a little re-framing.

Question - what kind of sufficiencies do we have for causal graphical model inference? Causal sufficiencies, I mean.

TBC.