Nothing here yet, because I don’t know anything about it.
But I would like to know a principled way of selecting models with very large data sets, i.e. ones where you must resort to approximate fits by SGD rather than full design-matrix calculations. I am assuming that cross-validation is a priori too expensive in this setting, but that approximate optimisation is tractable. Presumably information criteria then acquire extra terms accounting for the optimisation error? On the other hand, presumably one can appeal to large-sample theory guiltlessly. Can we do this in distributed statistics? Or when regularising neural nets? Maybe there are tractable statistical-learning-theoretic results?
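To make the question concrete, here is a minimal sketch of the naive approach: fit each candidate model by minibatch SGD and rank them by a plug-in AIC. Everything below (the synthetic data, the nested candidate models, the learning-rate and epoch choices) is illustrative, and the key questionable assumption is precisely the one at issue above: that SGD lands close enough to the MLE for the usual AIC penalty to apply without extra optimisation-error terms.

```python
import numpy as np

# Sketch: model selection by plug-in AIC after a minibatch-SGD fit.
# ASSUMPTION: the SGD iterate is close enough to the MLE that the
# classical AIC penalty (2 * number of parameters) is the right one.

rng = np.random.default_rng(0)
n, d = 50_000, 20
X = rng.normal(size=(n, d))
beta_true = np.r_[np.ones(5), np.zeros(d - 5)]  # only 5 informative features
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)


def sgd_logistic(X, y, epochs=5, batch=256, lr=0.1):
    """Plain minibatch SGD on the logistic negative log-likelihood."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-X[b] @ w))
            w -= lr * X[b].T @ (p - y[b]) / len(b)
    return w


def plugin_aic(k):
    """AIC for the nested submodel using the first k features."""
    w = sgd_logistic(X[:, :k], y)
    p = 1.0 / (1.0 + np.exp(-X[:, :k] @ w))
    nll = -np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p))
    return 2.0 * nll + 2.0 * k  # classical penalty; ignores SGD error


scores = {k: plugin_aic(k) for k in (2, 5, 10, 20)}
```

Note that each `plugin_aic` call needs a full pass over the data to evaluate the log-likelihood, which is already a stronger requirement than pure streaming SGD; whether the penalty should be inflated to account for the gap between the SGD iterate and the MLE is exactly the open question here.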