The Living Thing / Notebooks : Stability (in learning)

Your estimate is robust to a deleted data point? Then it is a stable estimate. This supposedly implies generalizability. The above statements can be made precise, I am told. Making them precise might give us new ideas for risk bounds, model selection, or connections to optimization.
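To make the "deleted data point" idea concrete, here is a minimal numerical sketch (a toy of my own devising, not taken from any of the papers below): fit ridge regression on the full sample, refit with each single point removed, and see how far the coefficients can move.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

beta_full = ridge_fit(X, y)

# Leave-one-out perturbations: how much can deleting a single point
# move the estimate?
deltas = []
for i in range(n):
    keep = np.arange(n) != i
    beta_i = ridge_fit(X[keep], y[keep])
    deltas.append(np.linalg.norm(beta_i - beta_full))

print("max leave-one-out change:", max(deltas))
```

If that worst-case leave-one-out change shrinks as the sample grows, that is roughly the kind of stability the generalization arguments below trade on, as I understand it.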

Supposedly there is also a connection to differential privacy, but since I don’t yet know anything about differential privacy I can’t take that statement any further, except to note that I would like to work it out one day.

There is also a connection to robust inference, but I do not know enough to make this precise.

scrapbook:

Yu13:

Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to “reasonable” perturbations to data and to the model used. Jacknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models.

Moritz Hardt, Stability as a foundation of machine learning:

Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance on unseen instances. This is called generalization.

In this post, I will describe a purely algorithmic approach to generalization. The property that makes this possible is stability. An algorithm is stable, intuitively speaking, if its output doesn’t change much if we perturb the input sample in a single point. We will see that this property by itself is necessary and sufficient for generalization.
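An empirical sketch of that single-point perturbation idea, under my own assumptions (ridge regression as the learning algorithm, random single-point replacements rather than the worst case over all neighbouring samples): train on a sample S and on samples differing from it in one point, then record the largest change in loss on fixed test points.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def sq_losses(beta, X, y):
    """Per-point squared losses of the fitted coefficients."""
    return (X @ beta - y) ** 2

rng = np.random.default_rng(1)
n, d = 100, 3
w_true = np.array([1.0, -2.0, 0.5])

def draw(m):
    X = rng.standard_normal((m, d))
    return X, X @ w_true + 0.1 * rng.standard_normal(m)

X, y = draw(n)
X_test, y_test = draw(50)
beta_S = ridge_fit(X, y)

# Replace each training point with a fresh draw and record the largest
# change in loss on the fixed test points: a crude empirical proxy for
# the uniform-stability coefficient (which is a supremum over all
# neighbouring samples and all test points, not just these).
worst = 0.0
for i in range(n):
    X2, y2 = X.copy(), y.copy()
    Xi, yi = draw(1)
    X2[i], y2[i] = Xi[0], yi[0]
    beta_Si = ridge_fit(X2, y2)
    gap = np.max(np.abs(sq_losses(beta_S, X_test, y_test)
                        - sq_losses(beta_Si, X_test, y_test)))
    worst = max(worst, gap)

print("empirical stability proxy:", worst)
```

The proxy only probes finitely many perturbations and test points, so it is a sanity check rather than the quantity that appears in the bounds of Bousquet and Elisseeff or Hardt.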

Refs

AgNi09
Agarwal, S., & Niyogi, P. (2009) Generalization Bounds for Ranking Algorithms via Algorithmic Stability. Journal of Machine Learning Research, 10, 441–474.
BoEl01
Bousquet, O., & Elisseeff, A. (2001) Algorithmic Stability and Generalization Performance. In Advances in Neural Information Processing Systems (Vol. 13, pp. 196–202). MIT Press
BoEl02
Bousquet, O., & Elisseeff, A. (2002) Stability and generalization. Journal of Machine Learning Research, 2(Mar), 499–526.
FrYL06
Freeman, R. A., Yang, P., & Lynch, K. M.(2006) Stability and Convergence Properties of Dynamic Average Consensus Estimators. In 2006 45th IEEE Conference on Decision and Control (pp. 338–343). DOI.
GiSB14
Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [Cs, Math, Stat].
KuNi02
Kutin, S., & Niyogi, P. (2002) Almost-everywhere Algorithmic Stability and Generalization Error. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (pp. 275–282). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Lian14
Liang, P. (2014) CS229T/STAT231: Statistical Learning Theory (Winter 2014).
LiRW10
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
MWCW16
Meng, Q., Wang, Y., Chen, W., Wang, T., Ma, Z.-M., & Liu, T.-Y. (2016) Generalization Error Bounds for Optimization Algorithms via Stability. arXiv:1609.08397 [stat].
XuCM12
Xu, H., Caramanis, C., & Mannor, S. (2012) Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1), 187–193. DOI.
Yu13
Yu, B. (2013) Stability. Bernoulli, 19(4), 1484–1500. DOI.