Boosting, bagging, voting

Ensemble methods. Fast to train, fast to use. They get you results, though not necessarily answers. So, like neural networks, but you don’t need a server farm.

Jeremy Kun, Why Boosting Doesn’t Overfit:

Boosting, which we covered in gruesome detail previously, has a natural measure of complexity represented by the number of rounds you run the algorithm for. Each round adds one additional “weak learner” weighted vote. So running for a thousand rounds gives a vote of a thousand weak learners. Despite this, boosting doesn’t overfit on many datasets. In fact, and this is a shocking fact, researchers observed that Boosting would hit zero training error, they kept running it for more rounds, and the generalization error kept going down! It seemed like the complexity could grow arbitrarily without penalty. […] this phenomenon is a fact about voting schemes, not boosting in particular.
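To see the effect, one can track training and test error per boosting round. A minimal sketch, using scikit-learn’s AdaBoost with its default decision-stump weak learners on a synthetic benchmark (the library and dataset are my choices for illustration, not anything the quote prescribes):

    # Track train/test error per boosting round; on many datasets test error
    # keeps falling after training error bottoms out.
    import numpy as np
    from sklearn.datasets import make_hastie_10_2
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_hastie_10_2(n_samples=4000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)

    # Each round adds one more weighted decision stump to the vote.
    clf = AdaBoostClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)

    train_err = [np.mean(p != y_train) for p in clf.staged_predict(X_train)]
    test_err = [np.mean(p != y_test) for p in clf.staged_predict(X_test)]

    # First round (if any) where training error hits zero, then compare the
    # test error there against the test error after all 500 rounds.
    zero = next((i for i, e in enumerate(train_err) if e == 0), None)
    print("first zero-training-error round:", zero)
    if zero is not None:
        print("test error then vs. at 500 rounds:", test_err[zero], test_err[-1])

The margin explanation (SFBL98 below) is that further rounds keep widening the weighted vote’s margins on the training points even after the training error stops moving, and it is the margins, not the number of rounds, that drive the generalization bound.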

Random trees, forests, jungles
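The common core of this family is Breiman-style bagging (Brei96): fit each tree on a bootstrap resample and aggregate by majority vote; random forests additionally randomize the features considered at each split. A minimal from-scratch sketch, with scikit-learn trees as the base learners (my choice of library, not a reference implementation of any of the variants cited below):

    # Bagged, feature-randomized decision trees: the skeleton of a random forest.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    trees = []
    for _ in range(50):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: resample rows with replacement
        # max_features="sqrt" is the random-subspace twist that turns bagging into a forest
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
        trees.append(tree.fit(X[idx], y[idx]))

    votes = np.stack([t.predict(X) for t in trees])    # shape (n_trees, n_samples)
    majority = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote over 0/1 labels
    print("ensemble training accuracy:", np.mean(majority == y))

Mondrian forests (LaRT14) replace the greedy splits with draws from a Mondrian process so the trees can be updated online; decision jungles (SSKN13) merge tree nodes into DAGs to get more compact models.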

Refs

BLGR16
Balog, M., Lakshminarayanan, B., Ghahramani, Z., Roy, D. M., & Teh, Y. W. (2016) The Mondrian Kernel. arXiv:1606.05241 [stat].
BaTe15
Balog, M., & Teh, Y. W. (2015) The Mondrian Process for Machine Learning. arXiv:1507.05181 [cs, stat].
BLTG06
Bickel, P. J., Li, B., Tsybakov, A. B., van de Geer, S. A., Yu, B., Valdés, T., … Vaart, A. van der. (2006) Regularization in statistics. Test, 15(2), 271–344. DOI.
Brei96
Breiman, L. (1996) Bagging predictors. Machine Learning, 24(2), 123–140. DOI.
BüGe11
Bühlmann, P., & van de Geer, S. (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. (2011 edition.). Heidelberg; New York: Springer.
CrSK11
Criminisi, A., Shotton, J., & Konukoglu, E. (2011) Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning (No. MSR-TR-2011-114). Microsoft Research.
CrSK12
Criminisi, A., Shotton, J., & Konukoglu, E. (2012) Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends® in Computer Graphics and Vision, 7(2–3), 81–227. DOI.
DíJM12
Díaz-Avalos, C., Juan, P., & Mateu, J. (2012) Similarity measures of conditional intensity functions to test separability in multidimensional point processes. Stochastic Environmental Research and Risk Assessment, 27(5), 1193–1205. DOI.
FCBA14
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014) Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15(1), 3133–3181.
Frie01
Friedman, J. H. (2001) Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
Frie02
Friedman, J. H. (2002) Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. DOI.
FrHT00
Friedman, J., Hastie, T., & Tibshirani, R. (2000) Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407. DOI.
GaLe13
Gall, J., & Lempitsky, V. (2013) Class-Specific Hough Forests for Object Detection. In A. Criminisi & J. Shotton (Eds.), Decision Forests for Computer Vision and Medical Image Analysis (pp. 143–157). Springer London.
JoZh14
Johnson, R., & Zhang, T. (2014) Learning Nonlinear Functions Using Regularized Greedy Forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 942–954. DOI.
LaRT14
Lakshminarayanan, B., Roy, D. M., & Teh, Y. W. (2014) Mondrian Forests: Efficient Online Random Forests. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 3140–3148). Curran Associates, Inc.
RaRe09
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in Neural Information Processing Systems (pp. 1313–1320). Curran Associates, Inc.
SFBL98
Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686. DOI.
Scor14
Scornet, E. (2014) On the asymptotics of random forests. arXiv:1409.2090 [math, stat].
ScBV14
Scornet, E., Biau, G., & Vert, J.-P. (2014) Consistency of Random Forests. arXiv:1405.2881 [math, stat].
SSKN13
Shotton, J., Sharp, T., Kohli, P., Nowozin, S., Winn, J., & Criminisi, A. (2013) Decision Jungles: Compact and Rich Models for Classification. In Proc. NIPS.