Boosting, bagging, voting

Ensemble methods: mixing predictions from many simple learners to get sophisticated predictions.

Fast to train, fast to use. Gets you results. May not get you answers. So, like neural networks but from the previous hype cycle.
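
A minimal sketch of the bagging-and-voting idea, assuming scikit-learn and numpy are available: fit shallow decision trees on bootstrap resamples and combine them by majority vote. (A toy illustration, not anyone's canonical implementation.)

    # Toy bagging ensemble: bootstrap resamples, one shallow tree per
    # resample, majority vote at prediction time.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    trees = []
    for _ in range(50):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
        trees.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

    votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
    y_hat = (votes.mean(axis=0) > 0.5).astype(int)        # majority vote
    print("ensemble accuracy:", (y_hat == y_test).mean())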

Jeremy Kun, in Why Boosting Doesn’t Overfit:

Boosting, which we covered in gruesome detail previously, has a natural measure of complexity represented by the number of rounds you run the algorithm for. Each round adds one additional “weak learner” weighted vote. So running for a thousand rounds gives a vote of a thousand weak learners. Despite this, boosting doesn’t overfit on many datasets. In fact, and this is a shocking fact, researchers observed that Boosting would hit zero training error, they kept running it for more rounds, and the generalization error kept going down! It seemed like the complexity could grow arbitrarily without penalty. […] this phenomenon is a fact about voting schemes, not boosting in particular.
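
One way to watch this happen, assuming scikit-learn: fit AdaBoost with decision stumps and track training and test error after each round via staged predictions. On many datasets the test error keeps creeping down long after the training error has flattened out. (A rough illustration of the phenomenon, not Kun's own experiment.)

    # Train/test error as a function of boosting rounds.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=4000, n_features=30, flip_y=0.05, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Default weak learner is a depth-1 decision stump; each round adds one more.
    boost = AdaBoostClassifier(n_estimators=500).fit(X_train, y_train)

    train_err = [np.mean(p != y_train) for p in boost.staged_predict(X_train)]
    test_err  = [np.mean(p != y_test)  for p in boost.staged_predict(X_test)]
    for t in (10, 100, 500):
        print(f"rounds={t:>3}  train error={train_err[t-1]:.3f}  test error={test_err[t-1]:.3f}")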


  1. In a different context, I’ve run into model averaging; how does this relate to voting algorithms?
  2. How do you phrase ensemble algorithms in a Bayesian context? If it were Bayesian model averaging this would be easy, but what about when the learners are all ill-posed? (A rough sketch of the contrast follows this list.)
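
On question 2, a rough sketch of the contrast (my notation, not taken from a particular source): Bayesian model averaging predicts with

    p(y \mid x, \mathcal{D}) = \sum_k p(y \mid x, M_k, \mathcal{D}) \, p(M_k \mid \mathcal{D}),

weighting each model by its posterior probability given the data, whereas a boosted or bagged vote over binary weak learners h_k \in \{-1, +1\} predicts with

    \hat{y}(x) = \operatorname{sign}\Big( \sum_k w_k \, h_k(x) \Big),

where the weights w_k come out of the training procedure and are not posterior probabilities over models. If the weak learners are all mis-specified, the posterior over models is not obviously meaningful in the first place, which is the crux of the question.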

Random trees, forests, jungles


