Ensemble methods: mixing predictions from simple learners to get sophisticated predictions.
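
A minimal sketch of why mixing helps, under a strong and loudly hypothetical assumption: every weak learner is right with the same probability and they err independently of each other. Real learners trained on the same data are correlated, so treat this as an upper-end illustration rather than a description of any actual ensemble. The function name `majority_vote_accuracy` is made up for this note.

```python
from math import comb

def majority_vote_accuracy(n_learners: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is right.

    Assumes (unrealistically) independent errors and identical accuracy.
    """
    if n_learners % 2 == 0:
        raise ValueError("use an odd number of learners to avoid ties")
    needed = n_learners // 2 + 1
    # Binomial tail: at least `needed` of the n learners vote correctly.
    return sum(
        comb(n_learners, k) * p_correct**k * (1 - p_correct) ** (n_learners - k)
        for k in range(needed, n_learners + 1)
    )

if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions, accuracy climbs with the number of voters whenever each learner is better than chance, and falls when each is worse than chance.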

Fast to train, fast to use. Gets you results. May not get you answers. So, like neural networks but from the previous hype cycle.

Jeremy Kun: Why Boosting Doesn’t Overfit:

> Boosting, which we covered in gruesome detail previously, has a natural measure of complexity represented by the number of rounds you run the algorithm for. Each round adds one additional “weak learner” weighted vote. So running for a thousand rounds gives a vote of a thousand weak learners. Despite this, boosting doesn’t overfit on many datasets. In fact, and this is a shocking fact, researchers observed that Boosting would hit zero training error, they kept running it for more rounds, and the generalization error kept going down! It seemed like the complexity could grow arbitrarily without penalty. […] this phenomenon is a fact about voting schemes, not boosting in particular.
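
To make "each round adds one weighted vote" concrete, here is a sketch of AdaBoost with decision stumps on a ten-point toy set. It is not taken from the quoted post. The data, the helper names and the choice to track the smallest normalized margin are all assumptions made for illustration. It only reports training quantities; generalization error would need held-out data.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` left of the threshold, `-polarity` right."""
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    points = sorted(set(xs))
    # One threshold below all points (a constant classifier) plus midpoints.
    thresholds = [points[0] - 1] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def score(ensemble, x):
    """Weighted vote of all stumps collected so far."""
    return sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)

def training_error(ensemble, xs, ys):
    wrong = sum((1 if score(ensemble, x) >= 0 else -1) != y
                for x, y in zip(xs, ys))
    return wrong / len(xs)

def min_margin(ensemble, xs, ys):
    """Smallest normalized margin y * f(x) / sum(alpha) over the training set."""
    total = sum(alpha for alpha, _, _ in ensemble)
    return min(y * score(ensemble, x) / total for x, y in zip(xs, ys))

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble, history = [], []
    for _ in range(rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-12), 1 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight the points this stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append((training_error(ensemble, xs, ys),
                        min_margin(ensemble, xs, ys)))
    return ensemble, history

# Toy labels that no single stump can fit (an assumption of this sketch).
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

if __name__ == "__main__":
    _, history = adaboost(XS, YS, rounds=10)
    for round_no, (err, margin) in enumerate(history, start=1):
        print(round_no, err, round(margin, 3))
```

On this toy set the vote needs three stumps before it fits all ten points. Whether the minimum margin keeps growing in later rounds depends on the data, so it is worth printing rather than assuming.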

## Questions

In a different context, I’ve run into model averaging. How does this relate to voting algorithms?
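
One possible reading, sketched rather than asserted: a vote thresholds each model first and then counts heads, while averaging combines the predicted probabilities first and thresholds once. The two can disagree. The function names and the 0.5 threshold are hypothetical choices for this note.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote: each model casts a 0/1 vote, the strict majority wins."""
    votes = [p >= threshold for p in probabilities]
    return int(2 * sum(votes) > len(votes))

def soft_vote(probabilities, weights=None, threshold=0.5):
    """Model averaging: (weighted) mean of probabilities, thresholded once."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    mean = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return int(mean >= threshold)

if __name__ == "__main__":
    # One confident model versus two mildly doubtful ones.
    probs = [0.9, 0.4, 0.4]
    print("hard vote:", hard_vote(probs))
    print("soft vote:", soft_vote(probs))
```

With one confident model and two mildly doubtful ones, the vote sides with the doubtful majority and the average sides with the confident model.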

How do you phrase ensemble algorithms in a Bayesian context? If it were Bayesian model averaging, this would be easy, but what about when the learners are all ill-posed?
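
For reference, the "easy" case looks roughly like this: weight each model by its posterior probability, which is proportional to prior times marginal likelihood, then average the predictions with those weights. This sketch assumes the log evidences are already available, which is exactly what is missing when the learners are not proper probabilistic models. All names are hypothetical.

```python
import math

def posterior_model_weights(log_evidences, priors=None):
    """Posterior model probabilities from log marginal likelihoods."""
    k = len(log_evidences)
    if priors is None:
        priors = [1.0 / k] * k  # uniform prior over models
    logs = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    top = max(logs)  # subtract the max before exponentiating, for stability
    unnormalized = [math.exp(value - top) for value in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def bma_predict(predictions, weights):
    """Posterior-weighted average of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))

if __name__ == "__main__":
    # Model A has three times the evidence of model B.
    weights = posterior_model_weights([math.log(3.0), 0.0])
    print(weights, bma_predict([0.8, 0.2], weights))
```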

## Random trees, forests, jungles

How to do machine vision using random forests, brought to you by the folks behind Kinect.

## Implementations

## Refs

- FrHT00: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000) Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). *The Annals of Statistics*, 28(2), 337–407.
- Brei96: Leo Breiman (1996) Bagging predictors. *Machine Learning*, 24(2), 123–140.
- SFBL98: Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. *The Annals of Statistics*, 26(5), 1651–1686.
- GaLe13: J. Gall, V. Lempitsky (2013) Class-Specific Hough Forests for Object Detection. In Decision Forests for Computer Vision and Medical Image Analysis (pp. 143–157). Springer London.
- ScBV14: Erwan Scornet, Gérard Biau, Jean-Philippe Vert (2014) Consistency of Random Forests. *ArXiv:1405.2881 [Math, Stat]*.
- CrSK12: Antonio Criminisi, Jamie Shotton, Ender Konukoglu (2012) Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. *Foundations and Trends® in Computer Graphics and Vision*, 7(2–3).
- CrSK11: A. Criminisi, J. Shotton, E. Konukoglu (2011) Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning (No. MSR-TR-2011-114). Microsoft Research.
- SSKN13: Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi (2013) Decision Jungles: Compact and Rich Models for Classification. In NIPS.
- FCBA14: Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim (2014) Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? *Journal of Machine Learning Research*, 15(1), 3133–3181.
- Frie01: Jerome H. Friedman (2001) Greedy Function Approximation: A Gradient Boosting Machine. *The Annals of Statistics*, 29(5), 1189–1232.
- JoZh14: R. Johnson, Tong Zhang (2014) Learning Nonlinear Functions Using Regularized Greedy Forest. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 36(5), 942–954.
- LaRT14: Balaji Lakshminarayanan, Daniel M Roy, Yee Whye Teh (2014) Mondrian Forests: Efficient Online Random Forests. In Advances in Neural Information Processing Systems 27 (pp. 3140–3148). Curran Associates, Inc.
- Scor14: Erwan Scornet (2014) On the asymptotics of random forests. *ArXiv:1409.2090 [Math, Stat]*.
- BLTG06: Peter J. Bickel, Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, … Aad van der Vaart (2006) Regularization in statistics. *Test*, 15(2), 271–344.
- DíJM12: Carlos Díaz-Avalos, P. Juan, J. Mateu (2012) Similarity measures of conditional intensity functions to test separability in multidimensional point processes. *Stochastic Environmental Research and Risk Assessment*, 27(5), 1193–1205.
- BüGe11: Peter Bühlmann, Sara van de Geer (2011) *Statistics for High-Dimensional Data: Methods, Theory and Applications*. Heidelberg ; New York: Springer.
- Frie02: Jerome H. Friedman (2002) Stochastic gradient boosting. *Computational Statistics & Data Analysis*, 38(4), 367–378.
- BLGR16: Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh (2016) The Mondrian Kernel. *ArXiv:1606.05241 [Stat]*.
- BaTe15: Matej Balog, Yee Whye Teh (2015) The Mondrian Process for Machine Learning. *ArXiv:1507.05181 [Cs, Stat]*.
- RaRe09: Ali Rahimi, Benjamin Recht (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.