Ensemble methods: mixing predictions from simple learners to get sophisticated predictions.
Fast to train, fast to use. Gets you results. May not get you answers. So, like neural networks but from the previous hype cycle.
Jeremy Kun: Why Boosting Doesn’t Overfit:
Boosting, which we covered in gruesome detail previously, has a natural measure of complexity represented by the number of rounds you run the algorithm for. Each round adds one additional “weak learner” weighted vote. So running for a thousand rounds gives a vote of a thousand weak learners. Despite this, boosting doesn’t overfit on many datasets. In fact, and this is a shocking fact, researchers observed that Boosting would hit zero training error, they kept running it for more rounds, and the generalization error kept going down! It seemed like the complexity could grow arbitrarily without penalty. […] this phenomenon is a fact about voting schemes, not boosting in particular.
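The weighted vote the quote describes is easy to make concrete. Below is a minimal hand-rolled AdaBoost on decision stumps — my own sketch, not code from any of the referenced papers: each round fits a stump to the reweighted data, gives it a vote weight α, and upweights the points it got wrong, so a run of T rounds is literally a T-voter weighted ballot.

```python
import numpy as np

def train_stump(X, y, w):
    """Find the best threshold stump (feature, threshold, polarity) under weights w."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] < thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    return best, best_err

def stump_predict(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] < thr, sign, -sign)

def adaboost(X, y, rounds=20):
    """Each round adds one more weighted weak-learner vote."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        stump, err = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # this voter's weight
        w *= np.exp(-alpha * y * stump_predict(stump, X))  # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def vote(ensemble, X):
    """The weighted majority vote of all weak learners."""
    return np.sign(sum(a * stump_predict(s, X) for a, s in ensemble))
```

On a toy 1-D interval problem (labels +1 inside an interval, −1 outside, which no single stump can represent) a handful of rounds reaches zero training error; the surprise in the quote is that running further keeps improving generalization, which SFBL98 attribute to the vote margins continuing to grow.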
Questions

In a different context, I’ve run into model averaging; how does this relate to voting algorithms?

How do you phrase ensemble algorithms in a Bayesian context? If it were Bayesian model averaging this would be easy, but what about when the learners are all ill-posed?
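One concrete point of difference, for the averaging-versus-voting question (my own illustration, nothing from the quote): averaging pools the learners’ scores before thresholding, while a voting scheme thresholds each learner first and then counts ballots. The two can disagree on the same inputs:

```python
import numpy as np

# Three classifiers' estimated probabilities that a single input is class 1.
p = np.array([0.6, 0.6, 0.1])

# Hard (majority) vote: threshold each learner, then count ballots.
hard_vote = int((p > 0.5).sum() > len(p) / 2)   # two of three say "1"

# Model averaging: pool the scores first, threshold once.
soft_vote = int(p.mean() > 0.5)                 # pooled probability is (0.6+0.6+0.1)/3 ≈ 0.43
```

Here the majority vote outputs class 1 while the averaged score outputs class 0: averaging lets one confident dissenter outweigh two lukewarm supporters, which a ballot count cannot.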
Random trees, forests, jungles

How to do machine vision using random forests, brought to you by the folks behind the Kinect.
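Underneath all the forest variants sits the bagging recipe of Brei96: bootstrap-resample the data, fit a base learner to each replicate, and average the predictions. A minimal sketch (my own illustration, using one-split regression stumps as the base learner):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump_reg(X, y):
    """Fit a one-split regression stump: mean of y on each side of the best threshold."""
    best_sse, best = np.inf, None
    for thr in np.unique(X):
        left, right = y[X < thr], y[X >= thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (thr, left.mean(), right.mean())
    return best

def bag(X, y, n_models=50):
    """Breiman-style bagging: bootstrap resample, fit, collect."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))   # bootstrap sample, with replacement
        models.append(fit_stump_reg(X[idx], y[idx]))
    return models

def bagged_predict(models, X):
    """Average the bootstrap replicates' predictions."""
    preds = [np.where(X < thr, lm, rm) for thr, lm, rm in models]
    return np.mean(preds, axis=0)
```

Random forests add per-split random feature subsampling on top of this with full trees as the base learner; decision jungles (SSKN13) instead merge tree nodes into DAGs to shrink the model.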
Implementations
Refs
 FrHT00: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000) Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407. DOI
 Brei96: Leo Breiman (1996) Bagging predictors. Machine Learning, 24(2), 123–140. DOI
 SFBL98: Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686. DOI
 GaLe13: J. Gall, V. Lempitsky (2013) Class-Specific Hough Forests for Object Detection. In Decision Forests for Computer Vision and Medical Image Analysis (pp. 143–157). Springer London
 ScBV14: Erwan Scornet, Gérard Biau, Jean-Philippe Vert (2014) Consistency of Random Forests. ArXiv:1405.2881 [Math, Stat].
 CrSK12: Antonio Criminisi, Jamie Shotton, Ender Konukoglu (2012) Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends® in Computer Graphics and Vision, 7(2–3). DOI
 CrSK11: A. Criminisi, J. Shotton, E. Konukoglu (2011) Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning (No. MSR-TR-2011-114). Microsoft Research
 SSKN13: Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi (2013) Decision Jungles: Compact and Rich Models for Classification. In NIPS.
 FCBA14: Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim (2014) Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15(1), 3133–3181.
 Frie01: Jerome H. Friedman (2001) Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
 JoZh14: R. Johnson, Tong Zhang (2014) Learning Nonlinear Functions Using Regularized Greedy Forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 942–954. DOI
 LaRT14: Balaji Lakshminarayanan, Daniel M Roy, Yee Whye Teh (2014) Mondrian Forests: Efficient Online Random Forests. In Advances in Neural Information Processing Systems 27 (pp. 3140–3148). Curran Associates, Inc.
 Scor14: Erwan Scornet (2014) On the asymptotics of random forests. ArXiv:1409.2090 [Math, Stat].
 BLTG06: Peter J. Bickel, Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, … Aad van der Vaart (2006) Regularization in statistics. Test, 15(2), 271–344. DOI
 DíJM12: Carlos Díaz-Avalos, P. Juan, J. Mateu (2012) Similarity measures of conditional intensity functions to test separability in multidimensional point processes. Stochastic Environmental Research and Risk Assessment, 27(5), 1193–1205. DOI
 BüGe11: Peter Bühlmann, Sara van de Geer (2011) Statistics for HighDimensional Data: Methods, Theory and Applications. Heidelberg ; New York: Springer
 Frie02: Jerome H. Friedman (2002) Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. DOI
 BLGR16: Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh (2016) The Mondrian Kernel. ArXiv:1606.05241 [Stat].
 BaTe15: Matej Balog, Yee Whye Teh (2015) The Mondrian Process for Machine Learning. ArXiv:1507.05181 [Cs, Stat].
 RaRe09: Ali Rahimi, Benjamin Recht (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.