The subfield of optimisation that specifically aims to automate model selection in machine learning (and also, occasionally, ensemble construction).
Quoc Le & Barret Zoph weigh in for Google:
Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large: a typical 10-layer network can have ~10^10 candidate networks! […]
To make this process of designing machine learning models much more accessible, we’ve been exploring ways to automate the design of machine learning models. […] in this blog post, we’ll focus on our reinforcement learning approach and the early results we’ve gotten so far.
In our approach (which we call “AutoML”), a controller neural net can propose a “child” model architecture, which can then be trained and evaluated for quality on a particular task. That feedback is then used to inform the controller how to improve its proposals for the next round.
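The propose, evaluate, update loop described in the quote can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "controller" is just a softmax over four hypothetical layer counts rather than a neural net, `child_quality` is a made-up stand-in for training and evaluating a child model, and the REINFORCE-style update is one common reinforcement learning rule, not necessarily the exact one used in the blog post.

```python
import math
import random

# Hypothetical search space: the only architectural choice is the layer count.
CHOICES = [1, 2, 3, 4]


def child_quality(num_layers):
    """Stand-in for 'train the child model and measure its quality'.

    Assumption: a made-up score that peaks at 3 layers.
    """
    return 1.0 - abs(num_layers - 3) / 3.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_controller(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CHOICES)  # controller parameters
    baseline = 0.0                 # running mean reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # The controller proposes a child architecture.
        idx = rng.choices(range(len(CHOICES)), weights=probs)[0]
        # The child is "trained" and evaluated.
        reward = child_quality(CHOICES[idx])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE: the gradient of log p(idx) w.r.t. the logits is onehot - probs.
        for k in range(len(logits)):
            indicator = 1.0 if k == idx else 0.0
            logits[k] += lr * advantage * (indicator - probs[k])
    return softmax(logits)


final_probs = run_controller()
best_choice = CHOICES[final_probs.index(max(final_probs))]
```

After training, `final_probs` should put most of its mass on the choice that `child_quality` rewards most. In a real system each call to the evaluation step is a full training run, which is why this family of methods is so expensive.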
Should you bother getting fancy about this? Ben Recht argues no, that random search is competitive with highly tuned Bayesian methods in hyperparameter tuning. Let’s ignore him for a moment though and sniff in the hype.
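For reference, a random search baseline is short enough to sketch in full. This is a minimal illustration under stated assumptions: `validation_loss` is a hypothetical stand-in for training a model and scoring it, and the two hyperparameters and their ranges are invented for the example.

```python
import math
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for training a model and scoring it.

    Assumption: the loss is smallest near learning_rate=1e-2, num_layers=3.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 3) ** 2


def random_search(num_trials=50, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        params = {
            # Sample the learning rate log-uniformly over [1e-5, 1].
            "learning_rate": 10 ** rng.uniform(-5.0, 0.0),
            # Sample the depth uniformly from 1..6 (inclusive).
            "num_layers": rng.randint(1, 6),
        }
        trials.append((validation_loss(**params), params))
    # Keep the configuration with the smallest observed loss.
    best_loss, best_params = min(trials, key=lambda t: t[0])
    return best_loss, best_params, trials


best_loss, best_params, trials = random_search()
```

Every trial is independent of the others, so the loop parallelises trivially; that is a large part of why random search is a hard baseline to beat in practice.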
Differentiable hyperparameter optimisation
Each meta-iteration runs an entire training run of stochastic gradient descent to optimize elementary parameters (weights 1 and 2). Gradients of the validation loss with respect to hyperparameters are then computed by propagating gradients back through the elementary training iterations. Hyperparameters (in this case, learning rate and momentum schedules) are then updated in the direction of this hypergradient. … The last remaining parameter to SGD is the initial parameter vector. Treating this vector as a hyperparameter blurs the distinction between learning and meta-learning. In the extreme case where all elementary learning rates are set to zero, the training set ceases to matter and the meta-learning procedure exactly reduces to elementary learning on the validation set. Due to philosophical vertigo, we chose not to optimize the initial parameter vector.
The implementation accompanying that paper (MaDA15), hypergrad, is cool, but no longer maintained.
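The hypergradient idea in the passage above can be seen in miniature on a one-dimensional problem. The sketch below is an assumption-laden toy, not the method from the paper: the training loss is a hypothetical quadratic, there is a single constant learning rate instead of schedules, and the derivative is carried forward through training (forward mode) rather than propagated backwards, which gives the same number for a single hyperparameter.

```python
def train(lr, steps=20, w0=0.0, curvature=1.0, train_target=2.0):
    """Run gradient descent on 0.5 * curvature * (w - train_target)**2.

    Returns the final weight and its derivative with respect to lr,
    carried forward through every update (forward-mode differentiation).
    """
    w, dw_dlr = w0, 0.0
    for _ in range(steps):
        grad = curvature * (w - train_target)
        # w_next = w - lr * grad, so by the product rule
        # dw_next/dlr = dw/dlr - grad - lr * curvature * dw/dlr.
        dw_dlr = dw_dlr - grad - lr * curvature * dw_dlr
        w = w - lr * grad
    return w, dw_dlr


def validation_loss_and_hypergradient(lr, val_target=1.5):
    w, dw_dlr = train(lr)
    loss = 0.5 * (w - val_target) ** 2
    # Chain rule: dloss/dlr = dloss/dw * dw/dlr.
    return loss, (w - val_target) * dw_dlr


# Meta-iterations: each one reruns training, then steps lr along the hypergradient.
lr = 0.01
history = []
for _ in range(200):
    loss, hypergrad = validation_loss_and_hypergradient(lr)
    history.append(loss)
    lr -= 0.001 * hypergrad
```

Because the validation target differs from the training target, the best learning rate here is the one that stops training early at the right point; the meta-loop finds it by gradient descent on `lr` itself.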
Bayesian/surrogate optimisation
Implementations
auto-sklearn, the practical implementation of hyperparameter optimisation from FKES15:
auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator:
    import autosklearn.classification
    cls = autosklearn.classification.AutoSklearnClassifier()
    cls.fit(X_train, y_train)
    predictions = cls.predict(X_test)
auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction.
DrMAD is a hyperparameter tuning method based on automatic differentiation, […] DrMAD can tune thousands of continuous hyperparameters (e.g. L1 norms for every single neuron) for deep models on GPUs…
[DrMAD distills] the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. When run on CPUs, DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters with almost no compromise to its effectiveness.
AFAICT this is the only one shipping an actual gradient-based hyperparameter optimisation method that uses hyperparameter gradients; the rest infer a proxy function for hyperparameter choice. Paper: FLFL16.
skopt (aka scikit-optimize) […] is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization.

Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper (SnLA12).
The code consists of several parts. It is designed to be modular to allow swapping out various ‘driver’ and ‘chooser’ modules. The ‘chooser’ modules are implementations of acquisition functions such as expected improvement, UCB or random. The drivers determine how experiments are distributed and run on the system. As the code is designed to run experiments in parallel (spawning a new experiment as soon as a result comes in), this requires some engineering.
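To make the role of an acquisition function concrete, here is a minimal sketch of expected improvement and a confidence-bound rule for a minimisation problem. It is not Spearmint's chooser code; the function names, the candidate tuples and the assumption that the surrogate returns a Gaussian mean and standard deviation at each candidate are all illustrative.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected improvement for minimisation under a Gaussian prediction."""
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    return (best_so_far - mean) * normal_cdf(z) + std * normal_pdf(z)


def lower_confidence_bound(mean, std, kappa=2.0):
    """The minimisation counterpart of UCB: smaller is more promising."""
    return mean - kappa * std


def choose_next(candidates, best_so_far):
    """Pick the (name, mean, std) candidate with the largest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_so_far))


# Hypothetical surrogate predictions (mean, std) of the loss at three settings.
candidates = [("a", 0.50, 0.01), ("b", 0.55, 0.30), ("c", 0.80, 0.05)]
chosen = choose_next(candidates, best_so_far=0.52)
```

With these numbers the rule picks candidate "b": its predicted mean is slightly worse than the incumbent, but its large uncertainty leaves more room for improvement than the small, nearly certain gain offered by "a". That trade-off between exploiting and exploring is the whole job of a chooser.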
Spearmint2 is similar, but more recently updated and fancier; however, it has a restrictive license prohibiting wide redistribution without the payment of fees. You may or may not wish to trust the implied level of development and support of four Harvard professors, depending on your application.
Both of the Spearmint options (especially the latter) have opinionated choices of technology stack in order to do their optimizations, which means they can do more work for you, but require more setup, than a simple little thing like skopt. Depending on your computing environment this might be an overall plus or a minus.
SMAC (AGPLv3):
SMAC (sequential model-based algorithm configuration) is a versatile tool for optimizing algorithm parameters (or the parameters of some other process we can run automatically, or a function we can evaluate, such as a simulation).
SMAC has helped us speed up both local search and tree search algorithms by orders of magnitude on certain instance distributions. Recently, we have also found it to be very effective for the hyperparameter optimization of machine learning algorithms, scaling better to high dimensions and discrete input dimensions than other algorithms. Finally, the predictive models SMAC is based on can also capture and exploit important information about the model domain, such as which input variables are most important.
We hope you find SMAC similarly useful. Ultimately, we hope that it helps algorithm designers focus on tasks that are more scientifically valuable than parameter tuning.
Python interface through pysmac.

Hyperopt is a Python library for optimizing over awkward search spaces with real-valued, discrete, and conditional dimensions.
Currently two algorithms are implemented in hyperopt:
 Random Search
 Tree of Parzen Estimators (TPE)
Hyperopt has been designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these are not currently implemented.
All algorithms can be run either serially, or in parallel by communicating via MongoDB.
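What makes a search space "awkward" is easiest to see by sampling from one. The sketch below uses plain Python rather than hyperopt's own API, and the model names, parameter names and ranges are hypothetical; the point is only that some dimensions exist solely when an earlier choice makes them meaningful.

```python
import random


def sample_config(rng):
    """Draw one configuration from a hypothetical conditional search space."""
    config = {"model": rng.choice(["svm", "random_forest"])}
    if config["model"] == "svm":
        config["C"] = 10 ** rng.uniform(-3.0, 3.0)        # real-valued, log scale
        config["kernel"] = rng.choice(["linear", "rbf"])  # discrete
        if config["kernel"] == "rbf":
            # gamma only exists when the RBF kernel was chosen (conditional).
            config["gamma"] = 10 ** rng.uniform(-4.0, 1.0)
    else:
        config["n_estimators"] = rng.randint(10, 500)     # discrete integer
    return config


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(200)]
```

Drawing configurations like this and scoring each one is already the random search algorithm from the list above; a method such as TPE replaces the fixed sampling distributions with ones fitted to the results observed so far.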

auto_ml won the land grab for the name automl.
A quick overview of buzzwords, this project automates:
 Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you’re trying to predict).
 Feature Engineering (particularly around dates, and soon, NLP).
 Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse matrices).
 Feature Selection (picking only the features that actually prove useful).
 Data formatting (turning a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems).
 Model Selection (which model works best for your problem).
 Hyperparameter Optimization (what hyperparameters work best for that model).
 Ensembling Subpredictors (automatically training up models to predict smaller problems within the meta problem).
 Ensembling Weak Estimators (automatically training up weak models on the larger problem itself, to inform the meta-estimator’s decision).
Refs
 AbRa07: Ahmed Abdel-Gawad, Simon Ratner (2007) Adaptive optimization of hyperparameters in L2-regularised logistic regression
 BBBK11: James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl (2011) Algorithms for hyperparameter optimization. In Advances in Neural Information Processing Systems (pp. 2546–2554). Curran Associates, Inc.
 HuHL13: Frank Hutter, Holger Hoos, Kevin Leyton-Brown (2013) An Evaluation of Sequential Model-based Optimization for Expensive Blackbox Functions. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation (pp. 1209–1216). New York, NY, USA: ACM
 THHL13: Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown (2013) Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 847–855). New York, NY, USA: ACM
 GeSA14: Michael A. Gelbart, Jasper Snoek, Ryan P. Adams (2014) Bayesian Optimization with Unknown Constraints. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (pp. 250–259). Arlington, Virginia, United States: AUAI Press
 LiSY18: Hanxiao Liu, Karen Simonyan, Yiming Yang (2018) DARTS: Differentiable Architecture Search. ArXiv:1806.09055 [Cs, Stat].
 FLFL16: Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, Tat-Seng Chua (2016) DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks. In Proceedings of IJCAI, 2016.
 FKES15: Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, Frank Hutter (2015) Efficient and Robust Automated Machine Learning. In Advances in Neural Information Processing Systems 28 (pp. 2962–2970). Curran Associates, Inc.
 FoDN08: Chuansheng Foo, Chuong B. Do, Andrew Y. Ng (2008) Efficient multiple hyperparameter learning for log-linear models. In Advances in Neural Information Processing Systems 20 (pp. 377–384). Curran Associates, Inc.
 SKKS12: Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias Seeger (2012) Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. IEEE Transactions on Information Theory, 58(5), 3250–3265.
 Domk12: Justin Domke (2012) Generic Methods for Optimization-Based Modeling. In International Conference on Artificial Intelligence and Statistics (pp. 318–326).
 EiNo99: R. Eigenmann, J. A. Nossek (1999) Gradient based adaptive regularization. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468) (pp. 87–94).
 MaDA15: Dougal Maclaurin, David K. Duvenaud, Ryan P. Adams (2015) Gradient-based Hyperparameter Optimization through Reversible Learning. In ICML (pp. 2113–2122).
 Beng00: Yoshua Bengio (2000) Gradient-Based Optimization of Hyperparameters. Neural Computation, 12(8), 1889–1900.
 LJDR16: Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar (2016) Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. ArXiv:1603.06560 [Cs, Stat].
 SSZA14: Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams (2014) Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning (ICML-14) (pp. 1674–1682).
 SaKW15: Tim Salimans, Diederik Kingma, Max Welling (2015) Markov chain Monte Carlo and variational inference: Bridging the gap. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1218–1226). Lille, France: JMLR.org
 SwSA13: Kevin Swersky, Jasper Snoek, Ryan P Adams (2013) Multi-Task Bayesian Optimization. In Advances in Neural Information Processing Systems 26 (pp. 2004–2012). Curran Associates, Inc.
 Močk75: J. Močkus (1975) On Bayesian Methods for Seeking the Extremum. In Optimization Techniques IFIP Technical Conference (pp. 400–404). Springer Berlin Heidelberg
 SnLA12: Jasper Snoek, Hugo Larochelle, Ryan P. Adams (2012) Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951–2959). Curran Associates, Inc.
 GAOS10: Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor (2010) Regret Bounds for Gaussian Process Bandit Problems. (Vol. 9, pp. 273–280).
 HuHL11: Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown (2011) Sequential Model-Based Optimization for General Algorithm Configuration. In Learning and Intelligent Optimization (Vol. 6683, pp. 507–523). Berlin, Heidelberg: Springer
 EFHB00: Katharina Eggensperger, Matthias Feurer, Frank Hutter, James Bergstra, Jasper Snoek, Holger H. Hoos, Kevin LeytonBrown (n.d.) Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters.