Problem statement
According to Gilles Louppe and Manoj Kumar:
We are interested in solving
\begin{equation*} x^* = \arg \min_x f(x) \end{equation*}under the constraints that
 \(f\) is a black box for which no closed form is known (nor its gradients);
 \(f\) is expensive to evaluate;
 evaluations of \(y=f(x)\) may be noisy.
This resembles the typical framing of reinforcement learning problems, which have a similar explore/exploit tradeoff, although I do not know the precise disciplinary boundaries that may transect these areas. Both might be thought of as stochastic optimal control problems.
The most common method seems to be “Bayesian optimisation”, which is based on Gaussian process regression. However, this is not a requirement; many possible wacky regression models can give you the surrogate.
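To make the surrogate idea concrete, here is a minimal sketch of the loop in plain Python: fit a regression model to the evaluations so far, choose the next point by minimising a lower confidence bound on the model's prediction, evaluate the black box there, and repeat. Everything specific here is my own illustrative assumption rather than any library's method: the names `gp_posterior` and `bayes_opt`, the one-dimensional domain, the RBF kernel with fixed length scale and noise level, the grid of candidate points, and the toy objective.

```python
import math
import random


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two scalars (fixed length scale, assumed)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)    # K^{-1} y
    v = solve(K, k_star)    # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 1e-12))


def bayes_opt(f, lo, hi, n_init=3, n_iter=10, kappa=2.0, seed=0):
    """Minimise f on [lo, hi] with a GP surrogate and a lower confidence bound."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)
        centred = [y - y_mean for y in ys]  # the GP prior has zero mean

        def lcb(x):
            mean, std = gp_posterior(xs, centred, x)
            return mean - kappa * std  # favours low predictions or high uncertainty

        x_next = min(grid, key=lcb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]


_noise = random.Random(1)


def noisy_objective(x):
    """Toy stand-in for an expensive black box: a quadratic plus Gaussian noise."""
    return (x - 0.3) ** 2 + _noise.gauss(0.0, 0.01)


x_best, y_best = bayes_opt(noisy_objective, 0.0, 1.0)
print(f"best observed x = {x_best:.3f}, y = {y_best:.4f}")
```

The sketch returns the best observed evaluation, which is itself noisy; with noisier objectives you might prefer the minimiser of the surrogate's posterior mean. Swapping `gp_posterior` for any other regression that returns a prediction and an uncertainty leaves the loop unchanged, which is the sense in which the Gaussian process is not a requirement.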
This approach is of renewed interest for its use in hyperparameter/model selection, e.g. in regularising complex models. You could also obviously use it in industrial process control, which is where I originally saw it, in the form of sequential ANOVA design.
Ben Recht: Random search is competitive with highly tuned Bayesian methods in hyperparameter tuning.
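For reference, the random search baseline in that comparison takes only a few lines. This is a minimal sketch under my own assumptions: the function name `random_search`, box bounds sampled uniformly, and the toy objective are all hypothetical.

```python
import random


def random_search(f, bounds, n_calls=50, seed=0):
    """Minimise f by sampling points uniformly at random inside box bounds."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_calls):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:  # keep the best evaluation seen so far
            best_x, best_y = x, y
    return best_x, best_y


def sphere(x):
    """Toy objective standing in for an expensive black box."""
    return sum(xi * xi for xi in x)


x_best, y_best = random_search(sphere, [(-1.0, 1.0), (-1.0, 1.0)], n_calls=200)
print(x_best, y_best)
```

Unlike the surrogate methods, each draw ignores all earlier evaluations, so the calls can be run in parallel with no coordination at all.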
Since this kind of sequential optimisation is effectively an attempt at optimal, nonlinear ANOVA, I am led to wonder if we can dispense with ANOVA now. Does this stuff actually work well enough?
Implementations
skopt (aka scikit-optimize)
[…] is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization.

Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper (SnLA12).
The code consists of several parts. It is designed to be modular to allow swapping out various ‘driver’ and ‘chooser’ modules. The ‘chooser’ modules are implementations of acquisition functions such as expected improvement, UCB or random. The drivers determine how experiments are distributed and run on the system. As the code is designed to run experiments in parallel (spawning a new experiment as soon as a result comes in), this requires some engineering.
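The acquisition functions named in that description are easy to write down once a surrogate supplies a predicted mean and standard deviation for each candidate. The following is a sketch for a minimisation problem, not Spearmint's actual code: the function names, the `(x, mean, std)` candidate format and the example numbers are all hypothetical.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_y):
    """Expected amount by which a candidate beats best_y, when minimising."""
    if std <= 0.0:
        return max(best_y - mean, 0.0)
    z = (best_y - mean) / std
    return (best_y - mean) * norm_cdf(z) + std * norm_pdf(z)


def choose_ei(candidates, best_y):
    """Pick the x with the largest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_y))[0]


def choose_ucb(candidates, kappa=2.0):
    """Confidence-bound chooser; when minimising, this is the lower bound."""
    return min(candidates, key=lambda c: c[1] - kappa * c[2])[0]


def choose_random(candidates, rng=random):
    """Ignore the surrogate and pick a candidate uniformly at random."""
    return rng.choice(candidates)[0]


# Hypothetical surrogate output: (x, predicted mean, predicted std).
candidates = [(0.1, 0.50, 0.05), (0.5, 0.40, 0.01), (0.9, 0.45, 0.30)]
print(choose_ei(candidates, best_y=0.42))
print(choose_ucb(candidates, kappa=0.1))
```

Both rules trade off a low predicted value against high uncertainty; `kappa` sets that trade-off explicitly for the confidence bound, while expected improvement derives it from the current best value.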
Spearmint2 is similar, but more recently updated and fancier; however it has a restrictive license prohibiting wide redistribution without the payment of fees. You may or may not wish to trust the implied level of development and support of 4 Harvard Professors depending on your application.
Both of the Spearmint options (especially the latter) make opinionated choices of technology stack in order to do their optimizations, which means they can do more work for you, but require more setup, than a simple little thing like skopt. Depending on your computing environment this might be an overall plus or a minus.
SMAC (AGPLv3)
(sequential model-based algorithm configuration) is a versatile tool for optimizing algorithm parameters (or the parameters of some other process we can run automatically, or a function we can evaluate, such as a simulation).
SMAC has helped us speed up both local search and tree search algorithms by orders of magnitude on certain instance distributions. Recently, we have also found it to be very effective for the hyperparameter optimization of machine learning algorithms, scaling better to high dimensions and discrete input dimensions than other algorithms. Finally, the predictive models SMAC is based on can also capture and exploit important information about the model domain, such as which input variables are most important.
We hope you find SMAC similarly useful. Ultimately, we hope that it helps algorithm designers focus on tasks that are more scientifically valuable than parameter tuning.
There is a Python interface through pysmac.

hyperopt is a Python library for optimizing over awkward search spaces with real-valued, discrete, and conditional dimensions.
Currently two algorithms are implemented in hyperopt:
 Random Search
 Tree of Parzen Estimators (TPE)
Hyperopt has been designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these are not currently implemented.
All algorithms can be run either serially, or in parallel by communicating via MongoDB.
Refs
 FKES15
 Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015) Efficient and Robust Automated Machine Learning. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2962–2970). Curran Associates, Inc.
 GeSA14
 Gelbart, M. A., Snoek, J., & Adams, R. P. (2014) Bayesian Optimization with Unknown Constraints. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (pp. 250–259). Arlington, Virginia, United States: AUAI Press.
 GAOS10
 Grünewälder, S., Audibert, J.-Y., Opper, M., & Shawe-Taylor, J. (2010) Regret Bounds for Gaussian Process Bandit Problems. (Vol. 9, pp. 273–280). Presented at the AISTATS 2010 - Thirteenth International Conference on Artificial Intelligence and Statistics.
 IaMS00
 Ian Dewancker, Michael McCourt, & Scott Clark. (n.d.) Bayesian Optimization Primer.
 LJDR16
 Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016) Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. ArXiv:1603.06560 [Cs, Stat].
 Močk75
 Močkus, J. (1975) On Bayesian Methods for Seeking the Extremum. In G. I. Marchuk (Ed.), Optimization Techniques IFIP Technical Conference (pp. 400–404). Springer Berlin Heidelberg.
 SnLA12
 Snoek, J., Larochelle, H., & Adams, R. P. (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951–2959). Curran Associates, Inc.
 SSZA14
 Snoek, J., Swersky, K., Zemel, R., & Adams, R. (2014) Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning (ICML-14) (pp. 1674–1682).
 SKKS12
 Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. (2012) Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. IEEE Transactions on Information Theory, 58(5), 3250–3265.
 SwSA13
 Swersky, K., Snoek, J., & Adams, R. P. (2013) Multi-Task Bayesian Optimization. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 2004–2012). Curran Associates, Inc.