The Living Thing / Notebooks :

Sequential surrogate optimisation

for expensive experiments and hyperparameter selection with the use of yet more hyperparameters

Problem statement

According to Gilles Louppe and Manoj Kumar:

We are interested in solving

\begin{equation*} x^* = \arg \min_x f(x) \end{equation*}

under the constraints that

  • \(f\) is a black box for which no closed form is known (nor its gradients);
  • \(f\) is expensive to evaluate;
  • evaluations of \(y=f(x)\) may be noisy.

This is similar to the typical framing of reinforcement learning problems where there is a similar explore/exploit tradeoff, although I do not know the precise disciplinary boundaries that may transect these areas. They both might be thought of as stochastic optimal control problems.

The most common method seems to the “Bayesian optimisation”, which is based on Gaussian Process regressions. However, this is not a requirement, and many possible wacky regression models can give you the surrogate.

Of renewed interest for its use in hyperparameter/model selection, in e.g. regularising complex models. You could also obviously use it in industrial process control, which is where I originally saw this in the form of sequential ANOVA design.

Ben Recht: Random search is competitive with highly tuned bayesian methods in hyperparameter tuning.

Since this effectively an attempt at optimal, nonlinear ANOVA, I am led to wonder if we can dispense with ANOVA now. Does this stuff actually work well enough?



Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015) Efficient and Robust Automated Machine Learning. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 2962–2970). Curran Associates, Inc.
Gelbart, M. A., Snoek, J., & Adams, R. P.(2014) Bayesian Optimization with Unknown Constraints. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (pp. 250–259). Arlington, Virginia, United States: AUAI Press
Grünewälder, S., Audibert, J.-Y., Opper, M., & Shawe-Taylor, J. (2010) Regret Bounds for Gaussian Process Bandit Problems. (Vol. 9, pp. 273–280). Presented at the AISTATS 2010 - Thirteenth International Conference on Artificial Intelligence and Statistics
Ian Dewancker, Michael McCourt, & Scott Clark. (n.d.) Bayesian Optimization Primer.
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2016) Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. ArXiv:1603.06560 [Cs, Stat].
Močkus, J. (1975) On Bayesian Methods for Seeking the Extremum. In P. D. G. I. Marchuk (Ed.), Optimization Techniques IFIP Technical Conference (pp. 400–404). Springer Berlin Heidelberg DOI.
Snoek, J., Larochelle, H., & Adams, R. P.(2012) Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951–2959). Curran Associates, Inc.
Snoek, J., Swersky, K., Zemel, R., & Adams, R. (2014) Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning (ICML-14) (pp. 1674–1682).
Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. (2012) Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. IEEE Transactions on Information Theory, 58(5), 3250–3265. DOI.
Swersky, K., Snoek, J., & Adams, R. P.(2013) Multi-Task Bayesian Optimization. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 2004–2012). Curran Associates, Inc.