The Living Thing / Notebooks :

Optimisation of model functions for experiment design

Surrogate optimisation methods for optimising optimisations

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪
Incompleteness: 🚧 🚧 🚧

Closely related is AutoML.

Problem statement

According to Gilles Louppe and Manoj Kumar:

We are interested in solving

\[x^* = \arg \min_x f(x)\]

under the constraints that

This is similar to the typical framing of reinforcement learning problems where there is a similar explore/exploit tradeoff, although I do not know the precise disciplinary boundaries that may transect these areas. They both might be thought of as stochastic optimal control problems.

The most common method seems to the “Bayesian optimisation”, which is typically based on Gaussian Process regressions. However, this is not a requirement, and many possible wacky regression models can give you the optimisation surrogate.

Of renewed interest for its use in hyperparameter/model selection, in e.g. regularising complex models, which is compactly referred to these days as automl.

You could also obviously use it in industrial process control, which is where I originally saw this kind of thing, in the form of sequential ANOVA design, which is an incredible idea itself, although that is now years old so is not nearly so hip. Since this effectively an attempt at optimal, nonlinear, heteroskedastic sequential ANOVA, I am led to wonder if we can dispense with ANOVA now. Does this stuff actually work well enough? Or is it the same thing, repackaged?

Implementation

Refs

Allen-Zhu, Zeyuan, Yuanzhi Li, Aarti Singh, and Yining Wang. 2017. “Near-Optimal Design of Experiments via Regret Minimization.” In PMLR, 126–35. http://proceedings.mlr.press/v70/allen-zhu17e.html.

Amos, Brandon, and J. Zico Kolter. 2017. “OptNet: Differentiable Optimization as a Layer in Neural Networks.” In PMLR, 136–45. http://proceedings.mlr.press/v70/amos17a.html.

Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. “Efficient and Robust Automated Machine Learning.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2962–70. Curran Associates, Inc. http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf.

Franceschi, Luca, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. 2017. “On Hyperparameter Optimization in Learning Systems.” In. https://arxiv.org/abs/1703.01785.

Gelbart, Michael A., Jasper Snoek, and Ryan P. Adams. 2014. “Bayesian Optimization with Unknown Constraints.” In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 250–59. UAI’14. Arlington, Virginia, United States: AUAI Press. http://hips.seas.harvard.edu/files/gelbart-constrained-uai-2014.pdf.

Grünewälder, Steffen, Jean-Yves Audibert, Manfred Opper, and John Shawe-Taylor. 2010. “Regret Bounds for Gaussian Process Bandit Problems.” In, 9:273–80. https://hal-enpc.archives-ouvertes.fr/hal-00654517/document.

Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. 2011. “Sequential Model-Based Optimization for General Algorithm Configuration.” In Learning and Intelligent Optimization, 6683:507–23. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25566-3_40.

Hutter, Frank, Holger Hoos, and Kevin Leyton-Brown. 2013. “An Evaluation of Sequential Model-Based Optimization for Expensive Blackbox Functions.” In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, 1209–16. GECCO ’13 Companion. New York, NY, USA: ACM. https://doi.org/10.1145/2464576.2501592.

Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2016. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization,” March. http://arxiv.org/abs/1603.06560.

Močkus, J. 1975. “On Bayesian Methods for Seeking the Extremum.” In Optimization Techniques IFIP Technical Conference, edited by Prof Dr G. I. Marchuk, 400–404. Lecture Notes in Computer Science. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-38527-2_55.

Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems, 2951–9. Curran Associates, Inc. http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.

Snoek, Jasper, Kevin Swersky, Rich Zemel, and Ryan Adams. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 1674–82. http://www.jmlr.org/proceedings/papers/v32/snoek14.pdf.

Srinivas, Niranjan, Andreas Krause, Sham M. Kakade, and Matthias Seeger. 2012. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design.” IEEE Transactions on Information Theory 58 (5): 3250–65. https://doi.org/10.1109/TIT.2011.2182033.

Staines, Joe, and David Barber. 2012. “Variational Optimization,” December. http://arxiv.org/abs/1212.4507.

Swersky, Kevin, Jasper Snoek, and Ryan P Adams. 2013. “Multi-Task Bayesian Optimization.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 2004–12. Curran Associates, Inc. http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf.