Optimisation, gradient descent.

First order of business

Gradient descent is a classic first-order method with many variants, and there are many things one might wish to understand about it.
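For concreteness, the basic iteration can be sketched as follows. This is a minimal illustration under assumptions of my own choosing: a fixed step size, a scalar variable, and a toy objective f(x) = (x - 3)^2. The names `gradient_descent`, `grad`, `step` and `n_iters` are hypothetical and come from no particular library.

```python
def gradient_descent(grad, x0, step, n_iters):
    """Repeat the update x <- x - step * grad(x), starting from x0."""
    x = x0
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x


# Toy objective (an assumption for illustration): f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3) and whose minimiser is x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, step=0.1, n_iters=100)
```

With this step size the distance to the minimiser shrinks by a constant factor at every iteration, so `x_min` ends up very close to 3.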

There are only a few things I wish to understand for the moment.

Nesterov acceleration
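As a reference point, here is a minimal sketch of one common form of the method: the constant-momentum variant usually stated for smooth, strongly convex objectives. It is compared with plain gradient descent on a badly conditioned quadratic. Everything specific here is an assumption made for illustration: the diagonal quadratic, the step size 1/L, the momentum (sqrt(kappa) - 1) / (sqrt(kappa) + 1), and the function names.

```python
import math


def grad(x, a):
    """Gradient of the quadratic f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [ai * xi for ai, xi in zip(a, x)]


def plain_gd(a, x0, step, n_iters):
    """Plain gradient descent with a fixed step size."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x, a)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def nesterov(a, x0, step, momentum, n_iters):
    """Gradient step taken from an extrapolated (look-ahead) point."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(n_iters):
        # Extrapolate along the most recent displacement.
        y = [xi + momentum * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y, a)
        # Gradient step from the look-ahead point y, not from x.
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


# Hypothetical test problem: curvatures 1 and 100, so kappa = L / mu = 100.
a = [1.0, 100.0]
L_smooth, mu = max(a), min(a)
kappa = L_smooth / mu
momentum = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

x0 = [1.0, 1.0]
err_gd = math.hypot(*plain_gd(a, x0, 1.0 / L_smooth, 100))
err_nag = math.hypot(*nesterov(a, x0, 1.0 / L_smooth, momentum, 100))
```

On this toy problem the distance to the minimiser after 100 iterations is far smaller for the accelerated iteration (`err_nag`) than for plain gradient descent (`err_gd`). The slow coordinate contracts roughly like (1 - 1/sqrt(kappa))^k instead of (1 - 1/kappa)^k.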

How and when does it work, and how well? Moritz Hardt, in The Zen of Gradient Descent, explains it through Chebyshev polynomials. Sebastian Bubeck explains it from a different angle in Revisiting Nesterov’s Acceleration, expanding upon the rather magical introduction given in his lecture. Wibisono et al. explain it in terms of