Gradient descent is a classic first-order method with many variants, and there are many things one might wish to understand about it.
There are only a few things I wish to understand for the moment:
how and when does it work, and how well?
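To have something concrete to point at when asking "how well", here is a minimal sketch of plain gradient descent on a toy quadratic. The objective, the constants `mu` and `L`, and the helper names `gradient_descent` and `grad_f` are all illustrative assumptions, not taken from any of the references below.

```python
def gradient_descent(grad, x0, step, n_iters):
    """Plain gradient descent: x_{k+1} = x_k - step * grad(x_k)."""
    x = list(x0)
    path = [list(x)]
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        path.append(list(x))
    return path


# Hypothetical test problem: f(x) = 0.5 * (mu * x[0]**2 + L * x[1]**2),
# whose minimiser is the origin. mu and L are the smallest and largest
# curvatures of f.
mu, L = 1.0, 10.0


def grad_f(x):
    return [mu * x[0], L * x[1]]


# The step size 1/L is a common conservative choice for an L-smooth objective.
path = gradient_descent(grad_f, x0=[1.0, 1.0], step=1.0 / L, n_iters=50)

# Euclidean distance of each iterate from the minimiser at the origin.
errors = [(p[0] ** 2 + p[1] ** 2) ** 0.5 for p in path]
print(errors[0], errors[-1])
```

On this particular problem the slow coordinate shrinks by a factor of 1 - mu/L per step, so the ratio L/mu governs how long convergence takes. That dependence on the condition number is what the accelerated variants aim to improve.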
Moritz Hardt, in The Zen of Gradient Descent, explains it through Chebyshev polynomials.
Sebastian Bubeck explains it from a different angle in Revisiting Nesterov’s Acceleration,
to expand upon the rather magical introduction given in his lecture.
Wibisono et al explain it in terms of