
Why does deep learning work?

despite the fact we are totally just making this shit up

No time to frame this well, but there are a lot of versions of the question, so… pick one. The essential idea is that we say: Oh my, that deep learning model I just trained had terribly good performance compared with some simpler thing I tried. Can I make my model simpler and still get good results? Can I get a decent error bound? Can I learn anything about the underlying system by looking at the parameters I learned?

And the answer is not “yes” in any satisfying general sense. Pfft.

The SGD fitting process looks a lot like simulated annealing, and it feels like there should be some nice explanation from the statistical mechanics of annealing. But it’s not quite the same thing, so fire up the paper mill!
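To make the analogy (and the mismatch) concrete, here is a toy sketch of both procedures on the same loss. Everything in it — the quadratic loss, the noise scale, the step size, the cooling schedule — is made up for illustration and is not taken from any of the papers below.

```python
# Toy sketch: both SGD and simulated annealing take noisy, mostly-downhill
# steps on a loss surface, but the noise enters differently (minibatch
# gradient noise vs. Metropolis accept/reject under a cooling temperature).
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

w_sgd = rng.normal(size=2)
w_sa = w_sgd.copy()

for t in range(1, 201):
    # SGD: true gradient plus stand-in "minibatch" noise, fixed step size.
    noisy_grad = grad(w_sgd) + 0.3 * rng.normal(size=2)
    w_sgd -= 0.1 * noisy_grad

    # Simulated annealing: random-walk proposal, Metropolis acceptance,
    # with a temperature that decays over time.
    temperature = 1.0 / t
    proposal = w_sa + 0.1 * rng.normal(size=2)
    if rng.random() < np.exp(-(loss(proposal) - loss(w_sa)) / temperature):
        w_sa = proposal

print("SGD loss:", loss(w_sgd), "SA loss:", loss(w_sa))
```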

Proceed with caution, since there is a lot of messy thinking here. Here are some things I’d like to read, but whose inclusion here should not be taken as a recommendation. The common theme is using ideas from physics to understand deep learning and other directed graph learning methods.

Charles H. Martin, Why Deep Learning Works II: The Renormalization Group.

Max Tegmark argues that statistical mechanics provides insight into deep learning and neuroscience (LiTe16a, LiTe16b), although there doesn’t seem to be much actionable there?

Natalie Wolchover summarises Mehta and Schwab (MeSc14).

Wiatowski et al. (WiGB17) and Shwartz-Ziv and Tishby (ShTi17) argue that looking at neural networks as random fields with energy-propagation dynamics provides some insight into how they work. More impressively, IMO, Haber and Ruthotto argue you can improve NNs by looking at them as Hamiltonian ODEs; a toy sketch of that viewpoint is below.
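The basic observation behind the ODE view is that a residual update x ← x + h·f(x) is a forward-Euler step of dx/dt = f(x), so a deep residual network is a discretized dynamical system. The sketch below shows only that correspondence; the widths, weights, and step size are placeholders, and it omits the Hamiltonian (split-state, symplectic-integrator) construction that Haber and Ruthotto actually use to get stable dynamics.

```python
# Minimal sketch of "ResNet forward pass == forward-Euler integration of an ODE".
# All numbers here are arbitrary; this is an illustration of the idea, not a
# reproduction of Haber and Ruthotto's architecture.
import numpy as np

rng = np.random.default_rng(0)
width, depth, h = 4, 10, 0.1  # state dimension, number of layers, Euler step size

# One random weight matrix and bias per "layer", i.e. per time step.
Ws = [rng.normal(scale=0.5, size=(width, width)) for _ in range(depth)]
bs = [rng.normal(scale=0.1, size=width) for _ in range(depth)]

def f(x, W, b):
    # The layer's velocity field; tanh keeps the dynamics bounded.
    return np.tanh(W @ x + b)

x = rng.normal(size=width)
for W, b in zip(Ws, bs):
    x = x + h * f(x, W, b)  # one residual block == one Euler step

print("state after", depth, "Euler steps:", x)
```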

There are other connections too: physics-driven annealing methods, physics-inspired Boltzmann machines, etc. TBC. C&C statistical mechanics of statistics.

Refs