The Living Thing / Notebooks :

Recurrent neural networks

Feedback networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever). Many wheels are re-invented with these, but the essential model is that we have a heavily nonlinear state filter inferred by gradient descent.
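A minimal sketch of the "nonlinear state filter" view, assuming a plain Elman-style cell with a tanh nonlinearity. The function name `rnn_filter` and the weight names are illustrative, not taken from any particular library, and the weights are random here rather than fitted by gradient descent.

```python
import numpy as np

def rnn_filter(x, W_h, W_x, b, h0=None):
    """Run a vanilla RNN over a sequence x of shape (T, n_in).

    Returns the hidden states, shape (T, n_hidden). Each state depends
    only on the previous state and the current input.
    """
    n_hidden = W_h.shape[0]
    h = np.zeros(n_hidden) if h0 is None else h0
    states = []
    for x_t in x:
        # Nonlinear state update: the "filter" part of the model.
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 20, 3, 5
W_h = 0.5 * rng.standard_normal((n_hidden, n_hidden))
W_x = rng.standard_normal((n_hidden, n_in))
b = np.zeros(n_hidden)
x = rng.standard_normal((T, n_in))
states = rnn_filter(x, W_h, W_x, b)
print(states.shape)
```

The state is causal in the signal-processing sense: perturbing the input at step 10 leaves the states before step 10 untouched.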

The connection between these and convolutional neural networks is suggestive for the same reason.

Many different flavours and topologies. GridRNN – KaDG15 – seems natural for note-based stuff, and for systems, like the cochlea, and indeed the entire human visual apparatus, with dependencies in both space and time.


As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of interesting DSP applications, e.g. generative music.
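To make the linear-systems connection concrete, here is a small sketch under one assumption: swap the nonlinearity for the identity. A scalar recurrent cell then reduces to a one-pole IIR filter, and its output matches convolution with the impulse response b·a^k. The names `linear_rnn`, `a` and `b` are illustrative.

```python
import numpy as np

def linear_rnn(x, a, b):
    """Scalar recurrence h_t = a*h_{t-1} + b*x_t with identity activation."""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return np.array(out)

a, b = 0.9, 0.1
x = np.random.default_rng(1).standard_normal(64)
y_rnn = linear_rnn(x, a, b)

# The same filter written as convolution with its impulse response b * a**k,
# truncated to the length of the input.
impulse_response = b * a ** np.arange(len(x))
y_conv = np.convolve(x, impulse_response)[: len(x)]

print(np.allclose(y_rnn, y_conv))
```

Putting the nonlinearity back is what takes the model outside classical linear filter theory.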

To Learn

Inverse Autoregressive Flow (KSJC16).



As seen in normal signal processing. The main problem here is that training is unstable in many of the wild and weird regimes that NN SGD passes through, unless you are clever. See BeSF94. The next three types are proposed solutions for that.
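A toy illustration of that instability, assuming a purely linear recurrence so that backpropagating through one time step is just multiplication by the transposed recurrent matrix. The recurrent matrix is a scaled orthogonal matrix, chosen only because it makes the growth rate exact; `backprop_gain` is a made-up helper name.

```python
import numpy as np

def backprop_gain(W, n_steps):
    """Norm of a unit gradient vector after n_steps multiplications by W.T."""
    g = np.ones(W.shape[0]) / np.sqrt(W.shape[0])
    for _ in range(n_steps):
        g = W.T @ g
    return np.linalg.norm(g)

rng = np.random.default_rng(2)
# QR of a random matrix gives an orthogonal Q (all singular values equal 1).
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

# Scale below, at, and above the stability boundary.
gains = {scale: backprop_gain(scale * Q, 100) for scale in (0.9, 1.0, 1.1)}
for scale, gain in gains.items():
    print(scale, gain)
```

With scale 0.9 the gradient shrinks geometrically, with 1.1 it grows geometrically, and only at exactly 1.0 does it keep its size. Real networks add nonlinearities and non-orthogonal weights, so this is a cartoon of the problem and not a diagnosis of any particular model.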

Long Short Term Memory (LSTM)

On the border with deep automata.

As always, Christopher Olah wins the visual explanation prize: Understanding LSTM Networks. Also neat: LSTM Networks for Sentiment Analysis, and Alex Graves' Generating Sequences With Recurrent Neural Networks, which generates handwriting.

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell.[…] A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […]The gates serve to modulate the interactions between the memory cell itself and its environment.
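The four elements named in that description can be sketched as a single cell update. This follows the commonly used LSTM formulation; the stacked weight layout and the names `lstm_step`, `W`, `U` and `b` are assumptions made for compactness, not the notation of the quoted tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    W: (4*n, n_in), U: (4*n, n), b: (4*n,), stacked in the order
    input gate, forget gate, output gate, candidate.
    """
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0 * n:1 * n])   # input gate
    f = sigmoid(z[1 * n:2 * n])   # forget gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate value
    c_new = f * c + i * g         # memory cell with self-recurrent connection
    h_new = o * np.tanh(c_new)    # what the cell exposes to its environment
    return h_new, c_new

rng = np.random.default_rng(3)
n_in, n = 3, 4
W = 0.1 * rng.standard_normal((4 * n, n_in))
U = 0.1 * rng.standard_normal((4 * n, n))
b = np.zeros(4 * n)
b[0 * n:1 * n] = -20.0  # input gate held shut
b[1 * n:2 * n] = 20.0   # forget gate held open

x = rng.standard_normal(n_in)
h = np.zeros(n)
c = rng.standard_normal(n)
h_new, c_new = lstm_step(x, h, c, W, U, b)
print(np.max(np.abs(c_new - c)))
```

With the input gate shut and the forget gate open, the cell state passes through the step almost unchanged, which is the mechanism that lets gradients survive many time steps.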

Gated Recurrent Unit (GRU)

Simpler than the LSTM. But how exactly does it work? Read CGCB15 and CMGB14.
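As a partial answer to "how exactly does it work", here is a sketch of one GRU step in the formulation I believe is most commonly used: an update gate, a reset gate, and no separate memory cell. The parameter names are hypothetical, and some references swap the roles of z and 1 - z in the final blend, so check this against the papers before relying on it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, params):
    """One GRU step. `params` maps illustrative names to arrays:
    W_* act on the input, U_* on the previous state, b_* are biases."""
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h + params["b_r"])  # reset gate
    # Candidate state: the reset gate decides how much of the old state it sees.
    h_cand = np.tanh(params["W_h"] @ x + params["U_h"] @ (r * h) + params["b_h"])
    # Convex blend of old state and candidate, controlled by the update gate.
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(4)
n_in, n = 3, 4
params = {}
for gate in "zrh":
    params[f"W_{gate}"] = 0.1 * rng.standard_normal((n, n_in))
    params[f"U_{gate}"] = 0.1 * rng.standard_normal((n, n))
    params[f"b_{gate}"] = np.zeros(n)

h = np.zeros(n)
for x_t in rng.standard_normal((10, n_in)):
    h = gru_step(x_t, h, params)
print(h)
```

Compared with the LSTM there are three weight groups instead of four, and the hidden state doubles as the memory.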


Charming connection with my other research into acoustics: what I would call “Gerzon allpass” filters are now hip for use in neural networks because of favourable normalisation characteristics.

Here is an implementation that uses unnecessary complex numbers.


i.e. Kalman filters, but rebranded in the fine neural networks tradition of taking something uncontroversial from another field and putting the word “neural” in front. Practically these are usually variational, but there are some random-sampling-based ones.

See Sequential Neural Models with Stochastic Layers (FSPW16).


A mini-genre. Kalchbrenner et al (KaDG15) connect recurrent cells across multiple axes, leading to a higher-rank MIMO system. This is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper; but it was and it did and good on Kalchbrenner et al.


Long story, bro.

Keras implementation by Francesco Ferroni. TensorFlow implementation by Enea Ceolini.

Lasagne implementation by Danny Neil.


It’s still the wild west. Invent a category, name it and stake a claim. There’s publications in them thar hills.



Truncated backpropagation through time (TBPTT), state filters, filter stability.


See tensorflow.