The Living Thing / Notebooks : Recurrent neural networks

Feedback neural networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever).

The connection between these and convolutional neural networks is suggestive for the same reason.

Many different flavours and topologies exist. GridRNN (KaDG15) seems natural for note-based stuff, and for systems with dependencies in both space and time, like the cochlea, and indeed the entire human visual apparatus.

As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of interesting DSP applications, e.g. generative music.

Flavours

Linear

As seen in normal signal processing. The main problem here is that they are unstable during training under many of the wild and weird SGD regimes used for NNs, unless you are clever. See BeSF94. The next three types are proposed solutions for that.
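To see the instability concretely, here is a toy numpy sketch (my own illustration, not from BeSF94): backpropagating through a purely linear recurrence multiplies the gradient by the transpose of the recurrence matrix once per timestep, so its norm grows or shrinks geometrically with the spectral radius.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_through_time(spectral_radius, T=100, n=32):
    """Backpropagate a gradient through T steps of a linear recurrence
    h_t = W h_{t-1}; it is multiplied by W.T at every step, so its norm
    scales roughly like spectral_radius ** T."""
    W = rng.standard_normal((n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    g = rng.standard_normal(n)      # gradient arriving at the final step
    for _ in range(T):
        g = W.T @ g                 # chain rule through one time step
    return np.linalg.norm(g)

for rho in (0.9, 1.0, 1.1):
    print(rho, gradient_norm_through_time(rho))
# rho < 1: the gradient vanishes; rho > 1: it explodes.
```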

Long Short Term Memory (LSTM)

On the border with deep automata.

As always, Christopher Olah wins the visual explanation prize: Understanding LSTM Networks. Also neat: LSTM Networks for Sentiment Analysis, and Alex Graves’ Generating Sequences With Recurrent Neural Networks, which generates handwriting.

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of weights in the transition matrix can have a strong impact on the learning process. […]

These issues are the main motivation behind the LSTM model, which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
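For concreteness, a minimal single-step LSTM cell in numpy, following the usual gate equations; the weight packing and names here are my own, not those of any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    x: input vector; h_prev, c_prev: previous hidden and cell state.
    W: weights of shape (4*d, d_in + d); b: bias of shape (4*d,).
    """
    d = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:d])            # input gate
    f = sigmoid(z[d:2*d])         # forget gate
    o = sigmoid(z[2*d:3*d])       # output gate
    g = np.tanh(z[3*d:])          # candidate cell update
    c = f * c_prev + i * g        # new cell state: gated memory
    h = o * np.tanh(c)            # new hidden state / output
    return h, c
```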

Gated Recurrent Unit (GRU)

Simpler than the LSTM. But how exactly does it work? Read CGCB15 and CMGB14.
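Roughly, the GRU collapses the LSTM’s three gates into two (reset and update) and drops the separate cell state. A minimal sketch of one step, in my own notation loosely following CMGB14 (note that the sign convention for the update gate varies between papers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One step of a GRU. Each W* has shape (d, d_in + d)."""
    xh = np.concatenate([x, h_prev])
    z = sigmoid(Wz @ xh + bz)                  # update gate
    r = sigmoid(Wr @ xh + br)                  # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]) + bh)
    return (1 - z) * h_prev + z * h_tilde      # interpolate old state and candidate
```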

Unitary

Charming connection with my other research into acoustics: what I would call “Gerzon allpass” filters are now hip for training neural networks.

Here is an implementation that uses unnecessary complex numbers.
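For instance, MHRB16 parametrise the recurrence matrix as a product of Householder reflections, which keeps it exactly orthogonal, so the hidden-state norm (and hence the backpropagated gradient) neither explodes nor vanishes. A minimal sketch of the idea (mine, not their code):

```python
import numpy as np

def householder(v):
    """Reflection H = I - 2 v v^T / (v^T v); H is orthogonal."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

def orthogonal_from_reflections(vs):
    """A product of Householder reflections is again orthogonal."""
    W = np.eye(len(vs[0]))
    for v in vs:
        W = householder(v) @ W
    return W

rng = np.random.default_rng(0)
W = orthogonal_from_reflections([rng.standard_normal(16) for _ in range(8)])
h = rng.standard_normal(16)
for _ in range(1000):
    h = W @ h                      # the norm is preserved at every step
print(np.linalg.norm(h))           # stays at (roughly) the initial norm
```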

Probabilistic

I.e. Kalman filters, but rebranded in the fine neural networks tradition of taking something uncontroversial from another field and putting the word “neural” in front.

See Sequential Neural Models with Stochastic Layers (FSPW16).
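Strip off the branding and the basic object is the familiar predict/update recursion. For reference, a minimal scalar Kalman filter (the classical filter, not FSPW16’s model, which stacks a stochastic state-space layer on top of an RNN):

```python
def kalman_step(m, P, y, a, q, c, r):
    """One predict/update step of a scalar Kalman filter.

    State model  x_t = a x_{t-1} + process noise (variance q)
    Observation  y_t = c x_t     + observation noise (variance r)
    (m, P) is the current posterior mean and variance.
    """
    # Predict
    m_pred = a * m
    P_pred = a * a * P + q
    # Update with the new observation y
    S = c * c * P_pred + r          # innovation variance
    K = P_pred * c / S              # Kalman gain
    m_new = m_pred + K * (y - c * m_pred)
    P_new = (1 - K * c) * P_pred
    return m_new, P_new
```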

GridRNN

A mini-genre. Kalchbrenner et al. (KaDG15) connect recurrent cells across multiple axes, leading to a higher-rank MIMO system; this is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper, but it was, and it did, and good on Kalchbrenner et al.
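A cheap way to picture it: put one recurrent cell at each point of a 2-d lattice and let hidden state flow along both axes, so cell (i, j) sees the states of the cells above and to the left. A toy sketch, with a plain tanh cell standing in for the LSTM cells of KaDG15:

```python
import numpy as np

def grid_rnn_2d(X, Wx, Wv, Wh, b):
    """Toy 2-d grid recurrence: H[i, j] depends on the input X[i, j]
    and on the hidden states of the cells above and to the left."""
    I, J, _ = X.shape
    d = b.shape[0]
    H = np.zeros((I, J, d))
    for i in range(I):
        for j in range(J):
            h_up   = H[i - 1, j] if i > 0 else np.zeros(d)
            h_left = H[i, j - 1] if j > 0 else np.zeros(d)
            H[i, j] = np.tanh(Wx @ X[i, j] + Wv @ h_up + Wh @ h_left + b)
    return H

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 7, 3))          # a 5x7 grid of 3-d inputs
d = 4
H = grid_rnn_2d(X, rng.standard_normal((d, 3)), rng.standard_normal((d, d)),
                rng.standard_normal((d, d)), np.zeros(d))
```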

Phased

Long story, bro. The short version, as I read NePL16: each cell gets an extra oscillating time gate that only opens for a brief phase of its period, so state updates (and hence gradients) only flow at those times, which suits very long or asynchronously sampled sequences. A sketch of the gate is below.
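A minimal sketch of the time gate as I understand it from NePL16; the parameters are the paper’s period τ, phase shift s and open ratio r_on, while the leak α and the exact piecewise form are my reading, so treat this as a sketch rather than a reference implementation.

```python
def time_gate(t, tau, s, r_on, alpha=1e-3):
    """Openness k_t of a Phased LSTM cell at (possibly real-valued) time t:
    a sawtooth phase with a brief open window and a small leak outside it."""
    phi = ((t - s) % tau) / tau                 # phase in [0, 1)
    if phi < r_on / 2:
        return 2 * phi / r_on                   # opening half of the window
    if phi < r_on:
        return 2 - 2 * phi / r_on               # closing half of the window
    return alpha * phi                          # closed: tiny leak

# The gate multiplies the state update, roughly:
#   c_t = k_t * c_proposed + (1 - k_t) * c_prev
```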

Keras implementation by Francesco Ferroni
TensorFlow implementation by Enea Ceolini

Lasagne implementation by Danny Neil

Other

It’s still the wild west. Invent a category, name it and stake a claim. There’s publications in them thar hills.

Practicalities

See tensorflow.
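For what it’s worth, a minimal sequence-regression model via the Keras API bundled with TensorFlow; the shapes and data here are placeholders, assuming tf.keras is available.

```python
import numpy as np
import tensorflow as tf

# Toy data: 64 sequences of length 20 with 8 features, one scalar target each.
X = np.random.randn(64, 20, 8).astype("float32")
y = np.random.randn(64, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(None, 8)),  # None allows variable length
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```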

Refs

ASKR12
Arisoy, E., Sainath, T. N., Kingsbury, B., & Ramabhadran, B. (2012) Deep Neural Network Language Models. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT (pp. 20–28). Stroudsburg, PA, USA: Association for Computational Linguistics
ArSB15
Arjovsky, M., Shah, A., & Bengio, Y. (2015) Unitary Evolution Recurrent Neural Networks. arXiv:1511.06464 [Cs, Stat].
AuBM08
Auer, P., Burgsteiner, H., & Maass, W. (2008) A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. DOI.
BeSF94
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
BoBV12
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
BoLe06
Bown, O., & Lexer, S. (2006) Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg
BuMe05
Buhusi, C. V., & Meck, W. H.(2005) What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6(10), 755–765. DOI.
ChYR16
Charles, A., Yin, D., & Rozell, C. (2016) Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks. arXiv:1605.08346 [Cs, Math, Stat].
CMBB14
Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv Preprint arXiv:1409.1259.
CMGB14
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014) Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078 [Cs, Stat].
ChAB16
Chung, J., Ahn, S., & Bengio, Y. (2016) Hierarchical Multiscale Recurrent Neural Networks. arXiv:1609.01704 [Cs].
CGCB14
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS.
CGCB15
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2015) Gated Feedback Recurrent Neural Networks. arXiv:1502.02367 [Cs, Stat].
CoSS16
Collins, J., Sohl-Dickstein, J., & Sussillo, D. (2016) Capacity and Trainability in Recurrent Neural Networks. arXiv:1611.09913 [Cs, Stat].
DaYO16
Dasgupta, S., Yoshizumi, T., & Osogami, T. (2016) Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences. arXiv:1610.01989 [Cs, Stat].
DoPo15
Doelling, K. B., & Poeppel, D. (2015) Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. DOI.
Elma90
Elman, J. L.(1990) Finding structure in time. Cognitive Science, 14, 179–211. DOI.
FSPW16
Fraccaro, M., Sønderby, S. K., Paquet, U., & Winther, O. (2016) Sequential Neural Models with Stochastic Layers. arXiv:1605.07571 [Cs, Stat].
Gal15
Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
GeLR16
Gel, Y. R., Lyubchich, V., & Ramirez, L. L.(2016) Fast Patchwork Bootstrap for Quantifying Estimation Uncertainties in Sparse Random Networks.
GeSC00
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000) Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. DOI.
GDGR15
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [Cs].
GCWK09
Grzyb, B. J., Chinellato, E., Wojcik, G. M., & Kaminski, W. A.(2009) Which model to use for the Liquid State Machine?. In 2009 International Joint Conference on Neural Networks (pp. 1018–1024). DOI.
HaMa12
Hazan, H., & Manevitz, L. M.(2012) Topological constraints and robustness in liquid state machines. Expert Systems with Applications, 39(2), 1597–1606. DOI.
HeWH16
He, K., Wang, Y., & Hopcroft, J. (2016) A Powerful Generative Model Using Random Weights for the Deep Image Representation. arXiv:1606.04801 [Cs].
HDYD12
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
HoSc97
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
JSDP16
Jing, L., Shen, Y., Dubček, T., Peurifoy, J., Skirlo, S., Tegmark, M., & Soljačić, M. (2016) Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN. arXiv:1612.05231 [Cs, Stat].
JoZS15
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015) An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 2342–2350).
KaDG15
Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. arXiv:1507.01526 [Cs].
KaJF15
Karpathy, A., Johnson, J., & Fei-Fei, L. (2015) Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [Cs].
Lecu98
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI.
LeNM05
Legenstein, R., Naeger, C., & Maass, W. (2005) What Can a Neuron Learn with Spike-Timing-Dependent Plasticity?. Neural Computation, 17(11), 2337–2382. DOI.
LiBE15
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015) A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv:1506.00019 [Cs].
LuJa09
Lukoševičius, M., & Jaeger, H. (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. DOI.
MaNM04
Maass, W., Natschläger, T., & Markram, H. (2004) Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience: A Comprehensive Approach (pp. 575–605). Chapman & Hall/CRC
MHRB16
Mhammedi, Z., Hellicar, A., Rahman, A., & Bailey, J. (2016) Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections. arXiv:1612.00188 [Cs].
MKBČ10
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010) Recurrent Neural Network Based Language Model. In Eleventh Annual Conference of the International Speech Communication Association.
Mnih15
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
MoDH12
Mohamed, A.-r., Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22. DOI.
MoRe12
Monner, D., & Reggia, J. A.(2012) A generalized LSTM-like training algorithm for second-order recurrent neural networks. Neural Networks, 25, 70–83. DOI.
NePL16
Neil, D., Pfeiffer, M., & Liu, S.-C. (2016) Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3882–3890). Curran Associates, Inc.
NWSS16
Neyshabur, B., Wu, Y., Salakhutdinov, R. R., & Srebro, N. (2016) Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3477–3485). Curran Associates, Inc.
NCRG16
Nussbaum-Thom, M., Cui, J., Ramabhadran, B., & Goel, V. (2016) Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units. (pp. 390–394). DOI.
OIMT15
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T.(2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
PaHC15
Patraucean, V., Handa, A., & Cipolla, R. (2015) Spatio-temporal video autoencoder with differentiable memory. arXiv:1511.06309 [Cs].
RaSP16
Ravanbakhsh, S., Schneider, J., & Poczos, B. (2016) Deep Learning with Sets and Point Clouds. arXiv:1611.04500 [Cs, Stat].
RoRS15
Rohrbach, A., Rohrbach, M., & Schiele, B. (2015) The Long-Short Story of Movie Description. arXiv:1506.01698 [Cs].
Stei04
Steil, J. J.(2004) Backpropagation-decorrelation: online recurrent learning with O(N) complexity. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 843–848 vol.2). DOI.
SuPf16
Surace, S. C., & Pfister, J.-P. (2016) Online Maximum Likelihood Estimation of the Parameters of Partially Observed Diffusion Processes.
TaHR06
Taylor, G. W., Hinton, G. E., & Roweis, S. T.(2006) Modeling human motion using binary latent variables. In Advances in neural information processing systems (pp. 1345–1352).
ThBe15
Theis, L., & Bethge, M. (2015) Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [Cs, Stat].
VKCM15
Visin, F., Kastner, K., Cho, K., Matteucci, M., Courville, A., & Bengio, Y. (2015) ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv:1505.00393 [Cs].
Waib89
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37(3), 328–339. DOI.
WPPA16
Wisdom, S., Powers, T., Pitton, J., & Atlas, L. (2016) Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery. In Advances in Neural Information Processing Systems 29.
WZZB16
Wu, Y., Zhang, S., Zhang, Y., Bengio, Y., & Salakhutdinov, R. R.(2016) On Multiplicative Integration with Recurrent Neural Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2856–2864). Curran Associates, Inc.
YTCB15
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015) Describing Videos by Exploiting Temporal Structure. arXiv:1502.08029 [Cs, Stat].