Modern computational neural network methods reascend the hype phase transition,
a.k.a. *deep learning* or *extreme learning* or *double plus fancy brainbots* or
*please can our department have a bigger computation budget it’s not to play
video games I swear?*

But what are these methods?

I would argue that they are, collectively, something closer to a convenient technology stack than a single theory.

To summarise, deep learning is

- a collection of incremental improvements in areas such as
Stochastic Gradient Descent,
approximation theory,
graphical models, and
signal processing research,
plus some handy advancements in
SIMD architectures
that, taken together, surprisingly elicit the
kind of results from machine learning that everyone was hoping we’d get by at
least 20 years ago, yet
*without* requiring us to develop substantially more clever grad students to do so, or
- the state-of-the-art in artificial kitten recognition.

It’s a frothy (some might say foamy-mouthed) research bubble right now, with such cute extrema as Inceptionising inceptionism (ADGH16), which learns to train neural networks using neural networks. Stay tuned for more of this.

There is not very much to do with “neurons” left in the paradigm at this stage. What there is, is a bundle of clever tricks for training deep constrained hierarchical regressions and classifiers on modern computer hardware.

Some network methods hew closer to the behaviour of real neurons: artificial neural networks (although not very close; simulating actual brains is a different discipline with only intermittent and indirect connection).

Subtopics of interest to me:

- recurrent networks for audio data
- compressing deep networks
- neural stack machines
- probabilistic learning machines

## Why bother?

There are many answers here.

A classic:

### The ultimate regression algorithm

…until the next ultimate regression algorithm.

It turns out that this particular learning model (class of learning models) and its training technologies are surprisingly good at getting ever better models out of ever more data. Why burn three grad students on a perfectly tractable and specific regression algorithm when you can use one algorithm to solve a whole bunch of regression problems, one which improves with the number of computers and the amount of data you have? How much of a relief is it to capital to decouple its effectiveness from the uncertainty and obstreperousness of human labour?

### Cool maths

Function approximations, interesting manifold inference. Weird product measure things, e.g. Mont14.

Even the stuff I’d assumed was trivial, like backpropagation, has a few wrinkles in practice. See Michael Nielsen’s chapter and Christopher Olah’s visual summary.
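Backpropagation is the chain rule applied layer by layer, and the standard way to catch its practical wrinkles is to check it against finite differences. A minimal sketch in pure Python, assuming a toy network with a single tanh hidden unit and squared-error loss; the network, function names and numbers are illustrative, not taken from Nielsen or Olah.

```python
import math

def forward(w1, w2, x):
    """Toy network: one tanh hidden unit followed by a linear output."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, target):
    """Squared error 0.5 * (y - target)^2."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Gradients of the loss w.r.t. (w1, w2), propagated backwards via the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target                   # derivative of 0.5 * (y - target)^2
    dL_dw2 = dL_dy * h                   # since y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1.0 - h ** 2) * x  # tanh'(a) = 1 - tanh(a)^2, with a = w1 * x
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, as an independent check on backprop."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2

analytic = backprop(0.5, -1.2, 0.8, 0.3)
numeric = numerical_grad(0.5, -1.2, 0.8, 0.3)
print(analytic)
print(numeric)  # should agree with `analytic` to many decimal places
```

Frameworks such as TensorFlow and Theano derive these gradients automatically; the finite-difference check remains handy when writing custom layers.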

Yes, this is a regular paper mill. Not only are there probably new insights to be had here, but you can also recycle any old machine learning insight, replace a layer in a network with it and *poof*: new paper.

### Insight into the mind

TBD. Maybe.

There is claimed to be communication between real neurology and neural networks in computer vision, but elsewhere neural networks are driven by their similarities to other things, such as being differentiable relaxations of traditional models (e.g. stack machines), or being a licence to fit hierarchical models without paying attention to statistical niceties.

There might be some kind of occasional “stylised fact”-type relationship here.

### Trippy art projects

## Hip keywords for NN models

Not necessarily mutually exclusive; some design patterns you can use.

See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016, or the Neural network zoo, or Simon Brugman’s deep learning papers.

Some of these are descriptions of topologies, others of training tricks or whatever. Recurrent and convolutional are two types of topologies you might have in your ANN. But there are so many other possible ones: “Grid”, “highway”, “Turing” and others…

Many are mentioned in passing in David McAllester’s Cognitive Architectures post.

### Probabilistic/variational

### Convolutional

Signal processing baked into neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean two different things in the literature.

Generally uses FIR filters plus some smudgy “pooling” (which is nonlinear downsampling), although IIR is also making an appearance by running RNNs along multiple axes.
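To make the signal-processing reading concrete: a 1-D convolutional layer is a bank of FIR filters followed by a nonlinearity, and max pooling is a nonlinear downsample. A minimal single-channel sketch in pure Python; the kernel, the signal and the pool size are arbitrary illustrative choices. Like most deep learning libraries, it computes a cross-correlation (no kernel flip) and still calls it convolution.

```python
def conv1d_valid(signal, kernel):
    """'Valid' FIR filtering without kernel flip (i.e. cross-correlation)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Pointwise nonlinearity applied after the linear filter."""
    return [max(0.0, x) for x in xs]

def max_pool1d(xs, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
edge_detector = [-1.0, 1.0]   # first difference: a crude high-pass FIR filter
features = max_pool1d(relu(conv1d_valid(signal, edge_detector)))
print(features)  # [1.0, 1.0, 0.0]
```

The rising half of the ramp survives rectification and pooling; the falling half is discarded, at half the original sample rate.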

### Adversarial

Train two networks to beat each other. I have some intuitions why this might work, but need to learn more. Compare and contrast student-teacher networks.
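For reference, the original GAN formulation (GPMX14) states the contest as a two-player minimax game between a generator $G$, which maps noise $z \sim p_z$ to samples, and a discriminator $D$, which outputs the probability that its input came from the data rather than from $G$:

\begin{equation*} \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \end{equation*}

$D$ is trained to increase $V$ and $G$ to decrease it; in the idealised, unlimited-capacity version of the game analysed in that paper, the equilibrium has $G$ reproducing $p_{\text{data}}$ and $D = 1/2$ everywhere.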

### Spike-based

Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time. There are a great many other differences. What difference does this one in particular make? I suspect it makes a difference regarding time.
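One way to see the event-driven picture is a leaky integrate-and-fire neuron, a standard simplified spiking model. Under constant input its dynamics can be solved in closed form, so the output is a list of spike times in continuous time rather than an activation sampled at clock ticks. A minimal sketch; the function name, parameter names and default values are illustrative assumptions.

```python
import math

def lif_spike_times(i_input, t_end, tau=0.02, r=1.0, v_rest=0.0,
                    v_reset=0.0, theta=1.0):
    """Spike times (in seconds) of a leaky integrate-and-fire neuron,
    tau * dv/dt = -(v - v_rest) + r * i_input, under constant input.

    Starting from v_reset, the membrane potential relaxes exponentially
    towards v_inf = v_rest + r * i_input, so the time to reach threshold
    theta has a closed form. We jump from event to event instead of
    stepping a simulation clock.
    """
    if v_reset >= theta:
        raise ValueError("v_reset must lie below the threshold theta")
    v_inf = v_rest + r * i_input
    if v_inf <= theta:
        return []  # sub-threshold drive: the neuron never fires
    # Solve v_inf + (v_reset - v_inf) * exp(-t / tau) = theta for t.
    isi = tau * math.log((v_inf - v_reset) / (v_inf - theta))
    n_spikes = int(t_end // isi)
    return [k * isi for k in range(1, n_spikes + 1)]

spikes = lif_spike_times(2.0, t_end=0.1)
print(len(spikes), [round(t, 4) for t in spikes[:3]])  # 7 [0.0139, 0.0277, 0.0416]
```

The information is carried by *when* spikes happen: stronger input shortens the interspike interval, and sub-threshold input produces no events at all.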

### Recurrent neural networks

Feedback neural network structures that have memory and a notion of time, and of “current” versus “past” state. See recurrent neural networks.
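The feedback idea in its simplest (Elman-style) form: the hidden state at each step is a function of the current input and the previous hidden state, so it carries a running summary of the past. A minimal scalar sketch in pure Python; the weights are arbitrary illustrative values, not trained ones.

```python
import math

def rnn_forward(xs, w_in=0.5, w_rec=0.9, bias=0.0, h0=0.0):
    """Scalar Elman-style RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias).

    Returns the hidden-state trajectory. Through w_rec, each h_t depends on
    every input up to time t: the state is the network's memory of the past.
    """
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

# A single impulse at t = 0 keeps echoing through the state afterwards.
print(rnn_forward([1.0, 0.0, 0.0, 0.0]))
# With no recurrence (w_rec = 0) the echo vanishes after the impulse.
print(rnn_forward([1.0, 0.0, 0.0, 0.0], w_rec=0.0))
```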

### GridRNN

A mini-genre. Kalchbrenner et al. (KaDG15) connect recurrent cells across multiple axes, leading to a higher-rank MIMO system. This is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper; but it was, and it did, and good on Kalchbrenner et al.
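The multi-axis idea can be sketched with plain scalar tanh cells on a 2-D grid, where each cell receives recurrent input from its neighbour along each axis. This is a deliberate simplification for illustration: KaDG15 use LSTM cells with a richer coupling between dimensions, and the function and weight names here are hypothetical.

```python
import math

def grid_rnn_2d(x, w_in=1.0, w_up=0.5, w_left=0.5):
    """Scan a 2-D input with a recurrence along both axes:
    h[i][j] = tanh(w_in * x[i][j] + w_up * h[i-1][j] + w_left * h[i][j-1]).
    Cells outside the grid contribute zero state.
    """
    n_rows, n_cols = len(x), len(x[0])
    h = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            up = h[i - 1][j] if i > 0 else 0.0
            left = h[i][j - 1] if j > 0 else 0.0
            h[i][j] = math.tanh(w_in * x[i][j] + w_up * up + w_left * left)
    return h

# An impulse in the top-left corner propagates down and to the right:
# the state at (i, j) depends on the whole quadrant above and to the left.
impulse = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for row in grid_rnn_2d(impulse):
    print([round(v, 3) for v in row])
```

In IIR terms this is a causal 2-D recursive filter with a nonlinearity in the loop; running further scans from the other corners would give each cell context from every direction.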

### Attention mechanism

What’s that now?

### Kernel networks

Kernel trick + ANN = kernel ANNs. Sounds intuitive; I’m sure there are many hairy details.

### Convex neural networks

Do not confuse with convolutional neural networks.

Bengio, Le Roux, Vincent, Delalleau, and Marcotte (BRVD05).

### Cortical learning algorithms

Is this a real thing, or pure hype? How does it distinguish itself from other deep learning techniques aside from name-checking biomimetic engineering? Numenta has made a big splash with their brain-esque learning system, NuPIC, and have open-sourced it; on that basis alone it looks like it could be fun to explore.

- NuPIC is an open source entrant in the field
- How it works
- More How it works

### Extreme learning machines

Dunno.

### Autoencoding

TBD. Making a sparse encoding of something by demanding that your network reproduce the input after passing the network activations through a narrow bottleneck. Many flavours. I can’t help but wonder how you avoid pathological local minima for this one, but maybe I should be wondering that for all neural networks.
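A minimal sketch of the bottleneck idea, assuming the simplest possible case: a *linear* autoencoder squeezing 2-D points through a single hidden unit, trained by plain full-batch gradient descent on squared reconstruction error. It is neither sparse nor nonlinear, and the data, learning rate and epoch count are illustrative assumptions.

```python
import random

def train_linear_autoencoder(data, lr=0.05, epochs=200, seed=0):
    """Linear autoencoder 2 -> 1 -> 2, trained by full-batch gradient descent.

    Encoder: z = e . x (the one-unit bottleneck).
    Decoder: x_hat = d * z.
    Loss: mean over the data of 0.5 * ||x_hat - x||^2.
    Returns (e, d, loss_history); loss_history[k] is the loss before update k.
    """
    rng = random.Random(seed)
    e = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    d = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    n = len(data)
    history = []
    for _ in range(epochs):
        grad_e, grad_d, total = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in data:
            z = e[0] * x[0] + e[1] * x[1]              # squeeze through the bottleneck
            err = [d[k] * z - x[k] for k in range(2)]  # reconstruction error
            total += 0.5 * (err[0] ** 2 + err[1] ** 2)
            dz = err[0] * d[0] + err[1] * d[1]         # dL/dz
            for k in range(2):
                grad_d[k] += err[k] * z                # dL/dd_k
                grad_e[k] += dz * x[k]                 # dL/de_k
        for k in range(2):
            e[k] -= lr * grad_e[k] / n
            d[k] -= lr * grad_d[k] / n
        history.append(total / n)
    return e, d, history

# Hypothetical data: points scattered closely around the line y = 2x.
data = [(t / 5, 2 * t / 5 + 0.05 * (-1) ** t) for t in range(-5, 6)]
enc, dec, losses = train_linear_autoencoder(data)
print(losses[0], losses[-1])  # reconstruction error drops sharply
print(dec[1] / dec[0])        # close to 2: the code lies along the data's main direction
```

For this linear case the loss is known to have no spurious local minima (only saddles), and its optima span the leading principal subspace, PCA-style; the local-minimum worry above really concerns the nonlinear flavours.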

## Optimisation methods

Backpropagation plus gradient descent rules at the moment. Question: does anything else get performance at this scale?
What other techniques can be extracted from variational inference,
MC sampling, or particle filters?
There is no clear reason that shoving any of these in
as intermediate layers in the network
is any *less* well-posed than a classical backprop layer,
although it does require more nous from the enthusiastic grad student.
Anyway, see online optimisation, which mostly concerns NN-style online optimisation.
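For concreteness, the workhorse here is minibatch stochastic gradient descent: estimate the gradient on a small random subset of the data, take a step, repeat. A minimal sketch fitting a one-variable linear regression in pure Python; the learning rate, batch size and synthetic data are illustrative assumptions.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.1, batch_size=4, epochs=100, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on (half) mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # a fresh random pass over the data each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of 0.5 * mean((w*x + b - y)^2) over this minibatch only.
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x - 1.0 for x in xs]  # noiseless line: true w = 3, b = -1
w, b = sgd_linear_regression(xs, ys)
print(w, b)  # close to 3 and -1
```

Each step touches only a handful of examples, which is what lets the same loop scale to datasets far too large to fit a full gradient in memory.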

## Preventing overfitting

## Encoding for neural networks

Neural networks take an inconvenient encoding format, so general data has to be massaged. Convolutional models are an important implicit encoding; what else can we squeeze [in there/out of there]?

- Radial basis functions
- probabilities
- Mercer kernel (GlLi16)
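As an example of the first item, a scalar can be spread across a bank of Gaussian radial basis functions, giving the network a smooth, localised code instead of one raw number. A minimal sketch; the number of centres and the width are arbitrary illustrative choices.

```python
import math

def rbf_encode(x, centres, width):
    """Gaussian RBF features: phi_k(x) = exp(-(x - c_k)^2 / (2 * width^2))."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centres]

centres = [0.0, 0.25, 0.5, 0.75, 1.0]
code = rbf_encode(0.3, centres, width=0.15)
print([round(v, 3) for v in code])
# The unit whose centre is nearest the input (0.25) responds most strongly.
```

Nearby inputs get overlapping codes, so downstream layers see similarity directly; the width trades off smoothness against resolution.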

## Activations for neural networks

## Software stuff

I use Tensorflow, plus a side order of Keras.

R/MATLAB/Python/other?: MXNET.

Lua: Torch

MATLAB/Python: Caffe claims to be a “de facto standard”

Python/C++: Paddlepaddle is Baidu’s nonfancy NN machine

Minimalist C++: tiny-dnn is a C++11 implementation of deep learning. It is suitable for deep learning on limited computational resources, embedded systems and IoT devices.

NNPACK “is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs.”

NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives to be leveraged by higher-level frameworks.

USP: compiles to JavaScript amongst other things.

Python: Theano

- Tastes better with Lasagne
- which in turn likes nolearn

- …Or this minute’s flavour, Keras. Keras has a common standard for transporting trained neural networks between machines and sometimes architectures, and is officially supported by TensorFlow.
- cxxnet <https://github.com/dmlc/cxxnet> and mshadow: numpy interface, multiple GPU targets.

Python/C++: TensorFlow is similar to Theano, but younger. However, it’s backed by Google so maybe has better long-term prospects? The construction of graphs is more explicit than in Theano, which I find easier to understand, although this means that you lose the elegant syntax of Theano.

I’m using this enough to give it its own notebook.

javascript: see javascript machine learning

iphone: DeepBeliefSDK

### Partial-training

recycling someone else’s features.

Building powerful image classification models using very little data

In this tutorial, we will present a few simple yet effective methods that you can use to build a powerful image classifier, using only very few training examples: just a few hundred or thousand pictures from each class you want to be able to recognize.

We will go over the following options:

- training a small network from scratch (as a baseline)
- using the bottleneck features of a pre-trained network
- fine-tuning the top layers of a pre-trained network

Our setup: only 2000 training examples (1000 per class)
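The “bottleneck features” option amounts to: freeze the pre-trained network, compute its features once, and fit only a small classifier on top. A minimal sketch of that division of labour in pure Python. `frozen_features` is a hypothetical stand-in for a real pre-trained network (in practice an intermediate layer of, say, a model-zoo network), and the toy two-class data are made up for illustration.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for a pre-trained network's penultimate layer.
    Its parameters are fixed: nothing below updates them."""
    return [x[0] * x[0], x[1] * x[1], 1.0]  # last entry acts as a bias feature

def train_top_layer(inputs, labels, lr=0.5, epochs=2000):
    """Logistic regression on frozen features: only these weights are learned."""
    feats = [frozen_features(x) for x in inputs]  # computed once, then reused
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            for k in range(len(w)):
                grad[k] += (p - y) * f[k]  # gradient of the log loss
        w = [wi - lr * g / len(feats) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    z = sum(wi * fi for wi, fi in zip(w, frozen_features(x)))
    return 1 if z > 0 else 0

# Toy task: class 1 lies far from the origin, class 0 near it. Not linearly
# separable in the raw inputs, but separable in the frozen feature space.
inputs = [(0.1, 0.2), (-0.2, 0.1), (0.0, -0.1), (1.0, 1.0), (-1.2, 0.9), (0.8, -1.1)]
labels = [0, 0, 0, 1, 1, 1]
w = train_top_layer(inputs, labels)
print([predict(w, x) for x in inputs])  # matches `labels`
```

Fine-tuning, the third option, would additionally unfreeze the top layers of the feature extractor and train them at a small learning rate.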

## Examples

### data

### pre-computed/trained models

- Caffe format:
  - The Caffe Zoo on their wiki has lots of nice pre-trained models
  - Here’s a great CV one, Andrej Karpathy’s image captioner, Neuraltalk2
- For the NVC dataset: pre-trained feature model here
- Alexnet
- For Lasagne: https://github.com/Lasagne/Recipes/tree/master/modelzoo
- For Keras:

## Latent variables, husbandry

Projector visualises embeddings:

TensorBoard has a built-in visualizer, called the Embedding Projector, for interactive visualization and analysis of high-dimensional data like embeddings. It is meant to be useful for developers and researchers alike. It reads from the checkpoint files where you save your tensorflow variables. Although it’s most useful for embeddings, it will load any 2D tensor, potentially including your training weights.

## Howtos

- Awesome deep learning
- What’s wrong with deep learning? is a high-speed diagrammatic introductory presentation with a clickbait title, by one of the founding fathers, Yann LeCun
- Yarin Gal on uncertainty quantification
- Memkite’s Deep learning bibliography
- deeplearning.net’s reading list…
- and their tutorials are clear

- Michael Nielsen has a free online textbook with code examples in Python
- Dürr’s tutorial
- Geoffrey Hinton’s video draws the connection between Markov Random Fields and neural networks, and also links to lots of other video tutorials in the sidebar
- The cat recogniser team lead, Quoc Le, has some nice lectures
- cute: sirajology’s energetic “demystifying” howtos

## To read

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Jeff Dean’s Large Scale Deep Learning at Google

The vector embedding is cool:

\begin{equation*} E(\text{Rome}) - E(\text{Italy}) + E(\text{Germany}) \approx E(\text{Berlin}) \end{equation*}

More of that under semantics.
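The analogy arithmetic is vector addition followed by a nearest-neighbour lookup under cosine similarity, excluding the query words themselves. A minimal sketch with made-up 3-D vectors (real embeddings such as those of MCCD13 are learned and have hundreds of dimensions); only the mechanics, not the numbers, are meaningful.

```python
import math

# Hypothetical toy embeddings; axes loosely mean (capital-ness, Italy-ness, Germany-ness).
E = {
    "Rome":    [1.0, 1.0, 0.0],
    "Italy":   [0.0, 1.0, 0.0],
    "Berlin":  [1.0, 0.0, 1.0],
    "Germany": [0.0, 0.0, 1.0],
    "Munich":  [0.3, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, emb):
    """Return the word closest to emb[a] - emb[b] + emb[c], excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("Rome", "Italy", "Germany", E))  # Berlin
```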

## Refs

- Amar98: Amari, S. (1998) Natural Gradient Works Efficiently in Learning. *Neural Computation*, 10(2), 251–276. DOI.
- ADGH16: Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. *arXiv:1606.04474 [cs]*.
- ArRK10: Arel, I., Rose, D. C., & Karnowski, T. P. (2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. *IEEE Computational Intelligence Magazine*, 5(4), 13–18. DOI.
- AGMM15: Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. *arXiv:1503.00778 [cs, Stat]*.
- Bach14: Bach, F. (2014) Breaking the Curse of Dimensionality with Convex Neural Networks. *arXiv:1412.8690 [cs, Math, Stat]*.
- Barr93: Barron, A. R. (1993) Universal approximation bounds for superpositions of a sigmoidal function. *IEEE Transactions on Information Theory*, 39(3), 930–945. DOI.
- BLPB12: Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. *arXiv:1211.5590 [cs]*.
- Beng09: Bengio, Y. (2009) Learning deep architectures for AI. *Foundations and Trends® in Machine Learning*, 2(1), 1–127. DOI.
- BeCV13: Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. *IEEE Trans. Pattern Anal. Machine Intell.*, 35, 1798–1828. DOI.
- BeLe07: Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. *Large-Scale Kernel Machines*, 34, 1–41.
- BRVD05: Bengio, Y., Le Roux, N., Vincent, P., Delalleau, O., & Marcotte, P. (2005) Convex neural networks. In Advances in neural information processing systems (pp. 123–130).
- Bose91: Boser, B. (1991) An analog neural network processor with programmable topology. *J. Solid State Circuits*, 26, 2017–2025. DOI.
- Cadi14: Cadieu, C. F. (2014) Deep neural networks rival the representation of primate IT cortex for core visual object recognition. *PLoS Comp. Biol.*, 10, e1003963. DOI.
- ChGS15: Chen, T., Goodfellow, I., & Shlens, J. (2015) Net2Net: Accelerating Learning via Knowledge Transfer. *arXiv:1511.05641 [cs]*.
- CWTW15: Chen, W., Wilson, J. T., Tyree, S., Weinberger, K. Q., & Chen, Y. (2015) Compressing Neural Networks with the Hashing Trick. *arXiv:1504.04788 [cs]*.
- CMBB14: Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. *arXiv Preprint arXiv:1409.1259*.
- CHMB15: Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
- Ciod12: Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. *J. Phys. Conf. Series*, 368, 012030. DOI.
- Cire12: Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. *Neural Networks*, 32, 333–338. DOI.
- Dahl12: Dahl, G. E. (2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. *IEEE Trans. Audio Speech Lang. Process.*, 20, 33–42. DOI.
- Dett15: Dettmers, T. (2015) 8-Bit Approximations for Parallelism in Deep Learning. *arXiv:1511.04561 [cs]*.
- DiSc14: Dieleman, S., & Schrauwen, B. (2014) End-to-end learning for music audio. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6964–6968). IEEE. DOI.
- DoSB14: Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2014) Learning to Generate Chairs with Convolutional Neural Networks. *arXiv:1411.5928 [cs]*.
- DuPW14: Duan, Q., Park, J. H., & Wu, Z.-G. (2014) Exponential state estimator design for discrete-time neural networks with discrete and distributed time-varying delays. *Complexity*, 20(1), 38–48. DOI.
- Fara13: Farabet, C. (2013) Learning hierarchical features for scene labeling. *IEEE Trans. Pattern Anal. Mach. Intell.*, 35, 1915–1929. DOI.
- Fuku82: Fukushima, K. (1982) Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. *Pattern Recognition*, 15, 455–469. DOI.
- Gal15: Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. *arXiv:1512.05287 [stat]*.
- Garc04: Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. *IEEE Trans. Pattern Anal. Machine Intell.*, 26, 1408–1423. DOI.
- GaEB15: Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. *arXiv:1508.06576 [cs, Q-Bio]*.
- GiSB14: Giryes, R., Sapiro, G., & Bronstein, A. M. (2014) On the Stability of Deep Networks. *arXiv:1412.5896 [cs, Math, Stat]*.
- GiSB16: Giryes, R., Sapiro, G., & Bronstein, A. M. (2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? *IEEE Transactions on Signal Processing*, 64(13), 3444–3457. DOI.
- GlLi16: Globerson, A., & Livni, R. (2016) Learning Infinite-Layer Networks: Beyond the Kernel Trick. *arXiv:1606.05316 [cs]*.
- GPMX14: Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014) Generative Adversarial Networks. *arXiv:1406.2661 [cs, Stat]*.
- GoSS14: Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014) Explaining and Harnessing Adversarial Examples. *arXiv:1412.6572 [cs, Stat]*.
- Hads09: Hadsell, R. (2009) Learning long-range vision for autonomous off-road driving. *J. Field Robot.*, 26, 120–144. DOI.
- HaCL06: Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
- HaMD15: Han, S., Mao, H., & Dally, W. J. (2015) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. *arXiv:1510.00149 [cs]*.
- HeWH16: He, K., Wang, Y., & Hopcroft, J. (2016) A Powerful Generative Model Using Random Weights for the Deep Image Representation. *arXiv:1606.04801 [cs]*.
- Helm13: Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. *Nature*, 500, 168–174. DOI.
- Hint10: Hinton, G. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg.
- HDYD12: Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. *IEEE Signal Processing Magazine*, 29(6), 82–97. DOI.
- Hint95: Hinton, G. E. (1995) The wake-sleep algorithm for unsupervised neural networks. *Science*, 268(5214), 1158–1161. DOI.
- Hint07: Hinton, G. E. (2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Progress in Brain Research (Vol. 165, pp. 535–547). Elsevier.
- HiSa06: Hinton, G. E., & Salakhutdinov, R. R. (2006) Reducing the dimensionality of data with neural networks. *Science*, 313(5786), 504–507. DOI.
- HiOT06: Hinton, G., Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. *Neural Computation*, 18(7), 1527–1554. DOI.
- HuSi05: Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. *International Journal of Information Technology*, 11(1), 16–24.
- HuWL11: Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. *International Journal of Machine Learning and Cybernetics*, 2(2), 107–122. DOI.
- HuZS04: Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990). DOI.
- HuZS06: Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. *Neurocomputing*, 70(1–3), 489–501. DOI.
- Hube62: Hubel, D. H. (1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. *J. Physiol.*, 160, 106–154. DOI.
- HuPC15: Hu, T., Pehlevan, C., & Chklovskii, D. B. (2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. *arXiv:1503.00690 [cs, Q-Bio, Stat]*.
- KaSu15: Kaiser, Ł., & Sutskever, I. (2015) Neural GPUs Learn Algorithms. *arXiv:1511.08228 [cs]*.
- KaDG15: Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. *arXiv:1507.01526 [cs]*.
- KaRL10: Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. *arXiv:1010.3467 [cs]*.
- KWKT15: Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B. (2015) Deep Convolutional Inverse Graphics Network. *arXiv:1503.03167 [cs]*.
- LSLW15: Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. *arXiv:1512.09300 [cs, Stat]*.
- Lawr97: Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. *IEEE Trans. Neural Networks*, 8, 98–113. DOI.
- Lecu98: LeCun, Y. (1998) Gradient-based learning applied to document recognition. *Proc. IEEE*, 86(11), 2278–2324. DOI.
- LeBH15: LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. *Nature*, 521(7553), 436–444. DOI.
- LCHR06: LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. *Predicting Structured Data*.
- LGRN00: Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Presented at the Proceedings of the 26th International Conference on Machine Learning.
- Leun14: Leung, M. K. (2014) Deep learning of the tissue-regulated splicing code. *Bioinformatics*, 30, i121–i129. DOI.
- LCMB15: Lin, Z., Courbariaux, M., Memisevic, R., & Bengio, Y. (2015) Neural Networks with Few Multiplications. *arXiv:1510.03009 [cs]*.
- Lipt16: Lipton, Z. C. (2016) The Mythos of Model Interpretability. In arXiv:1606.03490 [cs, stat].
- Ma15: Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. *J. Chem. Inf. Model.*, 55, 263–274. DOI.
- Mall12: Mallat, S. (2012) Group Invariant Scattering. *Communications on Pure and Applied Mathematics*, 65(10), 1331–1398. DOI.
- Mall16: Mallat, S. (2016) Understanding Deep Convolutional Networks. *arXiv:1601.04920 [cs, Stat]*.
- MAPE15: Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Zheng, X. (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
- MCCD13: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. *arXiv:1301.3781 [cs]*.
- MiLS13: Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. *arXiv:1309.4168 [cs]*.
- Mnih15: Mnih, V. (2015) Human-level control through deep reinforcement learning. *Nature*, 518, 529–533. DOI.
- MoDH12: Mohamed, A.-r., Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. *IEEE Transactions on Audio, Speech, and Language Processing*, 20(1), 14–22. DOI.
- MoRe12: Monner, D., & Reggia, J. A. (2012) A generalized LSTM-like training algorithm for second-order recurrent neural networks. *Neural Networks*, 25, 70–83. DOI.
- Mont14: Montufar, G. (2014) When does a mixture of products contain a product of mixtures? *J. Discrete Math.*, 29, 321–347. DOI.
- Ning05: Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. *IEEE Trans. Image Process.*, 14, 1360–1371. DOI.
- OlFi96a: Olshausen, B. A., & Field, D. J. (1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. *Nature*, 381(6583), 607–609. DOI.
- OlFi96b: Olshausen, B. A., & Field, D. J. (1996b) Natural image statistics and efficient coding. *Network (Bristol, England)*, 7(2), 333–339. DOI.
- OlFi04: Olshausen, B. A., & Field, D. J. (2004) Sparse coding of sensory inputs. *Current Opinion in Neurobiology*, 14(4), 481–487. DOI.
- OIMT15: Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T. (2015) Visually Indicated Sounds. *arXiv:1512.08512 [cs]*.
- PaDG16: Pan, W., Dong, H., & Guo, Y. (2016) DropNeuron: Simplifying the Structure of Deep Neural Networks. *arXiv:1606.07326 [cs, Stat]*.
- PaVe14: Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. *arXiv:1412.6621 [cs, Stat]*.
- PeCh15: Pehlevan, C., & Chklovskii, D. B. (2015) A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features. *arXiv:1503.00680 [cs, Q-Bio, Stat]*.
- RaMC15: Radford, A., Metz, L., & Chintala, S. (2015) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. *arXiv:1511.06434 [cs]*.
- RaBC08: Ranzato, M. A., Boureau, Y.-L., & LeCun, Y. (2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
- Ranz13: Ranzato, M. (2013) Modeling natural images using gated MRFs. *IEEE Trans. Pattern Anal. Machine Intell.*, 35(9), 2206–2222. DOI.
- RBKC14: Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2014) FitNets: Hints for Thin Deep Nets. *arXiv:1412.6550 [cs]*.
- Rume86: Rumelhart, D. E. (1986) Learning representations by back-propagating errors. *Nature*, 323, 533–536. DOI.
- SGAL14: Sagun, L., Guney, V. U., Ben Arous, G., & LeCun, Y. (2014) Explorations on high dimensional landscapes. *arXiv:1412.6615 [cs, Stat]*.
- SCHU16: Scardapane, S., Comminiello, D., Hussain, A., & Uncini, A. (2016) Group Sparse Regularization for Deep Neural Networks. *arXiv:1607.00485 [cs, Stat]*.
- Schw07: Schwenk, H. (2007) Continuous space language models. *Computer Speech Lang.*, 21, 492–518. DOI.
- SDBR14: Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. *arXiv:1412.6806 [cs]*.
- StGa15: Ver Steeg, G., & Galstyan, A. (2015) The Information Sieve. *arXiv:1507.02284 [cs, Math, Stat]*.
- Telg15: Telgarsky, M. (2015) Representation Benefits of Deep Feedforward Networks. *arXiv:1509.08101 [cs]*.
- Tura10: Turaga, S. C. (2010) Convolutional networks can learn to generate affinity graphs for image segmentation. *Neural Comput.*, 22, 511–538. DOI.
- UGKA16: Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R., … Richardson, M. (2016) Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)? *arXiv:1603.05691 [cs, Stat]*.
- Waib89: Waibel, A. (1989) Phoneme recognition using time-delay neural networks. *IEEE Trans. Acoustics Speech Signal Process.*, 37(3), 328–339. DOI.
- WiBö15: Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. *arXiv:1512.06293 [cs, Math, Stat]*.
- ZhCL14: Zhang, S., Choromanska, A., & LeCun, Y. (2014) Deep learning with Elastic Averaging SGD. *arXiv:1412.6651 [cs, Stat]*.