
Deep learning

can haz more layerz

Bjorn Stenger’s brief history of machine learning.


Modern computational neural network methods reascend the hype phase transition. A.k.a. deep learning, or extreme learning, or double plus fancy brainbots, or “please can our department have a bigger GPU budget, it’s not to play video games, I swear”.

What?

To be specific, deep learning is the current umbrella term for training many-layered networks of differentiable units by gradient descent, at scale.

It’s a frothy (some might say foamy-mouthed) research bubble right now, with such cute extrema as, e.g., Inceptionising inceptionism (ADGH16), which learns to learn neural networks using neural networks. Stay tuned for more of this.

There is not much to do with “neurons” left in the paradigm at this stage. What there is, is a bundle of clever tricks for training deep, constrained, hierarchical regressions and classifiers on modern computer hardware. Something closer to a convenient technology stack than a single “theory”.

Some network methods hew closer to the behaviour of real neurons, although not that close; simulating actual brains is a different discipline with only an intermittent and indirect connection to this one.

Subtopics of interest to me:

Why bother?

There are many answers here.

A classic:

The ultimate regression algorithm

…until the next ultimate regression algorithm.

It turns out that this particular class of learning models and training technologies is surprisingly good at getting ever better models out of ever more data. Why burn three grad students on a perfectly tractable and specific regression algorithm when you can use one algorithm to solve a whole bunch of regression problems, one which improves with the number of computers and the amount of data you have? And how much of a relief is it to capital to decouple its effectiveness from the uncertainty and obstreperousness of human labour?

Cool maths

Function approximation results, interesting manifold inference, weird product-measure things, e.g. Mont14.

Even the stuff I’d assumed was trivial, like backpropagation, has a few wrinkles in practice. See Michael Nielsen’s chapter and Christopher Olah’s visual summary.
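
For concreteness, here’s a minimal sketch of what backpropagation actually computes for a two-layer perceptron: a forward pass, then the chain rule applied layer by layer, with a finite-difference sanity check at the end. The architecture, loss and sizes are arbitrary choices of mine, not anything canonical.

```python
import numpy as np

def forward_backward(x, y, W1, b1, W2, b2):
    """One forward/backward pass for a tiny MLP: tanh hidden layer,
    linear output, squared-error loss. Returns loss and gradients."""
    # forward pass
    h_pre = W1 @ x + b1                      # hidden pre-activation
    h = np.tanh(h_pre)                       # hidden activation
    y_hat = W2 @ h + b2                      # linear output
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # backward pass: chain rule, layer by layer
    d_yhat = y_hat - y                       # dL/dy_hat
    dW2 = np.outer(d_yhat, h)
    db2 = d_yhat
    d_h = W2.T @ d_yhat                      # back through the linear layer
    d_hpre = d_h * (1.0 - np.tanh(h_pre) ** 2)  # back through the tanh
    dW1 = np.outer(d_hpre, x)
    db1 = d_hpre
    return loss, (dW1, db1, dW2, db2)

# finite-difference check on one weight — the sort of wrinkle you only
# notice when you implement this yourself
rng = np.random.RandomState(0)
x, y = rng.randn(3), rng.randn(2)
W1, b1 = rng.randn(4, 3), rng.randn(4)
W2, b2 = rng.randn(2, 4), rng.randn(2)
loss, grads = forward_backward(x, y, W1, b1, W2, b2)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
loss_p, _ = forward_backward(x, y, W1p, b1, W2, b2)
print(grads[0][0, 0], (loss_p - loss) / eps)  # should agree to ~1e-5
```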

Yes, this is a regular paper mill. Not only are there probably new insights to be had here, but you can also recycle any old machine learning insight, replace a layer in a network with it, and poof: new paper.

Insight into the mind

TBD. Maybe.

There are claims of communication between real neurology and neural networks in computer vision, but elsewhere neural networks are driven by their similarities to other things: being differentiable relaxations of traditional models such as stack machines, or being licence to fit hierarchical models without paying attention to statistical niceties.

There might be some kind of occasional “stylised fact”-type relationship here.

Hip keywords for NN models

Not necessarily mutually exclusive; some design patterns you can use.

See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016, or the Neural Network Zoo, or Simon Brugman’s deep learning papers.

Some of these are descriptions of topologies, others of training tricks or whatever. Recurrent and convolutional are two kinds of topology you might have in your ANN, but there are so many other possible ones: “Grid”, “highway”, “Turing”, and others…

Many are mentioned in passing in David McAllester’s Cognitive Architectures post.

Convolutional

Signal processing baked into neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean two different things in the literature.

Generally uses FIR filters plus some smudgy “pooling” (which is nonlinear downsampling), although IIR filtering is also making an appearance, via RNNs run over multiple axes.
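
To make the FIR-plus-pooling reading concrete, here’s a toy sketch in plain numpy; the filter, the ReLU and the pool width are arbitrary choices of mine, not anything canonical.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation: an FIR filter slid along the input,
    which is what a conv layer does per channel before the nonlinearity."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool(x, width=2):
    """Non-overlapping max pooling: the 'smudgy' nonlinear downsampling."""
    trimmed = x[: (len(x) // width) * width]
    return trimmed.reshape(-1, width).max(axis=1)

signal = np.sin(np.linspace(0, 6 * np.pi, 64)) + 0.1 * np.random.randn(64)
kernel = np.array([1.0, 0.0, -1.0])                # a crude derivative/edge filter
feature_map = np.maximum(conv1d_valid(signal, kernel), 0.0)  # ReLU
pooled = max_pool(feature_map, width=2)
print(pooled.shape)  # (31,): half the 62-sample feature map
```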

Terence Broad, go you.

Recurrent neural networks

Feedback network structures, i.e. networks with memory and a notion of time, of “current” versus “past” state. See recurrent neural networks.

GridRNN etc

A mini-genre. Kalchbrenner et al. (KaDG15) connect recurrent cells across multiple axes, leading to a higher-rank MIMO system; this is natural in many kinds of spatial random field, and I am amazed it was uncommon enough to need formalising in a paper. But it was, and it did, and good on Kalchbrenner et al.

Attention mechanism

What’s that now?

Spike-based

Most simulated neural networks are based on continuous activation potentials and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time. There are a great many other differences too. What difference does this one in particular make? I suspect it makes a difference regarding time. See undifferentiable neural networks.

Kernel networks

Kernel trick + ANN = kernel ANNs. Sounds intuitive; I’m sure there are many hairy details.

Convex neural networks

Something to do with kernel tricks?? Do not confuse with convolutional neural networks.

Bengio, Le Roux, Vincent, Delalleau, and Marcotte (BRVD05).

I'm sure the brain totes does this

Extreme learning machines

Dunno. I think this is a flavour of random neural net?
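
If it’s the Huang et al. flavour (HuZS04, HuZS06), the recipe seems to be: fix a single hidden layer at random, and fit only the linear readout by (ridge-regularised) least squares. A minimal sketch under that assumption, with sizes and the ridge penalty chosen arbitrarily by me:

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, ridge=1e-3, seed=0):
    """ELM-style fit: random, untrained hidden layer, then a
    ridge-regularised least-squares solve for the output weights."""
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], n_hidden)      # fixed random input weights
    b = rng.randn(n_hidden)
    H = np.tanh(X @ W + b)                   # random nonlinear features
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy regression: learn y = sin(x) from noisy samples
X = np.random.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.05 * np.random.randn(500)
W, b, beta = elm_fit(X, y)
print(np.abs(elm_predict(np.array([[1.0]]), W, b, beta) - np.sin(1.0)))
```

The appeal, as far as I can tell, is that the only “training” is a single linear solve.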

Autoencoding

TBD. Making a sparse encoding of something by demanding that your network reproduce the input after passing its activations through a narrow bottleneck. Many flavours. I can’t help but wonder how you avoid pathological local minima for this one, but maybe I should be wondering that about all neural networks.
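
A minimal sketch of the bottleneck idea using the Keras functional API; the layer sizes and the L1 sparsity penalty are arbitrary choices of mine, a toy rather than a recommendation.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

inputs = Input(shape=(784,))
# narrow bottleneck, with an L1 activity penalty to nudge codes towards sparsity
code = Dense(32, activation='relu',
             activity_regularizer=regularizers.l1(1e-5))(inputs)
decoded = Dense(784, activation='sigmoid')(code)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.rand(1000, 784)                # stand-in data; use your own
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)  # target = input
encoder = Model(inputs, code)                # the learned (sparse-ish) encoder
```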

Optimisation methods

Backpropagation plus stochastic gradient descent rules at the moment.

Does anything else get performance at this scale? What other techniques can be extracted from variational inference, MC sampling, or particle filters, since there is no clear reason that shoving any of these in as intermediate layers in the network is any less well-posed than a classical backprop layer? It does require more nous from the enthusiastic grad student, though.
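
For reference, the incumbent being compared against is not much more than this update rule. A minimal sketch of minibatch SGD on a least-squares toy problem, with learning rate and batch size picked arbitrarily by me:

```python
import numpy as np

def sgd(X, y, lr=0.1, batch_size=32, epochs=20, seed=0):
    """Plain minibatch SGD on a least-squares loss: the workhorse update
    w <- w - lr * grad, with the gradient estimated on a random minibatch."""
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of 0.5 * mean sq. error
            w -= lr * grad
    return w

X = np.random.randn(1000, 5)
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * np.random.randn(1000)
print(np.round(sgd(X, y), 2))  # should land close to [1, 2, 3, 4, 5]
```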

Preventing overfitting

See regularising deep learning.

Encoding for neural networks

Neural networks want their data in an inconvenient encoding format, so general data has to be massaged. Convolutional models are an important implicit encoding; what else can we squeeze in there, or out of there?
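
The most common massaging steps are the boring ones. A minimal sketch of one-hot encoding and standardisation, written by hand here purely for illustration:

```python
import numpy as np

def one_hot(labels):
    """Map categorical labels to one-hot rows — typical massaging before
    categorical data can be fed to a network."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)))
    for row, lab in enumerate(labels):
        out[row, index[lab]] = 1.0
    return out

def standardise(X):
    """Zero-mean, unit-variance columns; keeps gradient scales sane."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

print(one_hot(['cat', 'dog', 'cat', 'fish']))
```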

Activations for neural networks

See activation functions.

Software stuff

I use Tensorflow, plus a side order of Keras.

Partial-training

A.k.a. transfer learning: recycling someone else’s features. I don’t know why this has a special term; I think it’s so that you can claim to do “end-to-end” learning, but then actually do what everyone else has done forever and which works totally OK, which is to re-use other people’s work, like real scientists.

Building powerful image classification models using very little data.
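
A minimal sketch of the recycling workflow using the Keras applications API, assuming a pretrained VGG16 base; the choice of base network, head sizes and class count are mine, not prescribed by anything here.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(224, 224, 3))       # someone else's features
for layer in base.layers:
    layer.trainable = False                    # freeze the borrowed part

x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)   # new task head, 10 classes

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(...) on the small new dataset; only the head gets trained
```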

Examples

Pre-computed/trained models

Howtos

To read

Refs

Amar98
Amari, S. (1998) Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276. DOI.
ADGH16
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. arXiv:1606.04474 [Cs].
ArRK10
Arel, I., Rose, D. C., & Karnowski, T. P.(2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18. DOI.
AGMM15
Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. arXiv:1503.00778.
Bach14
Bach, F. (2014) Breaking the Curse of Dimensionality with Convex Neural Networks. arXiv:1412.8690 [Cs, Math, Stat].
Barr93
Barron, A. R.(1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3), 930–945. DOI.
BLPB12
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. arXiv:1211.5590 [Cs].
BaPS16
Baydin, A. G., Pearlmutter, B. A., & Siskind, J. M.(2016) Tricks from Deep Learning. arXiv:1611.03777 [Cs, Stat].
Beng09
Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. DOI.
BeCV13
Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Machine Intell., 35, 1798–1828. DOI.
BeLe07
Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 34, 1–41.
BRVD05
Bengio, Y., Roux, N. L., Vincent, P., Delalleau, O., & Marcotte, P. (2005) Convex neural networks. In Advances in neural information processing systems (pp. 123–130).
Bose91
Boser, B. (1991) An analog neural network processor with programmable topology. J. Solid State Circuits, 26, 2017–2025. DOI.
Cadi14
Cadieu, C. F.(2014) Deep neural networks rival the representation of primate it cortex for core visual object recognition. PLoS Comp. Biol., 10, e1003963. DOI.
ChGS15
Chen, T., Goodfellow, I., & Shlens, J. (2015) Net2Net: Accelerating Learning via Knowledge Transfer. arXiv:1511.05641 [Cs].
CWTW15
Chen, W., Wilson, J. T., Tyree, S., Weinberger, K. Q., & Chen, Y. (2015) Compressing Neural Networks with the Hashing Trick. arXiv:1504.04788 [Cs].
CMBB14
Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv Preprint arXiv:1409.1259.
CHMB15
Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
Ciod12
Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series, 368, 012030. DOI.
Cire12
Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338. DOI.
CBMF16
Cutajar, K., Bonilla, E. V., Michiardi, P., & Filippone, M. (2016) Practical Learning of Deep Gaussian Processes via Random Fourier Features. arXiv:1610.04386 [Stat].
Dahl12
Dahl, G. E.(2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 33–42. DOI.
Dett15
Dettmers, T. (2015) 8-Bit Approximations for Parallelism in Deep Learning. arXiv:1511.04561 [Cs].
DiSc14
Dieleman, S., & Schrauwen, B. (2014) End-to-end learning for music audio. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6964–6968). IEEE DOI.
Fara13
Farabet, C. (2013) Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1915–1929. DOI.
FuMi82
Fukushima, K., & Miyake, S. (1982) Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15(6), 455–469. DOI.
Gal15
Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
Garc04
Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell., 26, 1408–1423. DOI.
GaEB15
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [Cs, Q-Bio].
GiSB16
Giryes, R., Sapiro, G., & Bronstein, A. M.(2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?. IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI.
GiSB14
Giryes, Raja, Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [Cs, Math, Stat].
GlLi16
Globerson, A., & Livni, R. (2016) Learning Infinite-Layer Networks: Beyond the Kernel Trick. arXiv:1606.05316 [Cs].
GPMX14
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014) Generative Adversarial Networks. arXiv:1406.2661 [Cs, Stat].
GoSS14
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014) Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [Cs, Stat].
HaCL06
Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
HaMD15
Han, S., Mao, H., & Dally, W. J.(2015) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [Cs].
HeWH16
He, K., Wang, Y., & Hopcroft, J. (2016) A Powerful Generative Model Using Random Weights for the Deep Image Representation. arXiv:1606.04801 [Cs].
Helm13
Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500, 168–174. DOI.
HDYD12
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
Hint95
Hinton, G. E.(1995) The wake-sleep algorithm for unsupervised neural networks. Science, 268(5214), 1158–1161. DOI.
HiOT06
Hinton, G, Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. DOI.
Hint10
Hinton, Geoffrey. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg
Hint07
Hinton, Geoffrey E. (2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Progress in Brain Research (Vol. 165, pp. 535–547). Elsevier
HiSa06
Hinton, Geoffrey E., & Salakhutdinov, R. R.(2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
HuSi05
Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
HuWL11
Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122. DOI.
HuZS04
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
HuZS06
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
Hube62
Hubel, D. H.(1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol., 160, 106–154. DOI.
IaYA16
Goodfellow, I., Bengio, Y., & Courville, A. (2016) Deep Learning. MIT Press.
JCOV16
Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., & Kavukcuoglu, K. (2016) Decoupled Neural Interfaces using Synthetic Gradients. arXiv:1608.05343 [Cs].
KaSu15
Kaiser, Ł., & Sutskever, I. (2015) Neural GPUs Learn Algorithms. arXiv:1511.08228 [Cs].
KaDG15
Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. arXiv:1507.01526 [Cs].
KaRL10
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv:1010.3467 [Cs].
KSJC16
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. arXiv:1606.04934 [Cs, Stat].
KWKT15
Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B.(2015) Deep Convolutional Inverse Graphics Network. arXiv:1503.03167 [Cs].
LSLW15
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. arXiv:1512.09300 [Cs, Stat].
Lawr97
Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 8, 98–113. DOI.
Lecu98
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. DOI.
LeBH15
LeCun, Yann, Bengio, Y., & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444. DOI.
LCHR06
LeCun, Yann, Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. Predicting Structured Data.
LGRN00
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y.(2009) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th International Conference on Machine Learning.
Leun14
Leung, M. K.(2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129. DOI.
LCMB15
Lin, Z., Courbariaux, M., Memisevic, R., & Bengio, Y. (2015) Neural Networks with Few Multiplications. arXiv:1510.03009 [Cs].
Lipt16
Lipton, Z. C.(2016) The Mythos of Model Interpretability. In arXiv:1606.03490 [cs, stat].
Ma15
Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model., 55, 263–274. DOI.
MaDA15
Maclaurin, D., Duvenaud, D. K., & Adams, R. P.(2015) Gradient-based Hyperparameter Optimization through Reversible Learning. In ICML (pp. 2113–2122).
Mall12
Mallat, S. (2012) Group Invariant Scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. DOI.
Mall16
Mallat, S. (2016) Understanding Deep Convolutional Networks. arXiv:1601.04920 [Cs, Stat].
MAPE15
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Zheng, X. (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
MCCD13
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [Cs].
MiLS13
Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [Cs].
Mnih15
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
MoDH12
Mohamed, A. r, Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22. DOI.
MoRe12
Monner, D., & Reggia, J. A.(2012) A generalized LSTM-like training algorithm for second-order recurrent neural networks. Neural Networks, 25, 70–83. DOI.
Mont14
Montufar, G. (2014) When does a mixture of products contain a product of mixtures?. J. Discrete Math., 29, 321–347. DOI.
MoBa17
Mousavi, A., & Baraniuk, R. G.(2017) Learning to Invert: Signal Recovery via Deep Convolutional Networks. In ICASSP.
Ning05
Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process., 14, 1360–1371. DOI.
Nøkl16
Nøkland, A. (2016) Direct Feedback Alignment Provides Learning in Deep Neural Networks. arXiv:1609.01596 [Cs, Stat].
OlFi96
Olshausen, B. A., & Field, D. J.(1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
OlFi04
Olshausen, B. A., & Field, D. J.(2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
OIMT15
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T.(2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
PaDG16
Pan, W., Dong, H., & Guo, Y. (2016) DropNeuron: Simplifying the Structure of Deep Neural Networks. arXiv:1606.07326 [Cs, Stat].
PaVe14
Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. arXiv:1412.6621 [Cs, Stat].
Pink99
Pinkus, A. (1999) Approximation theory of the MLP model in neural networks. Acta Numerica, 8, 143–195. DOI.
RaMC15
Radford, A., Metz, L., & Chintala, S. (2015) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434 [Cs].
RaBC08
Ranzato, M., Boureau, Y.-L., & LeCun, Y. (2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
Ranz13
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35(9), 2206–2222. DOI.
RBKC14
Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2014) FitNets: Hints for Thin Deep Nets. arXiv:1412.6550 [Cs].
Rume86
Rumelhart, D. E.(1986) Learning representations by back-propagating errors. Nature, 323, 533–536. DOI.
SGAL14
Sagun, L., Guney, V. U., Arous, G. B., & LeCun, Y. (2014) Explorations on high dimensional landscapes. arXiv:1412.6615 [Cs, Stat].
SaKi16
Salimans, T., & Kingma, D. P.(2016) Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 901–901). Curran Associates, Inc.
SCHU16
Scardapane, S., Comminiello, D., Hussain, A., & Uncini, A. (2016) Group Sparse Regularization for Deep Neural Networks. arXiv:1607.00485 [Cs, Stat].
SMMD17
Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017) Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538 [Cs, Stat].
SDBR14
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. In Proceedings of International Conference on Learning Representations (ICLR) 2015.
StGa15
Steeg, G. V., & Galstyan, A. (2015) The Information Sieve. arXiv:1507.02284 [Cs, Math, Stat].
Telg15
Telgarsky, M. (2015) Representation Benefits of Deep Feedforward Networks. arXiv:1509.08101 [Cs].
Tura10
Turaga, S. C.(2010) Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput., 22, 511–538. DOI.
UGKA16
Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R., … Richardson, M. (2016) Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)?. arXiv:1603.05691 [Cs, Stat].
Oord16
van den Oord, A. (2016) Wavenet: A Generative Model for Raw Audio.
OoKK16
van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016) Pixel Recurrent Neural Networks. arXiv:1601.06759 [Cs].
OKVE16
van den Oord, A., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016) Conditional Image Generation with PixelCNN Decoders. arXiv:1606.05328 [Cs].
Waib89
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37(3), 328–339. DOI.
WiBö15
Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv:1512.06293 [Cs, Math, Stat].
XiLS16
Xie, B., Liang, Y., & Song, L. (2016) Diversity Leads to Generalization in Neural Networks. arXiv:1611.03131 [Cs, Stat].
YuDe11
Yu, D., & Deng, L. (2011) Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP]. IEEE Signal Processing Magazine, 28(1), 145–154. DOI.
ZBHR17
Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017) Understanding deep learning requires rethinking generalization. In Proceedings of ICLR.
ZhCL15
Zhang, S., Choromanska, A., & LeCun, Y. (2015) Deep learning with Elastic Averaging SGD. In Advances In Neural Information Processing Systems.