The Living Thing / Notebooks : Random neural networks

Recurrent: Liquid State Machines / Echo State Networks / random reservoir networks

This sounds deliciously lazy. Very roughly speaking, your first layer is a reservoir of random saturating IIR filters. You fit a classifier on the outputs of this reservoir, possibly even allowing the network to converge to a steady state in some sense, so that the oscillations of the reservoir are not coupled to time.

Easy to implement, that. I wonder when it actually works, what constraints on topology are needed, etc.
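Something like the following, perhaps. This is a minimal numpy sketch under the usual echo-state assumptions (tanh reservoir, recurrence operator rescaled to spectral radius below 1, ridge-regression readout); the reservoir size, spectral radius and ridge penalty are arbitrary placeholders, not recommendations.

    import numpy as np

    def esn_features(u, n_reservoir=200, spectral_radius=0.9, seed=0):
        """Run a 1-d input sequence u through a fixed random tanh reservoir."""
        rng = np.random.default_rng(seed)
        W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, 1))
        W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        # Rescale the recurrence operator so its spectral radius is < 1,
        # which (roughly) gives fading memory of the input history.
        # (A sparse W is common in practice, which also speeds this up.)
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        x = np.zeros(n_reservoir)
        states = np.empty((len(u), n_reservoir))
        for t, u_t in enumerate(u):
            x = np.tanh(W @ x + W_in @ np.array([u_t]))
            states[t] = x
        return states

    def train_readout(states, targets, ridge=1e-6):
        """Only this readout is trained, by ridge regression."""
        S = np.hstack([states, np.ones((len(states), 1))])  # bias column
        return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ targets)

The point being that W and W_in are generated once and never touched again; only the linear readout ever sees the targets.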

I wonder if you can use some kind of sparsifying transform on the recurrence operator?

These claim to be based on spiking neuron models, but AFAICT the spiking is not at all necessary.

Various claims are made about how they avoid the training difficulties of comparably simple RNNs by being essentially untrained; you use the reservoir as a feature factory for another supervised output algorithm.
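The non-recurrent version of this recipe is the extreme learning machine of HuZS04/HuZS06: one random, untrained hidden layer acts as the feature factory, and only an ordinary (ridge-regularised) least-squares readout is fitted. A sketch, with the width and activation chosen arbitrarily:

    import numpy as np

    def elm_fit(X, y, n_hidden=500, seed=0, ridge=1e-6):
        """Random hidden layer, trained linear readout (ELM-style)."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], n_hidden))  # never trained
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)                       # the random feature factory
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta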

Suggestive parallel with random projections. Not strictly recurrent, but same general idea: HeWH16.
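For comparison, the bare random-projection version of that intuition: a random Gaussian matrix approximately preserves pairwise geometry (Johnson-Lindenstrauss), so whatever you fit downstream still has something to work with. The dimensions below are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1000))             # 100 points in 1000 dimensions
    k = 50
    R = rng.normal(size=(1000, k)) / np.sqrt(k)  # random Gaussian projection
    Y = X @ R

    # Pairwise distances are approximately preserved.
    i, j = 3, 7
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))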

LuJa09 has an interesting taxonomy (although even if the exact details were right at the time, the evolution of neural network research makes them questionable now):

From a dynamical systems perspective, there are two main classes of RNNs. Models from the first class are characterized by an energy-minimizing stochastic dynamics and symmetric connections. The best known instantiations are Hopfield networks, Boltzmann machines, and the recently emerging Deep Belief Networks. These networks are mostly trained in some unsupervised learning scheme. Typical targeted network functionalities in this field are associative memories, data compression, the unsupervised modeling of data distributions, and static pattern classification, where the model is run for multiple time steps per single input instance to reach some type of convergence or equilibrium (but see e.g., TaHR06 for extension to temporal data). The mathematical background is rooted in statistical physics. In contrast, the second big class of RNN models typically features a deterministic update dynamics and directed connections. Systems from this class implement nonlinear filters, which transform an input time series into an output time series. The mathematical background here is nonlinear dynamical systems. The standard training mode is supervised.
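To make the first class concrete, here is a minimal Hopfield-style associative memory with Hebbian storage, run for several update steps per input until it (hopefully) settles; the pattern count and dimension below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    patterns = rng.choice([-1, 1], size=(3, 64))   # stored memories
    W = patterns.T @ patterns / patterns.shape[1]  # Hebbian, symmetric weights
    np.fill_diagonal(W, 0)

    def recall(x, n_steps=20):
        """Iterate the energy-minimising dynamics towards a fixed point."""
        for _ in range(n_steps):
            x = np.sign(W @ x)
            x[x == 0] = 1
        return x

    noisy = patterns[0].copy()
    noisy[:10] *= -1                               # corrupt a stored pattern
    print(np.array_equal(recall(noisy), patterns[0]))  # usually True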

Random convolutions

TBD.

(Particularly) Random Training

Refs

AuBM08
Auer, P., Burgsteiner, H., & Maass, W. (2008) A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. DOI.
BaSL16
Baldi, P., Sadowski, P., & Lu, Z. (2016) Learning in the Machine: Random Backpropagation and the Learning Channel. arXiv:1612.02734 [Cs].
CWZW16
Cao, F., Wang, D., Zhu, H., & Wang, Y. (2016) An iterative learning algorithm for feedforward neural networks with random weights. Information Sciences, 328, 546–557. DOI.
ChYR16
Charles, A., Yin, D., & Rozell, C. (2016) Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks. arXiv:1605.08346 [Cs, Math, Stat].
Cove65
Cover, T. M. (1965) Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, EC-14(3), 326–334. DOI.
GeLR16
Gel, Y. R., Lyubchich, V., & Ramirez, L. L. (2016) Fast Patchwork Bootstrap for Quantifying Estimation Uncertainties in Sparse Random Networks.
GiSB16
Giryes, R., Sapiro, G., & Bronstein, A. M. (2016) Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?. IEEE Transactions on Signal Processing, 64(13), 3444–3457. DOI.
GlLi16
Globerson, A., & Livni, R. (2016) Learning Infinite-Layer Networks: Beyond the Kernel Trick. arXiv:1606.05316 [Cs].
GBLT14
Goudarzi, A., Banda, P., Lakin, M. R., Teuscher, C., & Stefanovic, D. (2014) A Comparative Study of Reservoir Computing for Temporal Signal Processing. arXiv:1401.2224 [Cs].
GoTe16
Goudarzi, A., & Teuscher, C. (2016) Reservoir Computing: Quo Vadis?. In Proceedings of the 3rd ACM International Conference on Nanoscale Computing and Communication (p. 13:1–13:6). New York, NY, USA: ACM DOI.
GCWK09
Grzyb, B. J., Chinellato, E., Wojcik, G. M., & Kaminski, W. A. (2009) Which model to use for the Liquid State Machine?. In 2009 International Joint Conference on Neural Networks (pp. 1018–1024). DOI.
HaMa12
Hazan, H., & Manevitz, L. M. (2012) Topological constraints and robustness in liquid state machines. Expert Systems with Applications, 39(2), 1597–1606. DOI.
HeWH16
He, K., Wang, Y., & Hopcroft, J. (2016) A Powerful Generative Model Using Random Weights for the Deep Image Representation. arXiv:1606.04801 [Cs].
HuSi05
Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
HuZS04
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
HuZS06
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
KrHo16
Krotov, D., & Hopfield, J. J. (2016) Dense Associative Memory for Pattern Recognition. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1172–1180). Curran Associates, Inc.
LiWa17
Li, M., & Wang, D. (2017) Insights into randomized algorithms for neural networks: Practical issues and common pitfalls. Information Sciences, 382–383, 170–178. DOI.
LuJa09
Lukoševičius, M., & Jaeger, H. (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. DOI.
MaNM04
Maass, W., Natschläger, T., & Markram, H. (2004) Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience: A Comprehensive Approach (pp. 575–605). Chapman & Hall/CRC.
Mart16
Martinsson, P.-G. (2016) Randomized methods for matrix computations and analysis of high dimensional data. arXiv:1607.01649 [Math].
OyBZ17
Oyallon, E., Belilovsky, E., & Zagoruyko, S. (2017) Scaling the Scattering Transform: Deep Hybrid Networks. arXiv Preprint arXiv:1703.08961.
Pere16
Perez, C. E. (2016, November 6) Deep Learning: The Unreasonable Effectiveness of Randomness. Medium.
RaRe09
Rahimi, A., & Recht, B. (2009) Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems (pp. 1313–1320). Curran Associates, Inc.
ScWa17
Scardapane, S., & Wang, D. (2017) Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(2). DOI.
Stei04
Steil, J. J. (2004) Backpropagation-decorrelation: online recurrent learning with O(N) complexity. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 843–848 vol.2). DOI.
TaHR06
Taylor, G. W., Hinton, G. E., & Roweis, S. T. (2006) Modeling human motion using binary latent variables. In Advances in neural information processing systems (pp. 1345–1352).
TBCC07
Tong, M. H., Bickett, A. D., Christiansen, E. M., & Cottrell, G. W. (2007) Learning grammatical structure with Echo State Networks. Neural Networks, 20(3), 424–432. DOI.
TJDM13
Triefenbach, F., Jalalvand, A., Demuynck, K., & Martens, J. P. (2013) Acoustic Modeling With Hierarchical Reservoirs. IEEE Transactions on Audio, Speech, and Language Processing, 21(11), 2439–2450. DOI.
ZhSu16
Zhang, L., & Suganthan, P. N. (2016) A survey of randomized algorithms for training neural networks. Information Sciences, 364–365(C), 146–155. DOI.