Echo state network

The echo state network (ESN)[1][2] is a recurrent neural network with a sparsely connected hidden layer (typically with 1% connectivity). The connectivity and weights of the hidden neurons are fixed and randomly assigned. The weights of the output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that, although its behaviour is non-linear, the only weights modified during training are those of the synapses that connect the hidden neurons to the output neurons. Thus, the error function is quadratic with respect to the parameter vector, and setting its derivative to zero yields a linear system.
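As a short worked sketch of why the quadratic error leads to a linear system, assume (notation introduced here for illustration, not taken from the references) that the hidden-layer states x(t) are collected as the columns of a matrix X, the corresponding target outputs y(t) as the columns of Y, and the trainable output weights are W_out:

```latex
\begin{align}
E(W_{\mathrm{out}}) &= \sum_{t=1}^{T} \lVert y(t) - W_{\mathrm{out}}\, x(t) \rVert^{2}
                     = \lVert Y - W_{\mathrm{out}} X \rVert_{F}^{2} \\
\frac{\partial E}{\partial W_{\mathrm{out}}} &= -2\,\bigl(Y - W_{\mathrm{out}} X\bigr)\, X^{\top} = 0
\quad\Longrightarrow\quad
W_{\mathrm{out}}\, X X^{\top} = Y X^{\top}
\end{align}
```

Because X depends only on the fixed random weights and the input, the last equation is linear in W_out and can be solved with standard linear algebra routines.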

Alternatively, one may consider a nonparametric Bayesian formulation of the output layer, under which: (i) a prior distribution is imposed over the output weights; and (ii) the output weights are marginalized out in the context of prediction generation, given the training data. This idea has been demonstrated in [3] by using Gaussian priors, whereby a Gaussian process model with ESN-driven kernel function is obtained. Such a solution was shown to outperform ESNs with trainable (finite) sets of weights in several benchmarks.

Some publicly available implementations of ESNs are: (i) aureservoir: an efficient C++ library for various kinds of echo state networks with Python/NumPy bindings; and (ii) Matlab code: an efficient MATLAB implementation of an echo state network.

The echo state network (ESN)[4] belongs to the recurrent neural network (RNN) family and provides an architecture and supervised learning principle for such networks. Unlike feedforward neural networks, recurrent neural networks are dynamical systems rather than functions. Recurrent neural networks are typically used for:

  • Learning dynamical processes: signal processing in engineering and telecommunications, vibration analysis, seismology, control of engines and generators.
  • Signal forecasting and generation: text, music, electric signals.
  • Modeling of biological systems, neuroscience (cognitive neurodynamics), memory modeling, brain-computer interfaces (BCIs), filtering and Kalman processes, military applications, volatility modeling, etc.

A number of learning algorithms are available for training RNNs, including backpropagation through time and real-time recurrent learning. Convergence is not guaranteed due to instability and bifurcation phenomena[4].

The main approach of the ESN is, first, to drive a large, random, fixed recurrent neural network with the input signal, which induces a nonlinear response signal in each neuron within this "reservoir" network, and second, to obtain the desired output signal as a trainable linear combination of all these response signals[2].
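The two steps can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited implementations: the function names, the tanh activation, the uniform weight ranges, the 10% connectivity, the rescaling of the reservoir to a spectral radius of 0.9 and the small ridge term are all assumptions chosen for illustration.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, density=0.1, spectral_radius=0.9, seed=0):
    """Create fixed random input and reservoir weights (never trained)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Keep only a random subset of connections (sparse reservoir).
    W *= rng.random((n_reservoir, n_reservoir)) < density
    # Assumption: rescale to a chosen spectral radius, a common heuristic.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with inputs of shape (T, n_inputs); return states (T, N)."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ u + W @ x)  # nonlinear response of every neuron
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout by (ridge-regularised) least squares."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets).T  # shape (n_outputs, N)
```

A typical use would be to call `run_reservoir` on a training signal, discard an initial washout period of states, pass the remaining states and targets to `train_readout`, and compute predictions as `states @ W_out.T`.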

Another feature of the ESN is the autonomous operation in prediction: if the Echo State Network is trained with an input that is a backshifted version of the output, then it can be used for signal generation/prediction by using the previous output as input[4].
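The autonomous mode can be sketched as follows, again under illustrative assumptions (tanh units, a 10% connected reservoir rescaled to spectral radius 0.9, a ridge-regularised readout, and the hypothetical function names `fit_generator` and `generate`). The network is trained to map each sample to the next one and is then run in a closed loop on its own predictions.

```python
import numpy as np

def step(W_in, W, x, u):
    """One reservoir update driven by input u."""
    return np.tanh(W_in @ u + W @ x)

def fit_generator(signal, n_reservoir=200, washout=100, ridge=1e-6, seed=0):
    """Train on pairs (y[t-1] -> y[t]); signal has shape (T, n_outputs)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_reservoir, signal.shape[1]))
    W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
    W *= rng.random((n_reservoir, n_reservoir)) < 0.1       # sparse reservoir
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))         # assumed scaling
    x = np.zeros(n_reservoir)
    states = []
    for u in signal[:-1]:            # input is the output shifted back by one
        x = step(W_in, W, x, u)
        states.append(x)
    S = np.array(states)[washout:]   # drop the initial transient
    Y = signal[1:][washout:]
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_reservoir), S.T @ Y).T
    return W_in, W, W_out, x         # x is the state after input signal[-2]

def generate(W_in, W, W_out, x, y_last, n_steps):
    """Free-run the network, feeding each output back as the next input."""
    y, outputs = y_last, []
    for _ in range(n_steps):
        x = step(W_in, W, x, y)
        y = W_out @ x
        outputs.append(y)
    return np.array(outputs)
```

For example, `generate(W_in, W, W_out, x, signal[-1], 50)` continues the training signal for 50 steps. How long such a free-running prediction stays accurate depends on the signal and on the reservoir parameters.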

The main idea of ESNs is tied to liquid state machines (LSMs), which were developed independently of and simultaneously with ESNs by Wolfgang Maass[5]. LSMs, ESNs and the more recently studied backpropagation decorrelation learning rule for RNNs[6] are increasingly subsumed under the name reservoir computing.

Schiller and Steil[6] also demonstrated that in conventional training approaches for RNNs, in which all weights (not only output weights) are adapted, the dominant changes are in the output weights. In cognitive neuroscience, Peter F. Dominey analysed a related process in the modelling of sequence processing in the mammalian brain, in particular speech recognition in the human brain[7]. The basic idea also included a model of temporal input discrimination in biological neuronal networks[8]. An early clear formulation of the reservoir computing idea is due to K. Kirby, who disclosed this concept in a largely forgotten conference contribution[9]. The first formulation of the reservoir computing idea known today stems from L. Schomaker[10], who described how a desired target output can be obtained from an RNN by learning to combine signals from a randomly configured ensemble of spiking neural oscillators.[2]

Variants

Echo state networks can be built in different ways. They can be set up with or without directly trainable input-to-output connections, with or without output-to-reservoir feedback, with different neuron types, different reservoir-internal connectivity patterns, etc. The output weights can be computed with any linear regression algorithm, whether online or offline. In addition to least-squares error solutions, margin maximization criteria, as used in training support vector machines, are used to determine the output weights.[11]
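As one example of an online alternative to batch least squares, the sketch below updates the output weights sample by sample with recursive least squares (RLS). RLS is chosen here purely for illustration and is not named in the text above; the function name and the parameters `forgetting` and `delta` are likewise assumptions.

```python
import numpy as np

def rls_readout(states, targets, forgetting=1.0, delta=1.0):
    """Online estimate of output weights, one (state, target) pair at a time.

    states:  array of shape (T, N), reservoir states
    targets: array of shape (T, n_outputs), desired outputs
    """
    n = states.shape[1]
    W_out = np.zeros((targets.shape[1], n))
    P = np.eye(n) / delta                 # running inverse-correlation estimate
    for x, y in zip(states, targets):
        Px = P @ x
        k = Px / (forgetting + x @ Px)    # gain vector
        err = y - W_out @ x               # error before the update
        W_out += np.outer(err, k)
        P = (P - np.outer(k, Px)) / forgetting
    return W_out
```

With `forgetting=1.0` and zero initial weights, the result coincides (up to rounding) with the offline ridge regression solution whose regularisation strength equals `delta`. Values of `forgetting` slightly below 1 let the readout track slowly changing targets.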

The fixed RNN acts as a random, nonlinear medium whose dynamic response, the "echo", is used as a signal basis. The linear combination of this basis can be trained to reconstruct the desired output by minimizing some error criterion.[2]

Significance

RNNs were rarely used in practice before the introduction of the ESN, because fitting these models requires a version of gradient descent to adjust the connections. As a result, the algorithms are slow and, much worse, the learning process is vulnerable to bifurcation errors[12]. Convergence therefore cannot be guaranteed. ESN training does not suffer from the bifurcation problem and is additionally easy to implement. ESNs have been reported to outperform other nonlinear dynamical models.[1][13] Today, however, the problem that made RNNs slow and error-prone has been solved with the advent of deep learning, and the unique selling point of ESNs has been lost. In addition, RNNs have proven themselves in several practical areas such as language processing. Coping with tasks of similar complexity using reservoir computing methods would require memory of excessive size. ESNs are nevertheless used in some areas, such as many signal processing applications. They have also been widely used as a computing principle that mixes well with non-digital computer substrates, for example optical microchips, mechanical nano-oscillators, polymer mixtures or even artificial soft limbs.[2]

See also

  • Liquid-state machine: a similar concept with generalized signal and network.
  • Reservoir computing

References

  1. Herbert Jaeger and Harald Haas. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 2 April 2004: Vol. 304, no. 5667, pp. 78–80. doi:10.1126/science.1091277
  2. Herbert Jaeger (2007) Echo State Network. Scholarpedia.
  3. Sotirios P. Chatzis, Yiannis Demiris, “Echo State Gaussian Process,” IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1435-1445, Sep. 2011.
  4. Jaeger, Herbert (2002). A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. Germany: German National Research Center for Information Technology. pp. 1–45.
  5. Maass W., Natschlaeger T., and Markram H. (2002). "Real-time computing without stable states: A new framework for neural computation based on perturbations". Neural Computation. 14 (11): 2531–2560. doi:10.1162/089976602760407955. PMID 12433288.
  6. Schiller U.D. and Steil J. J. (2005). "Analyzing the weight dynamics of recurrent learning algorithms". Neurocomputing. 63: 5–23. doi:10.1016/j.neucom.2004.04.006.
  7. Dominey P.F. (1995). "Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning". Biol. Cybernetics. 73 (3): 265–274. doi:10.1007/BF00201428.
  8. Buonomano, D.V. and Merzenich, M.M. (1995). "Temporal Information Transformed into a Spatial Code by a Neural Network with Realistic Properties". Science. 267 (5200): 1028–1030. Bibcode:1995Sci...267.1028B. doi:10.1126/science.7863330. PMID 7863330. S2CID 12880807.
  9. Kirby, K. (1991). "Context dynamics in neural sequential learning". Proc. Florida AI Research Symposium: 66–70.
  10. Schomaker, L. (1992). "A neural oscillator-network model of temporal pattern generation". Human Movement Science. 11 (1–2): 181–192. doi:10.1016/0167-9457(92)90059-K.
  11. Schmidhuber J., Gomez F., Wierstra D., and Gagliolo M. (2007). "Training recurrent networks by evolino". Neural Computation. 19 (3): 757–779. doi:10.1162/neco.2007.19.3.757. PMID 17298232.
  12. Doya K. (1992). "Bifurcations in the learning of recurrent neural networks". In Proceedings of 1992 IEEE Int. Symp. On Circuits and Systems. 6: 2777–2780. doi:10.1109/ISCAS.1992.230622. ISBN 0-7803-0593-0.
  13. Jaeger H. (2007). "Discovering multiscale dynamical features with hierarchical echo state networks". Technical Report 10, School of Engineering and Science, Jacobs University.


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.