Generalized Hebbian algorithm

The generalized Hebbian algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network model for unsupervised learning, with applications primarily in principal components analysis. First defined in 1989,[1] it is similar to Oja's rule in its formulation and stability, except that it can be applied to networks with multiple outputs. The name reflects the similarity between the algorithm and a hypothesis made by Donald Hebb[2] about the way synaptic strengths in the brain are modified in response to experience, namely that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.[3]

Theory

The GHA combines Oja's rule with the Gram-Schmidt process to produce a learning rule of the form[4]

$$\Delta w_{ij} = \eta \left(y_{i}x_{j} - y_{i}\sum_{k=1}^{i} w_{kj}y_{k}\right),$$

where $w_{ij}$ defines the synaptic weight or connection strength between the $j$th input and $i$th output neurons, $x$ and $y$ are the input and output vectors, respectively, and $\eta$ is the learning rate parameter.
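The component form of the rule can be sketched directly as nested loops. The following is a minimal illustration in plain Python, not a reference implementation: the function name `gha_step` and the sample values are hypothetical, and the outputs are assumed to be linear, $y_i = \sum_j w_{ij} x_j$, as stated in the derivation.

```python
def gha_step(w, x, eta):
    """Return updated weights after one GHA step (component form).

    w   : list of rows, w[i][j] = weight from input j to output i
    x   : list of input values
    eta : learning rate
    """
    n_out, n_in = len(w), len(x)
    # Assumed linear outputs y_i = sum_j w_ij x_j, from the current weights.
    y = [sum(w[i][j] * x[j] for j in range(n_in)) for i in range(n_out)]
    new_w = [row[:] for row in w]
    for i in range(n_out):
        for j in range(n_in):
            # Sum over outputs k = 1..i (0-based: 0..i inclusive).
            recon = sum(w[k][j] * y[k] for k in range(i + 1))
            new_w[i][j] = w[i][j] + eta * (y[i] * x[j] - y[i] * recon)
    return new_w


# Illustrative values only: two inputs, two outputs, identity initial weights.
w = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
w_next = gha_step(w, x, eta=0.1)
```

Because the inner sum stops at $k = i$, the first output neuron follows Oja's rule unchanged, while each later neuron also subtracts the contribution of the neurons before it.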

Derivation

In matrix form, Oja's rule can be written

$$\frac{dw(t)}{dt} = w(t)Q - \mathrm{diag}[w(t)Qw(t)^{\mathrm{T}}]\,w(t),$$

and the Gram-Schmidt algorithm is

$$\Delta w(t) = -\mathrm{lower}[w(t)w(t)^{\mathrm{T}}]\,w(t),$$

where $w(t)$ is any matrix, in this case representing synaptic weights, $Q = \eta\,\mathbf{x}\mathbf{x}^{\mathrm{T}}$ is the autocorrelation matrix (here simply the outer product of the inputs, scaled by $\eta$), $\mathrm{diag}$ is the function that keeps the diagonal of a matrix and sets all off-diagonal elements to 0, and $\mathrm{lower}$ is the function that sets all matrix elements on or above the diagonal equal to 0. Combining these equations gives the original rule in matrix form,

$$\Delta w(t) = \eta(t)\left(\mathbf{y}(t)\mathbf{x}(t)^{\mathrm{T}} - \mathrm{LT}[\mathbf{y}(t)\mathbf{y}(t)^{\mathrm{T}}]\,w(t)\right),$$

where the function $\mathrm{LT}$ sets all matrix elements above the diagonal equal to 0, and the output $\mathbf{y}(t) = w(t)\,\mathbf{x}(t)$ is that of a linear neuron.[1]
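The matrix form maps closely onto array operations. The sketch below assumes NumPy and uses `np.tril` in the role of $\mathrm{LT}$, since it keeps the diagonal and everything below it. The function name `gha_update` and the sample values are hypothetical, chosen only to illustrate a single update step.

```python
import numpy as np


def gha_update(w, x, eta):
    """One GHA step in matrix form; returns a new weight matrix.

    w   : (m, n) array, one row of weights per output neuron
    x   : (n,) input vector
    eta : learning rate
    """
    y = w @ x  # linear outputs, y = w x
    # np.tril zeroes the elements above the diagonal, i.e. LT[y y^T].
    lt = np.tril(np.outer(y, y))
    return w + eta * (np.outer(y, x) - lt @ w)


# Illustrative values only: two inputs, two outputs, identity initial weights.
w = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([1.0, 2.0])
w_next = gha_update(w, x, eta=0.1)
```

Row $i$ of `lt @ w` equals $y_i \sum_{k \le i} y_k w_{kj}$, so this step agrees term by term with the component form of the rule.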

Stability and PCA

[5] [6]

Applications

The GHA is used in applications where a self-organizing map is necessary, or where feature analysis or principal components analysis can be used. Examples include artificial intelligence and speech and image processing.

Its importance comes from the fact that learning is a single-layer process: a synaptic weight changes only in response to the inputs and outputs of its own layer, which avoids the multi-layer dependence associated with the backpropagation algorithm. It also has a simple and predictable trade-off between learning speed and accuracy of convergence, set by the learning rate parameter $\eta$.[5]
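The role of the learning rate can be explored with a small experiment. The following is only a sketch under stated assumptions: it uses NumPy, synthetic zero-mean Gaussian data whose variances (3, 1 and 0.2 along the coordinate axes) are invented for illustration, and a hypothetical helper `run_gha`. Each update uses only the layer's own input and output, and the script prints how well the first weight row aligns with the leading principal direction for two values of $\eta$.

```python
import numpy as np


def run_gha(samples, n_out, eta, seed=0):
    """Train GHA on the rows of `samples` and return the weight matrix."""
    rng = np.random.default_rng(seed)
    # Small random initial weights (an arbitrary choice for this sketch).
    w = 0.1 * rng.standard_normal((n_out, samples.shape[1]))
    for x in samples:
        y = w @ x  # the update needs only this layer's input and output
        w = w + eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ w)
    return w


# Hypothetical data: variances 3, 1 and 0.2 along the coordinate axes,
# so the leading principal direction is the first axis.
rng = np.random.default_rng(1)
samples = rng.standard_normal((5000, 3)) * np.sqrt([3.0, 1.0, 0.2])

for eta in (0.001, 0.01):
    w = run_gha(samples, n_out=2, eta=eta)
    # Absolute cosine between the first weight row and the first axis.
    alignment = abs(w[0, 0]) / np.linalg.norm(w[0])
    print(f"eta={eta}: alignment of first row = {alignment:.3f}")
```

The printed values depend on the data and the seeds, so the script is meant for comparing settings rather than for reproducing a particular number.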

References

[1] Sanger, Terence D. (1989). "Optimal unsupervised learning in a single-layer linear feedforward neural network". Neural Networks. 2 (6): 459–473. CiteSeerX 10.1.1.128.6893. doi:10.1016/0893-6080(89)90044-0. Retrieved 2007-11-24.
[2] Hebb, D.O. (1949). The Organization of Behavior. New York: Wiley & Sons. ISBN 9781135631918.
[3] Hertz, John; Anders Krogh; Richard G. Palmer (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley Publishing Company. ISBN 978-0201515602.
[4] Gorrell, Genevieve (2006). "Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing". EACL. CiteSeerX 10.1.1.102.2084.
[5] Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation (2 ed.). Prentice Hall. ISBN 978-0-13-273350-2.
[6] Oja, Erkki (November 1982). "Simplified neuron model as a principal component analyzer". Journal of Mathematical Biology. 15 (3): 267–273. doi:10.1007/BF00275687. PMID 7153672.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.