Highway network

In machine learning, a highway network is an approach to optimizing networks and increasing their depth. Highway networks use learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The gating mechanisms allow neural networks to have paths for information to follow across different layers ("information highways").[1][2]

Highway networks have been used as part of text sequence labeling and speech recognition tasks.[3][4]


Model

The model has two gates in addition to the H(WH, x) gate: the transform gate T(WT, x) and the carry gate C(WC, x). Those two last gates are non-linear transfer functions (by convention Sigmoid function). The H(WH, x) function can be any desired transfer function.

The carry gate is defined as C(WC, x) = 1 - T(WT, x). While the transform gate is just a gate with a sigmoid transfer function.


Structure

The structure of a hidden layer follows the equation:


The advantage of a Highway Network over the common deep neural networks is that solves or prevents partially the Vanishing gradient problem, thus leading to easier to optimize neural networks.


gollark: no.
gollark: This is NOT type theoretic.
gollark: That is ALSO not type theory.
gollark: Yes.
gollark: CEASE your untyped behavior.

References

  1. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  2. Srivastava, Rupesh K; Greff, Klaus; Schmidhuber, Juergen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems 28. Curran Associates, Inc.: 2377–2385.
  3. Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
  4. Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.