Radial basis function kernel

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.[1]

The RBF kernel on two samples $\mathbf {x}$ and $\mathbf {x'}$, represented as feature vectors in some input space, is defined as[2]

K(\mathbf {x} ,\mathbf {x'} )=\exp \left(-{\frac {\|\mathbf {x} -\mathbf {x'} \|^{2}}{2\sigma ^{2}}}\right)

$\textstyle \|\mathbf {x} -\mathbf {x'} \|^{2}$ may be recognized as the squared Euclidean distance between the two feature vectors. $\sigma$ is a free parameter. An equivalent definition involves a parameter $\textstyle \gamma ={\tfrac {1}{2\sigma ^{2}}}$:

K(\mathbf {x} ,\mathbf {x'} )=\exp(-\gamma \|\mathbf {x} -\mathbf {x'} \|^{2})
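As a minimal sketch of the two parameterizations above, the following function evaluates the kernel for a pair of vectors given as plain Python sequences. The names `rbf_kernel`, `sigma`, and `gamma` are illustrative choices for this example and are not taken from any particular library.

```python
import math


def rbf_kernel(x, y, sigma=None, gamma=None):
    """Evaluate exp(-gamma * ||x - y||^2) for two equal-length sequences.

    Exactly one of `sigma` or `gamma` must be given; the two are related
    by gamma = 1 / (2 * sigma**2).
    """
    if (sigma is None) == (gamma is None):
        raise ValueError("pass exactly one of sigma or gamma")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if gamma is None:
        gamma = 1.0 / (2.0 * sigma ** 2)
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    x, y = [1.0, 2.0], [2.0, 4.0]
    print(rbf_kernel(x, y, sigma=1.0))  # sigma = 1 corresponds to gamma = 0.5
    print(rbf_kernel(x, y, gamma=0.5))
    print(rbf_kernel(x, x, gamma=0.5))  # identical inputs give 1.0
```

With $\sigma = 1$ the squared distance between the two example vectors is 5, so both of the first two calls evaluate $\exp(-2.5)$.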

Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when $\mathbf {x} =\mathbf {x'}$), it has a ready interpretation as a similarity measure.[2] The feature space of the kernel has an infinite number of dimensions; for $\sigma =1$ and input vectors with $k$ components $x_{1},\ldots ,x_{k}$, its expansion is:[3]

{\begin{alignedat}{2}\exp \left(-{\frac {1}{2}}\|\mathbf {x} -\mathbf {x'} \|^{2}\right)&=\sum _{j=0}^{\infty }{\frac {(\mathbf {x} ^{\top }\mathbf {x'} )^{j}}{j!}}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\&=\sum _{j=0}^{\infty }\sum _{\sum n_{i}=j}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right){\frac {x_{1}^{n_{1}}\cdots x_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right){\frac {{x'}_{1}^{n_{1}}\cdots {x'}_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\end{alignedat}}
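The first line of the expansion can be checked numerically. The sketch below, which assumes $\sigma = 1$ as in the formula, compares the exact kernel value with a truncated version of the series in $\mathbf {x} ^{\top }\mathbf {x'}$. The function names `rbf_exact` and `rbf_truncated` and the sample vectors are made up for illustration.

```python
import math


def rbf_exact(x, y):
    """RBF kernel with sigma = 1: exp(-||x - y||^2 / 2)."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


def rbf_truncated(x, y, terms):
    """Sum the terms j = 0 .. terms - 1 of the series in the dot product."""
    dot = sum(a * b for a, b in zip(x, y))
    # The two exponential factors that multiply every term of the series.
    envelope = (math.exp(-0.5 * sum(a * a for a in x))
                * math.exp(-0.5 * sum(b * b for b in y)))
    series = sum(dot ** j / math.factorial(j) for j in range(terms))
    return envelope * series


if __name__ == "__main__":
    x, y = [0.5, -1.0], [1.5, 0.25]
    for terms in (1, 3, 20):
        print(terms, rbf_truncated(x, y, terms), rbf_exact(x, y))
```

Each additional term corresponds to one more polynomial degree $j$ in the feature space, which is why no finite number of terms reproduces the kernel exactly.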

Approximations

Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced.[4] Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel:

\langle z(\mathbf {x} ),z(\mathbf {x'} )\rangle \approx \langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle =K(\mathbf {x} ,\mathbf {x'} )

where $\textstyle \varphi$ is the implicit mapping embedded in the RBF kernel.

One way to construct such a z is to randomly sample from the Fourier transform of the kernel.[5] Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix $K$, using only a random sample of the training set.[6]
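The random-sampling construction can be sketched as follows, for the parameterization $K(\mathbf {x} ,\mathbf {x'} )=\exp(-\gamma \|\mathbf {x} -\mathbf {x'} \|^{2})$. The example assumes the common cosine form of random Fourier features, with frequencies drawn from a Gaussian of variance $2\gamma$ per component and phases drawn uniformly from $[0, 2\pi)$. The helper name `make_random_fourier_map` and its parameters are hypothetical, and the Nyström approach is not shown.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact kernel value exp(-gamma * ||x - y||^2), used for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_random_fourier_map(dim, n_features, gamma, seed=0):
    """Return a function z mapping a length-`dim` vector to `n_features` values.

    The inner product <z(x), z(y)> is a random estimate of the RBF kernel.
    """
    rng = random.Random(seed)
    # The Fourier transform of exp(-gamma * ||d||^2) is a Gaussian density
    # with variance 2 * gamma in each component.
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return z


if __name__ == "__main__":
    z = make_random_fourier_map(dim=3, n_features=4000, gamma=0.5)
    x, y = [0.2, -0.4, 1.0], [0.5, 0.1, 0.3]
    approx = sum(a * b for a, b in zip(z(x), z(y)))
    print(approx, rbf_kernel(x, y, 0.5))
```

The estimate is random, so it only agrees with the exact kernel up to an error that shrinks roughly as the inverse square root of `n_features`. Once the features are computed, a linear model trained on `z(x)` stands in for the kernelized one.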

References

[1] Chang, Yin-Wen; Hsieh, Cho-Jui; Chang, Kai-Wei; Ringgaard, Michael; Lin, Chih-Jen (2010). "Training and testing low-degree polynomial data mappings via linear SVM". Journal of Machine Learning Research. 11: 1471–1490.
[2] Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods". Kernel Methods in Computational Biology.
[3] Shashua, Amnon (2009). "Introduction to Machine Learning: Class Notes 67577". arXiv:0904.3664v1 [cs.LG].
[4] Andreas Müller (2012). Kernel Approximations for Efficient SVMs (and other feature extraction methods).
[5] Ali Rahimi and Benjamin Recht (2007). "Random features for large-scale kernel machines". Neural Information Processing Systems.
[6] C.K.I. Williams and M. Seeger (2001). "Using the Nyström method to speed up kernel machines". Advances in Neural Information Processing Systems.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
