Bayes classifier

In statistical classification, the Bayes classifier minimizes the probability of misclassification.[1]

Definition

Suppose a pair $(X, Y)$ takes values in $\mathbb{R}^d \times \{1, 2, \dots, K\}$, where $Y$ is the class label of $X$. The conditional distribution of $X$, given that the label $Y$ takes the value $r$, is

$$(X \mid Y = r) \sim P_r \quad \text{for } r = 1, 2, \dots, K,$$

where "$\sim$" means "is distributed as", and where $P_r$ denotes a probability distribution.

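A classifier is a rule that assigns to an observation $X = x$ a guess or estimate of what the unobserved label $Y = r$ actually was. In theoretical terms, a classifier is a measurable function $C: \mathbb{R}^d \to \{1, 2, \dots, K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as

$$R(C) = \Pr\left(C(X) \neq Y\right).$$

The Bayes classifier is

$$C^{\text{Bayes}}(x) = \underset{r \in \{1, 2, \dots, K\}}{\operatorname{argmax}} \Pr(Y = r \mid X = x).$$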

In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively, in this case $\Pr(Y = r \mid X = x)$. Because no classifier achieves a lower risk, the Bayes classifier is a useful benchmark in statistical classification.
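
The following is a minimal Python sketch of the Bayes classifier, assuming the class-conditional distributions $P_r$ are known one-dimensional Gaussians and the class priors $\Pr(Y = r)$ are known. These modelling choices, the numeric values, and the helper names `normal_pdf`, `posterior`, and `bayes_classify` are illustrative assumptions, not part of the definition. By Bayes' rule, $\Pr(Y = r \mid X = x)$ is proportional to $\Pr(Y = r)\, f_r(x)$, where $f_r$ is the density of $P_r$. The normalising constant therefore does not affect the argmax.

```python
import math

def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of the normal distribution N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, priors, means, stds):
    """Return [P(Y = 1 | X = x), ..., P(Y = K | X = x)] via Bayes' rule,
    assuming X | Y = r ~ N(means[r-1], stds[r-1]**2) and P(Y = r) = priors[r-1]."""
    joint = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)  # marginal density of X at x: the normalising constant
    return [j / total for j in joint]

def bayes_classify(x, priors, means, stds):
    """Return the label r in {1, ..., K} with the largest posterior probability.
    Ties are broken in favour of the smallest label."""
    post = posterior(x, priors, means, stds)
    return max(range(len(post)), key=post.__getitem__) + 1  # labels are 1-based

# Two classes with means 0 and 2 and unit variance (hypothetical values).
means, stds = [0.0, 2.0], [1.0, 1.0]
label_equal_priors = bayes_classify(1.2, [0.5, 0.5], means, stds)   # 2
label_skewed_priors = bayes_classify(1.2, [0.9, 0.1], means, stds)  # 1
```

With equal priors, the decision boundary lies midway between the means at $x = 1$, so $x = 1.2$ goes to class 2. Raising the prior of class 1 to 0.9 moves the boundary, and the same point is assigned to class 1.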

The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $R(C) - R(C^{\text{Bayes}})$. Because the Bayes classifier minimizes the risk, this quantity is non-negative. It is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.[2]
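
The risk and the excess risk can be estimated by simulation. The sketch below assumes a hypothetical two-class model: $Y$ is uniform on $\{1, 2\}$, $X \mid Y = 1 \sim N(0, 1)$, and $X \mid Y = 2 \sim N(2, 1)$. Under this model, with equal priors and equal variances, the Bayes classifier assigns class 2 exactly when $x > 1$, and its risk is $\Phi(-1) \approx 0.159$. The names `threshold_classifier`, `sample`, `empirical_risk`, and `exact_risk` are illustrative only. The code compares the Bayes threshold with a shifted one, both by Monte Carlo and in closed form.

```python
import math
import random

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def threshold_classifier(t: float):
    """Hypothetical family of classifiers: C_t(x) = 2 if x > t, else 1."""
    return lambda x: 2 if x > t else 1

def sample(n: int, rng: random.Random):
    """Draw n pairs (X, Y) from the assumed model:
    Y uniform on {1, 2}, X | Y=1 ~ N(0, 1), X | Y=2 ~ N(2, 1)."""
    data = []
    for _ in range(n):
        y = rng.choice((1, 2))
        x = rng.gauss(0.0 if y == 1 else 2.0, 1.0)
        data.append((x, y))
    return data

def empirical_risk(classifier, data) -> float:
    """Fraction of pairs with C(X) != Y: a Monte Carlo estimate of R(C)."""
    return sum(classifier(x) != y for x, y in data) / len(data)

def exact_risk(t: float) -> float:
    """R(C_t) under the assumed model:
    0.5 * P(X > t | Y=1) + 0.5 * P(X <= t | Y=2)."""
    return 0.5 * (1.0 - std_normal_cdf(t)) + 0.5 * std_normal_cdf(t - 2.0)

rng = random.Random(0)  # fixed seed for reproducibility
data = sample(100_000, rng)

bayes = threshold_classifier(1.0)  # Bayes classifier for this model
other = threshold_classifier(1.5)  # a deliberately shifted threshold

excess_mc = empirical_risk(other, data) - empirical_risk(bayes, data)
excess_exact = exact_risk(1.5) - exact_risk(1.0)  # about 0.029
```

Any threshold other than 1 gives a strictly larger exact risk, so its excess risk is positive. A learned classifier whose threshold converges to 1 as the training set grows would be consistent in the sense above.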

Proof of optimality

The proof that the Bayes classifier is optimal, and hence that the Bayes error rate is minimal, proceeds as follows for the case of two classes.

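Define the following notation: $R(h)$ is the risk of a classifier $h$, $R^* = \inf_h R(h)$ is the Bayes risk (the infimum over all measurable classifiers), and $\mathcal{Y} = \{0, 1\}$ is the set of all possible classes to which the points can be classified. Let the posterior probability of a point belonging to class 1 be $\eta(x) = \Pr(Y = 1 \mid X = x)$. Define the classifier $h^*$ as

$$h^*(x) = \begin{cases} 1 & \text{if } \eta(x) \geq 0.5, \\ 0 & \text{otherwise.} \end{cases}$$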

Then we have the following results:

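(a) $R(h^*) = R^*$, i.e. $h^*$ is a Bayes classifier,

(b) For any classifier $h$, the excess risk satisfies $R(h) - R^* = 2\,\mathbb{E}_X\left[|\eta(X) - 0.5| \cdot \mathbb{I}_{\{h(X) \neq h^*(X)\}}\right]$,

(c) $R^* = \mathbb{E}_X\left[\min\left(\eta(X), 1 - \eta(X)\right)\right]$.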
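
When $X$ takes only finitely many values, all three results can be checked by brute force. In that case, every classifier is just a choice of label for each value, and expectations over $X$ become finite sums. The distribution below is a made-up example: the probabilities `p` and the posterior values `eta` are assumptions, as are the names `risk` and `h_star`. The script enumerates all $2^4 = 16$ classifiers $h$ and asserts (a), (b) and (c).

```python
from itertools import product

# Hypothetical discrete model: X takes the values 0, 1, 2, 3 with
# probabilities p[i], and eta[i] = P(Y = 1 | X = i).
p = [0.1, 0.4, 0.3, 0.2]
eta = [0.9, 0.3, 0.5, 0.65]

def risk(h):
    """R(h) = E_X[eta(X) 1{h(X)=0} + (1 - eta(X)) 1{h(X)=1}],
    where h[i] is the label the classifier assigns to X = i."""
    return sum(px * (e if hx == 0 else 1.0 - e) for px, e, hx in zip(p, eta, h))

# h*(x) = 1 if eta(x) >= 0.5, else 0
h_star = tuple(1 if e >= 0.5 else 0 for e in eta)

# Every classifier on a 4-point space is a choice of label per point: 2**4 of them.
classifiers = list(product((0, 1), repeat=len(p)))
bayes_risk = min(risk(h) for h in classifiers)

# (a): h* attains the minimum risk.
assert abs(risk(h_star) - bayes_risk) < 1e-12

# (b): excess risk = 2 E_X[|eta(X) - 0.5| 1{h(X) != h*(X)}] for every h.
for h in classifiers:
    rhs = 2 * sum(px * abs(e - 0.5) * (hx != hs)
                  for px, e, hx, hs in zip(p, eta, h, h_star))
    assert abs((risk(h) - bayes_risk) - rhs) < 1e-12

# (c): R* = E_X[min(eta(X), 1 - eta(X))].
assert abs(bayes_risk - sum(px * min(e, 1.0 - e) for px, e in zip(p, eta))) < 1e-12
```

The point with $\eta = 0.5$ is a tie. Either label gives the same conditional error there, so the choice $h^*(x) = 1$ does not affect the risk.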


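Proof of (a): For any classifier $h$, we have

$$\begin{aligned}
R(h) &= \mathbb{E}_{XY}\left[\mathbb{I}_{\{h(X) \neq Y\}}\right] \\
&= \mathbb{E}_X \mathbb{E}_{Y \mid X}\left[\mathbb{I}_{\{h(X) \neq Y\}}\right] \\
&= \mathbb{E}_X\left[\eta(X)\, \mathbb{I}_{\{h(X) = 0\}} + (1 - \eta(X))\, \mathbb{I}_{\{h(X) = 1\}}\right].
\end{aligned}$$

The second equality is the tower property of conditional expectation. The third holds because, given $X$, predicting 0 is wrong with probability $\eta(X)$ and predicting 1 is wrong with probability $1 - \eta(X)$.

Notice that $R(h)$ is minimised by minimising the integrand separately at each point, that is, by taking, for all $x$,

$$h(x) = \begin{cases} 1 & \text{if } \eta(x) \geq 1 - \eta(x), \\ 0 & \text{otherwise.} \end{cases}$$

Since $\eta(x) \geq 1 - \eta(x)$ exactly when $\eta(x) \geq 0.5$, this is the classifier $h^*$. Therefore the minimum possible risk is the Bayes risk, $R^* = R(h^*)$.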


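Proof of (b):

$$\begin{aligned}
R(h) - R^* &= R(h) - R(h^*) \\
&= \mathbb{E}_X\left[\eta(X)\,\mathbb{I}_{\{h(X)=0\}} + (1-\eta(X))\,\mathbb{I}_{\{h(X)=1\}} - \eta(X)\,\mathbb{I}_{\{h^*(X)=0\}} - (1-\eta(X))\,\mathbb{I}_{\{h^*(X)=1\}}\right] \\
&= \mathbb{E}_X\left[|2\eta(X) - 1|\,\mathbb{I}_{\{h(X) \neq h^*(X)\}}\right] \\
&= 2\,\mathbb{E}_X\left[|\eta(X) - 0.5|\,\mathbb{I}_{\{h(X) \neq h^*(X)\}}\right].
\end{aligned}$$

The first equality uses (a). The third holds because the integrand vanishes wherever $h(X) = h^*(X)$. Where the two classifiers differ, the integrand equals $2\eta(X) - 1 \geq 0$ if $h^*(X) = 1$, and $1 - 2\eta(X) > 0$ if $h^*(X) = 0$. In both cases this is $|2\eta(X) - 1|$.
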
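Proof of (c):

$$R^* = R(h^*) = \mathbb{E}_X\left[\eta(X)\,\mathbb{I}_{\{h^*(X)=0\}} + (1-\eta(X))\,\mathbb{I}_{\{h^*(X)=1\}}\right] = \mathbb{E}_X\left[\min\left(\eta(X), 1-\eta(X)\right)\right].$$

The last equality holds because $h^*$ predicts 1 exactly when $1 - \eta(X) \leq \eta(X)$, so the integrand always picks out the smaller of the two values.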

References

  1. Devroye, L.; Györfi, L. & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-387-94618-7.
  2. https://dl.acm.org/doi/abs/10.1109/18.243433
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.