Probability axioms

The Kolmogorov axioms are the foundations of probability theory introduced by Andrey Kolmogorov in 1933.[1] These axioms remain central and contribute directly to mathematics, the physical sciences, and real-world applications of probability.[2] An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.[3]

Axioms

The assumptions underlying the axioms can be summarised as follows: Let (Ω, F, P) be a measure space in which $P(E)$ denotes the probability of an event $E$ and $P(\Omega )=1$ . Then (Ω, F, P) is a probability space, with sample space Ω, event space F and probability measure P.[1]

First axiom

The probability of an event is a non-negative real number:

P(E)\in \mathbb {R} ,P(E)\geq 0\qquad \forall E\in F

where $F$ is the event space. It follows that $P(E)$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure, namely that the probability that at least one of the elementary events in the entire sample space will occur is 1:

P(\Omega )=1.

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events)

E_{1},E_{2},\ldots

satisfies

P\left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }P(E_{i}).

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.[4] Quasiprobability distributions in general relax the third axiom.
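The three axioms can be checked mechanically on a small finite space. The following is a minimal sketch, assuming a fair six-sided die as the sample space and the full power set as the event space; the helper names `power_set`, `make_measure` and `satisfies_axioms` are illustrative and do not come from the sources cited here. On a finite space σ-additivity reduces to finite additivity, so the check only covers pairs of disjoint events.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite sample space, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def make_measure(weights):
    """Build P(E) as the sum of the point weights of the outcomes in E."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P

def satisfies_axioms(omega, P):
    """Check the three axioms on the power set of a finite omega."""
    events = power_set(omega)
    non_negative = all(P(E) >= 0 for E in events)      # first axiom
    unit_measure = P(frozenset(omega)) == 1            # second axiom
    additive = all(                                    # third axiom, finite case
        P(A | B) == P(A) + P(B)
        for A in events for B in events if not (A & B)
    )
    return non_negative and unit_measure and additive

# Assumed example: a fair die, with exact rational weights.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(satisfies_axioms(die.keys(), make_measure(die)))  # True
```

Exact fractions are used instead of floating-point numbers so that the equalities in the second and third axioms can be tested without rounding error.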

Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs[5][6][7] of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four immediate corollaries and their proofs are shown below:

Monotonicity

\quad {\text{if}}\quad A\subseteq B\quad {\text{then}}\quad P(A)\leq P(B).

If A is a subset of, or equal to, B, then the probability of A is less than or equal to the probability of B.

Proof of monotonicity[5]

To verify the monotonicity property, suppose $A\subseteq B$ and set $E_{1}=A$ , $E_{2}=B\setminus A$ and $E_{i}=\varnothing$ for $i\geq 3$ . The sets $E_{i}$ are pairwise disjoint and $E_{1}\cup E_{2}\cup \cdots =B$ . Hence, we obtain from the third axiom that

P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})=P(B).

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to $P(B)$ which is finite, we obtain both $P(A)\leq P(B)$ and $P(\varnothing )=0$ .
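As a worked instance of this decomposition, assume a fair six-sided die (an illustrative choice, not part of the proof) with $A=\{2\}$ and $B=\{2,4,6\}$ . The construction then reads:

```latex
\begin{aligned}
E_{1}=A=\{2\},\qquad E_{2}&=B\setminus A=\{4,6\},\qquad E_{i}=\varnothing \quad (i\geq 3),\\
P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})&={\tfrac {1}{6}}+{\tfrac {2}{6}}+0={\tfrac {3}{6}}=P(B),\\
P(A)={\tfrac {1}{6}}&\leq {\tfrac {1}{2}}=P(B).
\end{aligned}
```

The infinite tail contributes nothing because every $E_{i}$ with $i\geq 3$ is empty, which is exactly the point the proof uses to conclude $P(\varnothing )=0$ .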

The probability of the empty set

P(\varnothing )=0.

In some cases, $\varnothing$ is not the only event with probability 0.

Proof of probability of the empty set

As shown in the previous proof, $P(\varnothing )=0$ . This can also be seen by contradiction. Write $P(\varnothing )=a$ . By the first axiom, the left-hand side $P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})$ of the equation in that proof is at least $\sum _{i=3}^{\infty }P(E_{i})=\sum _{i=3}^{\infty }P(\varnothing )=\sum _{i=3}^{\infty }a={\begin{cases}0&{\text{if }}a=0,\\\infty &{\text{if }}a>0.\end{cases}}$

If $a>0$ then we obtain a contradiction, because the sum does not exceed $P(B)$ which is finite. Thus, $a=0$ . We have shown as a byproduct of the proof of monotonicity that $P(\varnothing )=0$ .

The complement rule

$P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)$

Proof of the complement rule

Given $A$ and $A^{c}$ are mutually exclusive and that $A\cup A^{c}=\Omega$ :

$P(A\cup A^{c})=P(A)+P(A^{c})$ ... (by axiom 3)

and, $P(A\cup A^{c})=P(\Omega )=1$ ... (by axiom 2)

$\Rightarrow P(A)+P(A^{c})=1$

$\therefore P(A^{c})=1-P(A)$

The numeric bound

It immediately follows from the monotonicity property that

0\leq P(E)\leq 1\qquad \forall E\in F.

Proof of the numeric bound

Given the complement rule $P(E^{c})=1-P(E)$ and axiom 1 $P(E^{c})\geq 0$ :

$1-P(E)\geq 0$

$\Rightarrow 1\geq P(E)$

$\therefore 0\leq P(E)\leq 1$

Further consequences

Another important property is:

P(A\cup B)=P(A)+P(B)-P(A\cap B).

This is called the addition law of probability, or the sum rule. That is, the probability that A or B will happen is the sum of the probabilities that A will happen and that B will happen, minus the probability that both A and B will happen. The proof of this is as follows:

Firstly, since $A$ and $B\setminus A$ are disjoint and their union is $A\cup B$ ,

P(A\cup B)=P(A)+P(B\setminus A)\qquad {\text{(by axiom 3)}}.

So, because $B\setminus A=B\setminus (A\cap B)$ ,

P(A\cup B)=P(A)+P(B\setminus (A\cap B)).

Also,

P(B)=P(B\setminus (A\cap B))+P(A\cap B)

and eliminating $P(B\setminus (A\cap B))$ from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
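Both the addition law and its three-set extension can be checked numerically. The sketch below assumes a fair six-sided die with the uniform measure; the events `A`, `B` and `C` are arbitrary illustrative choices rather than examples taken from the cited sources.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed sample space: one roll of a fair die

def P(event):
    """Uniform probability on OMEGA (an illustrative assumption)."""
    return Fraction(len(event), len(OMEGA))

A = frozenset({2, 4, 6})  # "even"
B = frozenset({4, 5, 6})  # "at least four"
C = frozenset({1, 2})     # "at most two"

# Addition law for two events.
two_sets_lhs = P(A | B)
two_sets_rhs = P(A) + P(B) - P(A & B)

# Inclusion-exclusion for three events.
three_sets_lhs = P(A | B | C)
three_sets_rhs = (P(A) + P(B) + P(C)
                  - P(A & B) - P(A & C) - P(B & C)
                  + P(A & B & C))

print(two_sets_lhs, two_sets_rhs)      # 2/3 2/3
print(three_sets_lhs, three_sets_rhs)  # 5/6 5/6
```

Subtracting $P(A\cap B)$ corrects for the outcomes 4 and 6, which would otherwise be counted once in $P(A)$ and again in $P(B)$ .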

Setting $B$ to the complement $A^{c}$ of $A$ in the addition law gives

P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.

We may define:

\Omega =\{H,T\}

F=\{\varnothing ,\{H\},\{T\},\{H,T\}\}

Kolmogorov's axioms imply that:

P(\varnothing )=0

The probability of neither heads nor tails is 0.

P(\{H,T\}^{c})=0

Equivalently, the probability of either heads or tails is 1: $P(\{H,T\})=1$ .

P(\{H\})+P(\{T\})=1

The sum of the probability of heads and the probability of tails is 1.
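The three statements hold whatever the bias of the coin. A minimal sketch, assuming a hypothetical helper `coin_space` and an arbitrarily chosen bias of 3/10 for heads:

```python
from fractions import Fraction

def coin_space(p_heads):
    """Return (omega, events, P) for one toss with P({H}) = p_heads."""
    omega = frozenset({"H", "T"})
    weights = {"H": p_heads, "T": 1 - p_heads}
    # The event space F from the text: every subset of omega.
    events = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]

    def P(event):
        return sum((weights[o] for o in event), Fraction(0))

    return omega, events, P

# The bias 3/10 is an arbitrary assumption; the coin need not be fair.
omega, events, P = coin_space(Fraction(3, 10))

print(P(frozenset()))                             # 0
print(P(omega))                                   # 1
print(P(frozenset({"H"})) + P(frozenset({"T"})))  # 1
```

Changing the argument of `coin_space` to any value between 0 and 1 leaves the three printed results unchanged, since they depend only on the axioms and not on the particular weights.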

References

Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, USA: Chelsea Publishing Company.
Aldous, David. "What is the significance of the Kolmogorov axioms?". David Aldous. Retrieved November 19, 2019.
Terenin, Alexander; Draper, David (2015). "Cox's Theorem and the Jaynesian Interpretation of Probability". arXiv:1507.06597. Bibcode:2015arXiv150706597T.
Hájek, Alan (August 28, 2019). "Interpretations of Probability". Stanford Encyclopedia of Philosophy. Retrieved November 17, 2019.
Ross, Sheldon M. (2014). A first course in probability (Ninth ed.). Upper Saddle River, New Jersey. pp. 27, 28. ISBN 978-0-321-79477-2. OCLC 827003384.
Gerard, David (December 9, 2017). "Proofs from axioms" (PDF). Retrieved November 20, 2019.
Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)" (PDF). School of Mathematics, Queen Mary University of London. Retrieved November 20, 2019.