Negative hypergeometric distribution

In probability theory and statistics, the negative hypergeometric distribution describes probabilities when sampling without replacement from a finite population in which each element can be classified into one of two mutually exclusive categories, such as Pass/Fail, Male/Female or Employed/Unemployed. As random selections are made from the population, each draw shrinks the population, so the probability of success changes from draw to draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a sample of fixed size, the negative hypergeometric distribution applies when samples are drawn until $r$ failures have been found, and it gives the probability of finding $k$ successes in such a sample. In other words, the negative hypergeometric distribution describes the likelihood of $k$ successes in a sample with exactly $r$ failures.

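Negative hypergeometric
[Figures: probability mass function; cumulative distribution function]
Parameters
  $N \in \{0, 1, 2, \dots\}$ - total number of elements
  $K \in \{0, 1, 2, \dots, N\}$ - total number of 'success' elements
  $r \in \{0, 1, 2, \dots, N-K\}$ - number of failures when experiment is stopped
Support: $k \in \{0, \dots, K\}$ - number of successes when experiment is stopped
pmf: $\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}$
Mean: $\frac{rK}{N-K+1}$
Variance: $\frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2(N-K+2)}$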

Definition

There are $N$ elements, of which $K$ are defined as "successes" and the remaining $N-K$ are "failures".

Elements are drawn one after the other, without replacement, until $r$ failures are encountered. Then the drawing stops and the number $k$ of successes is counted. The negative hypergeometric distribution, $NHG_{N,K,r}(k)$, is the discrete distribution of this $k$.[1]
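The drawing procedure can be illustrated with a short simulation. The following is a minimal Python sketch, not taken from the cited source: it writes N for the population size, K for the number of successes and r for the number of failures at which drawing stops, and the function name draw_until_failures is a hypothetical choice made for this illustration.

```python
import random


def draw_until_failures(N, K, r, rng=random):
    """Draw without replacement until r failures appear; return the successes seen."""
    if not (0 <= K <= N and 1 <= r <= N - K):
        raise ValueError("need 0 <= K <= N and 1 <= r <= N - K")
    # True marks a success, False a failure; shuffling fixes a random draw order.
    population = [True] * K + [False] * (N - K)
    rng.shuffle(population)
    successes = failures = 0
    for is_success in population:
        if is_success:
            successes += 1
        else:
            failures += 1
            if failures == r:  # stop as soon as the r-th failure is drawn
                break
    return successes


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    samples = [draw_until_failures(10, 4, 2, rng) for _ in range(20_000)]
    print("sample mean of successes:", sum(samples) / len(samples))
```

With N = 10, K = 4 and r = 2, the sample mean should lie close to the theoretical mean rK/(N-K+1) = 8/7, up to sampling noise.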

The outcome requires that we observe $k$ successes in the first $k+r-1$ draws and that the $(k+r)$-th draw be a failure. The probability of the former follows directly from the hypergeometric distribution with $k+r-1$ draws, and the probability of the latter is simply the number of failures remaining, $N-K-(r-1)$, divided by the size of the remaining population, $N-(k+r-1)$. The probability of having exactly $k$ successes up to the $r$-th failure (i.e. the drawing stops as soon as the sample includes the predefined number of $r$ failures) is then the product of these two probabilities:

$$\frac{\binom{K}{k}\binom{N-K}{r-1}}{\binom{N}{k+r-1}} \cdot \frac{N-K-(r-1)}{N-(k+r-1)} = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}.$$

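Therefore, a random variable $X$ follows the negative hypergeometric distribution if its probability mass function (pmf) is given by

$$f(k; N, K, r) \equiv \Pr(X = k) = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \quad \text{for } 0 \le k \le K,$$

where

  • $N$ is the population size,
  • $K$ is the number of success states in the population,
  • $r$ is the number of failures,
  • $k$ is the number of observed successes,
  • $\binom{a}{b}$ is a binomial coefficient.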
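As an illustration, the pmf can be evaluated exactly with integer binomial coefficients. The sketch below is a minimal Python example under stated assumptions: N, K, r and k denote the population size, the number of successes in the population, the stopping number of failures and the observed number of successes, and the function names nhg_pmf and nhg_pmf_as_product are hypothetical. It also compares the closed form with the two-step product (a hypergeometric term for the first k+r-1 draws, times the probability that the next draw is a failure).

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Closed-form pmf C(k+r-1, k) * C(N-r-k, K-k) / C(N, K), as an exact fraction."""
    if not (0 <= K <= N and 1 <= r <= N - K):
        raise ValueError("need 0 <= K <= N and 1 <= r <= N - K")
    if not 0 <= k <= K:
        return Fraction(0)
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def nhg_pmf_as_product(k, N, K, r):
    """Same probability built as (hypergeometric term) * (next draw is a failure)."""
    draws = k + r - 1  # draws made before the final, stopping draw
    k_successes_first = Fraction(comb(K, k) * comb(N - K, r - 1), comb(N, draws))
    next_is_failure = Fraction(N - K - (r - 1), N - draws)
    return k_successes_first * next_is_failure


if __name__ == "__main__":
    N, K, r = 10, 4, 2
    for k in range(K + 1):
        assert nhg_pmf(k, N, K, r) == nhg_pmf_as_product(k, N, K, r)
        print(k, nhg_pmf(k, N, K, r))
```

Using Fraction avoids floating-point rounding, so the two forms can be compared with exact equality.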

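By design the probabilities sum to 1. To show this explicitly:

$$\sum_{k=0}^{K}\Pr(X=k) = \sum_{k=0}^{K}\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{1}{\binom{N}{K}}\sum_{k=0}^{K}\binom{k+r-1}{k}\binom{N-r-k}{K-k} = \frac{1}{\binom{N}{K}}\binom{N}{K} = 1,$$

where we have used that

$$\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k},$$

applied here with $m = r-1$, $n = N-1$ and $k = K$. This relationship can be derived using the binomial identity $\binom{n}{k} = (-1)^{k}\binom{k-n-1}{k}$ and the Chu–Vandermonde identity $\sum_{j=0}^{k}\binom{m}{j}\binom{n-m}{k-j} = \binom{n}{k}$, which holds for any complex values $m$ and $n$ and any non-negative integer $k$.

The relationship can also be found by examining the coefficient of $x^{k}$ in the expansion of $(1-x)^{-m-1}(1-x)^{m+k-n-1} = (1-x)^{k-n-2}$, using Newton's binomial series.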
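The summation identity used for the normalization, namely that the sum over j from 0 to k of C(j+m, j) C(n-m-j, k-j) equals C(n+1, k), can be spot-checked numerically. The following Python sketch is only a sanity check over small non-negative integers, not a proof; the function names are hypothetical, and the range is restricted to n >= m + k so that every binomial coefficient has non-negative integer arguments.

```python
from math import comb


def convolution_sum(m, n, k):
    """Left-hand side: sum over j = 0..k of C(j+m, j) * C(n-m-j, k-j)."""
    return sum(comb(j + m, j) * comb(n - m - j, k - j) for j in range(k + 1))


def identity_holds(m, n, k):
    """Compare the sum against the right-hand side C(n+1, k)."""
    return convolution_sum(m, n, k) == comb(n + 1, k)


if __name__ == "__main__":
    # Restrict to n >= m + k so all binomial arguments stay non-negative.
    checked = 0
    for m in range(6):
        for k in range(6):
            for n in range(m + k, m + k + 6):
                assert identity_holds(m, n, k)
                checked += 1
    print("identity verified for", checked, "parameter triples")
```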

Expectation

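When counting the number $k$ of successes before $r$ failures, the expected number of successes is $\frac{rK}{N-K+1}$, which can be derived as follows.

$$\begin{aligned}
E[X] &= \sum_{k=0}^{K} k \Pr(X=k) = \sum_{k=0}^{K} k\,\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \\
&= \frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\frac{k+r}{r}\binom{k+r-1}{r-1}\binom{N-r-k}{K-k}\right] - r \\
&= \frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r}{r}\binom{N-r-k}{K-k}\right] - r \\
&= \frac{r}{\binom{N}{K}}\binom{N+1}{K} - r = \frac{rK}{N-K+1},
\end{aligned}$$

where we have used the relationship $\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k}$, derived above to show that the negative hypergeometric distribution is properly normalized, here with $m = r$ and $n = N$.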
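The closed form for the mean, rK/(N-K+1), can be compared against a direct summation of k times the pmf. The sketch below is a minimal Python check using exact fractions; it assumes the pmf C(k+r-1, k) C(N-r-k, K-k) / C(N, K) with N the population size, K the number of successes and r the stopping number of failures, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Exact pmf of the number of successes k before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def nhg_mean_by_sum(N, K, r):
    """Mean computed directly from its definition, sum of k * Pr(X = k)."""
    return sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))


def nhg_mean_closed_form(N, K, r):
    """Closed form r * K / (N - K + 1)."""
    return Fraction(r * K, N - K + 1)


if __name__ == "__main__":
    N, K, r = 10, 4, 2
    print(nhg_mean_by_sum(N, K, r), nhg_mean_closed_form(N, K, r))
```

Both functions assume 0 <= K <= N and 1 <= r <= N - K; no input validation is performed in this sketch.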

Variance

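The variance can be derived by the following calculation.

$$\begin{aligned}
E[X^{2}] &= \sum_{k=0}^{K} k^{2}\Pr(X=k) = \left[\sum_{k=0}^{K}(k+r)(k+r+1)\Pr(X=k)\right] - (2r+1)E[X] - r^{2} - r \\
&= \frac{r(r+1)}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r+1}{r+1}\binom{N-r-k}{K-k}\right] - (2r+1)E[X] - r^{2} - r \\
&= \frac{r(r+1)}{\binom{N}{K}}\binom{N+2}{K} - (2r+1)E[X] - r^{2} - r = \frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}.
\end{aligned}$$

Then the variance is

$$\operatorname{Var}[X] = E[X^{2}] - \left(E[X]\right)^{2} = \frac{rK(N+1)(N-K-r+1)}{(N-K+1)^{2}(N-K+2)}.$$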
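As a check on the algebra, the second moment rK(N-r+Kr+1)/((N-K+1)(N-K+2)) and the variance rK(N+1)(N-K-r+1)/((N-K+1)^2 (N-K+2)) can be compared with values obtained by summing over the pmf. The Python sketch below does this with exact fractions; the symbols N, K, r follow the usual meaning (population size, successes in the population, stopping number of failures) and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Exact pmf of the number of successes k before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def nhg_moments_by_sum(N, K, r):
    """Return (E[X], E[X^2], Var[X]) computed directly from the pmf."""
    probs = [nhg_pmf(k, N, K, r) for k in range(K + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second, second - mean ** 2


def nhg_second_moment_closed_form(N, K, r):
    return Fraction(r * K * (N - r + K * r + 1), (N - K + 1) * (N - K + 2))


def nhg_variance_closed_form(N, K, r):
    return Fraction(r * K * (N + 1) * (N - K - r + 1),
                    (N - K + 1) ** 2 * (N - K + 2))


if __name__ == "__main__":
    N, K, r = 10, 4, 2
    print(nhg_moments_by_sum(N, K, r))
    print(nhg_second_moment_closed_form(N, K, r), nhg_variance_closed_form(N, K, r))
```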

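Related distributions

If the drawing stops after a constant number $n$ of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution, $HG_{N,K,n}(k)$ (population size $N$, $K$ successes, $n$ draws). The two functions are related in the following way:[1]

$$NHG_{N,K,r}(k) = 1 - HG_{N,N-K,k+r}(r-1),$$

where both sides are read as cumulative distribution functions: the left side is the probability of at most $k$ successes before the $r$-th failure, and $HG_{N,N-K,k+r}(r-1)$ is the probability of at most $r-1$ failures among the first $k+r$ draws.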
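The link between the two distributions rests on a stopping-time argument: at most k successes occur before the r-th failure exactly when the first k+r draws contain at least r failures. The Python sketch below checks the resulting cumulative-distribution relation, Pr(at most k successes before the r-th failure) = 1 - Pr(at most r-1 failures in k+r draws), with exact fractions. The function names nhg_cdf and hg_cdf and their argument order are hypothetical choices for this illustration.

```python
from fractions import Fraction
from math import comb


def nhg_cdf(k, N, K, r):
    """P(at most k successes before the r-th failure)."""
    return sum(
        Fraction(comb(j + r - 1, j) * comb(N - r - j, K - j), comb(N, K))
        for j in range(k + 1)
    )


def hg_cdf(x, N, M, n):
    """P(at most x marked items among n draws from N items, M of them marked)."""
    return sum(
        Fraction(comb(M, i) * comb(N - M, n - i), comb(N, n))
        for i in range(x + 1)
    )


def relation_holds(k, N, K, r):
    # Failures play the role of "marked" items: N - K of them, k + r draws.
    return nhg_cdf(k, N, K, r) == 1 - hg_cdf(r - 1, N, N - K, k + r)


if __name__ == "__main__":
    N, K, r = 10, 4, 2
    for k in range(K + 1):
        assert relation_holds(k, N, K, r)
        print(k, nhg_cdf(k, N, K, r))
```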

The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so that the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so that the probability of success is the same in every draw and the trials are independent. The following table summarizes the four distributions related to drawing items:

                                           | With replacements              | No replacements
# of successes in constant # of draws      | binomial distribution          | hypergeometric distribution
# of successes in constant # of failures   | negative binomial distribution | negative hypergeometric distribution


References

  1. Negative hypergeometric distribution in Encyclopedia of Math.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.