Horvitz–Thompson estimator

In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson,[1] is a method for estimating the total[2] and mean of a pseudo-population in a stratified sample. Inverse probability weighting is applied to account for different proportions of observations within strata in a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data.

The method

Formally, let $Y_i$, $i = 1, 2, \ldots, n$, be an independent sample from n of N ≥ n distinct strata with a common mean μ. Suppose further that $\pi_i$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the ith stratum. The Horvitz–Thompson estimate of the total is given by:

$$\hat{Y}_{HT} = \sum_{i=1}^{n} \pi_i^{-1} Y_i,$$

and the estimate of the mean is given by:

$$\hat{\mu}_{HT} = N^{-1} \hat{Y}_{HT} = N^{-1} \sum_{i=1}^{n} \pi_i^{-1} Y_i.$$

In a Bayesian probabilistic framework, $\pi_i$ is considered the proportion of individuals in a target population belonging to the ith stratum. Hence, $\pi_i^{-1} Y_i$ could be thought of as an estimate of the complete sample of persons within the ith stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.[3]
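
The two estimates can be illustrated with a minimal Python sketch. The function names `ht_total` and `ht_mean`, the sample values, the inclusion probabilities, and the population size of 8 are all hypothetical and chosen only so the arithmetic is easy to follow; the code assumes the inclusion probabilities are known in advance.

```python
from typing import Sequence


def ht_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Inverse-probability-weighted total: the sum of y_i / pi_i over the sample."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0.0 < p <= 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


def ht_mean(y: Sequence[float], pi: Sequence[float], population_size: int) -> float:
    """Estimated mean: the estimated total divided by the known population size N."""
    return ht_total(y, pi) / population_size


# Hypothetical sample of n = 3 observations with assumed inclusion probabilities.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]

total = ht_total(y, pi)                    # 10/0.5 + 20/0.25 + 30/0.5 = 160.0
mean = ht_mean(y, pi, population_size=8)   # 160.0 / 8 = 20.0
print(total, mean)
```

Observations with small inclusion probabilities receive large weights, so the second observation contributes half of the estimated total here even though it is not the largest value.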

For post-stratified study designs, estimation of $\pi$ and $\mu$ is done in distinct steps. In such cases, computing the variance of the estimated mean $\hat{\mu}_{HT}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator.[4] The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.[5]
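
As a rough illustration of the resampling idea, the following Python sketch applies a naive with-replacement bootstrap to the sampled (value, inclusion probability) pairs. It treats the pairs as independent draws, which is a simplifying assumption; finite-population bootstrap variants such as those discussed in [4] differ, and this is not a description of what the R "survey" package does. The function names, data, and population size are hypothetical.

```python
import random
import statistics


def ht_mean(y, pi, population_size):
    """Inverse-probability-weighted mean with a known population size N."""
    return sum(yi / p for yi, p in zip(y, pi)) / population_size


def bootstrap_variance(y, pi, population_size, n_boot=2000, seed=0):
    """Naive bootstrap variance of the weighted mean.

    Resamples the (y_i, pi_i) pairs with replacement, recomputes the
    estimate on each resample, and returns the sample variance of the
    replicated estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(y, pi))
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        ys, ps = zip(*resample)
        estimates.append(ht_mean(ys, ps, population_size))
    return statistics.variance(estimates)


# Hypothetical sample and assumed inclusion probabilities.
y = [10.0, 20.0, 30.0, 40.0]
pi = [0.5, 0.25, 0.5, 0.8]
var_hat = bootstrap_variance(y, pi, population_size=10, n_boot=500)
print(var_hat)
```

The variance estimate is driven by how much the weighted values y_i / pi_i differ from one another: if they were all equal, every resample would give the same estimate and the bootstrap variance would be zero.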

Proof of Horvitz–Thompson unbiased estimation of the mean

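The Horvitz–Thompson estimator can be shown to be unbiased by evaluating the expectation of the Horvitz–Thompson estimate of the mean, $E[\hat{\mu}_{HT}]$. Write $Y_1, \ldots, Y_N$ for the fixed values attached to all N units, $\pi_i > 0$ for the inclusion probability of unit i, and $I_i$ for the indicator that unit i is in the sample, so that $E[I_i] = \pi_i$. The sum over the sampled units can then be rewritten as a sum over all N units:

$$E[\hat{\mu}_{HT}] = E\left[N^{-1} \sum_{i=1}^{N} I_i \pi_i^{-1} Y_i\right] = N^{-1} \sum_{i=1}^{N} \pi_i^{-1} Y_i \, E[I_i] = N^{-1} \sum_{i=1}^{N} Y_i,$$

which is the population mean.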
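
Unbiasedness can also be checked exactly on a small example. The Python sketch below assumes Poisson sampling, in which each unit enters the sample independently with its own inclusion probability. That design is chosen here only because it makes every possible sample easy to enumerate. The function name, population values, and probabilities are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def expected_ht_mean(values, probs):
    """Exact expectation of the inverse-probability-weighted mean.

    Enumerates every possible sample (subset of units) under Poisson
    sampling, weights the estimate from each sample by that sample's
    probability, and sums the results.
    """
    n_units = len(values)
    expectation = Fraction(0)
    for included in product([0, 1], repeat=n_units):
        # Probability of drawing exactly this subset of units.
        prob = Fraction(1)
        for flag, p in zip(included, probs):
            prob *= p if flag else 1 - p
        # Weighted mean computed from the units in this subset.
        estimate = sum(
            (v / p for flag, v, p in zip(included, values, probs) if flag),
            Fraction(0),
        ) / n_units
        expectation += prob * estimate
    return expectation


# Hypothetical population of N = 4 units with unequal inclusion probabilities.
values = [Fraction(3), Fraction(7), Fraction(11), Fraction(19)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 3)]

population_mean = sum(values) / len(values)
assert expected_ht_mean(values, probs) == population_mean
print(population_mean)
```

Any single sample can give an estimate far from the population mean, including zero for the empty sample; it is only the probability-weighted average over all samples that matches it.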


References

  1. Horvitz, D. G.; Thompson, D. J. (1952) "A generalization of sampling without replacement from a finite universe", Journal of the American Statistical Association, 47, 663–685. JSTOR 2280784
  2. William G. Cochran (1977), Sampling Techniques, 3rd Edition, Wiley. ISBN 0-471-16240-X
  3. Roderick J.A. Little, Donald B. Rubin (2002) Statistical Analysis With Missing Data, 2nd ed., Wiley. ISBN 0-471-18386-5
  4. Quatember, A. (2014). "The Finite Population Bootstrap - from the Maximum Likelihood to the Horvitz-Thompson Approach". Austrian Journal of Statistics. 43: 93–102.
  5. https://cran.r-project.org/web/packages/survey/
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.