Studentized range

In statistics, the studentized range is the difference between the largest and smallest values in a sample, measured in units of the sample standard deviation.

The studentized range, q, is named for William Sealy Gosset (who wrote under the pseudonym "Student"), who introduced it in 1927.[1] The concept was later discussed by Newman (1939),[2] Keuls (1952),[3] and John Tukey in some unpublished notes. q is the basic statistic of the studentized range distribution, which is used in multiple comparison procedures, such as the single-step Tukey's range test, the Newman–Keuls method, and Duncan's step-down procedure, and in establishing confidence intervals that remain valid after data snooping has occurred.[4]

Description

The value of the studentized range is most often represented by the variable q.

The studentized range can be defined in terms of a random sample x1, ..., xn from the standard normal distribution N(0, 1) and another random variable s that is independent of all the xi and such that νs² has a χ² distribution with ν degrees of freedom. Then

$$q_{n,\nu} = \frac{\max\{x_1, \ldots, x_n\} - \min\{x_1, \ldots, x_n\}}{s}$$

has the studentized range distribution for n groups and ν degrees of freedom. In applications, the xi are typically the means of n samples, each of size m; s² is the pooled variance divided by m, so that s estimates the standard error of each mean; and the degrees of freedom are ν = n(m − 1).
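As a minimal sketch of how q is computed in the balanced setting just described, the Python function below takes n groups of m raw observations, uses the group means as the xi, sets s² to the pooled variance divided by m (so that s estimates the standard error of a single group mean, the scale on which the means vary), and returns q together with ν = n(m − 1). The function name and the toy data are illustrative assumptions, and equal group sizes are assumed throughout.

```python
import math
import statistics


def studentized_range_from_groups(groups):
    """Return (q, nu) for n equal-sized groups of raw observations.

    Assumes every group has the same size m >= 2. The x_i are the group
    means, s^2 is the pooled variance divided by m, and nu = n(m - 1).
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least 2 groups")
    m = len(groups[0])
    if m < 2 or any(len(g) != m for g in groups):
        raise ValueError("groups must all have the same size m >= 2")

    means = [statistics.fmean(g) for g in groups]
    # With equal group sizes the pooled variance is the average of the
    # per-group sample variances (each uses m - 1 in the denominator).
    pooled_var = statistics.fmean(statistics.variance(g) for g in groups)
    s = math.sqrt(pooled_var / m)  # estimated standard error of one group mean
    q = (max(means) - min(means)) / s
    nu = n * (m - 1)
    return q, nu


q, nu = studentized_range_from_groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"q = {q:.4f}, nu = {nu}")  # prints "q = 10.3923, nu = 6"
```

Here the group means are 2, 5 and 8, each group variance is 1, so s = √(1/3) and q = 6√3.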

The critical value of q is based on three factors:

  1. α (the probability of rejecting a true null hypothesis)
  2. n (the number of observations or groups)
  3. ν (the degrees of freedom used to estimate the sample variance)
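The critical value q for given α, n and ν is the upper α point of the distribution defined above. Published tables, or SciPy's `scipy.stats.studentized_range` (available since SciPy 1.7), give it directly. As a dependency-free sketch, the Python code below instead estimates it by Monte Carlo straight from the definition: draw n standard normal values, draw νs² from a χ² distribution with ν degrees of freedom (generated as a gamma variate with shape ν/2 and scale 2), and take the empirical (1 − α) quantile of the resulting q values. The function names, number of draws and seed are arbitrary choices for illustration.

```python
import math
import random


def simulate_q(n, nu, rng):
    """Draw one value of q for n groups and nu degrees of freedom."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # nu * s^2 ~ chi-square(nu); a chi-square(nu) variable is Gamma(shape=nu/2, scale=2).
    s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return (max(xs) - min(xs)) / s


def critical_q(alpha, n, nu, draws=100_000, seed=1):
    """Monte Carlo estimate of the upper-alpha critical value of q."""
    rng = random.Random(seed)
    sample = sorted(simulate_q(n, nu, rng) for _ in range(draws))
    # Empirical (1 - alpha) quantile of the simulated values.
    index = min(draws - 1, int((1.0 - alpha) * draws))
    return sample[index]


q_crit = critical_q(0.05, n=3, nu=12)
print(round(q_crit, 2))
```

For α = 0.05, n = 3 and ν = 12 the estimate should land close to the commonly tabulated value of about 3.77; more draws reduce the Monte Carlo error at the cost of run time.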

Distribution (normal data) and applications

If X1, ..., Xn are independent, identically distributed normal random variables, the probability distribution of their studentized range is what is usually called the studentized range distribution. The definition of q does not depend on the expected value or the standard deviation of the distribution from which the sample is drawn, so its probability distribution is the same regardless of those parameters. Tables of the distribution's quantiles are widely published.
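The invariance holds because adding a constant a to every observation leaves both the range and a standard-deviation estimate s unchanged, while multiplying every observation by b > 0 multiplies both by b, so their ratio is unaffected. A minimal numeric check, using made-up data and a hypothetical helper `studentized_range`, where s is the sample standard deviation of an independent replicate sample from the same population:

```python
import statistics


def studentized_range(xs, s):
    """Range of xs in units of an independent standard-deviation estimate s."""
    return (max(xs) - min(xs)) / s


# Hypothetical data: a sample x and an independent replicate sample that is
# used only to estimate the standard deviation.
x = [4.1, 5.3, 3.8, 6.0]
replicate = [5.0, 4.4, 6.1, 5.7, 4.9]
q_original = studentized_range(x, statistics.stdev(replicate))

# Apply the same location shift a and scale factor b > 0 to every observation.
a, b = 100.0, 7.5
x_t = [a + b * v for v in x]
replicate_t = [a + b * v for v in replicate]
q_transformed = studentized_range(x_t, statistics.stdev(replicate_t))

print(q_original, q_transformed)  # equal up to floating-point rounding
```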

The studentized range distribution has applications in hypothesis testing and multiple comparison procedures. For example, Tukey's range test and Duncan's new multiple range test (MRT), in which the sample x1, ..., xn is a sample of means and q is the basic test statistic, can be used as post hoc analyses to determine which pairs of group means differ significantly (pairwise comparisons) after the standard analysis of variance has rejected the null hypothesis that all groups come from the same population (i.e. that all means are equal).[5]
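A minimal sketch of the single-step pairwise comparisons of Tukey's range test, assuming a balanced design: every pair of group means is studentized by the standard error √(MSE/m), where MSE is the ANOVA mean squared error (the pooled variance on ν = n(m − 1) degrees of freedom), and compared against a critical value of q supplied by the caller. The function name and example numbers are hypothetical; Duncan's MRT uses different, step-dependent critical values and is not shown.

```python
import itertools
import math


def tukey_pairwise(means, mse, m, q_crit):
    """Flag pairs of group means whose studentized difference exceeds q_crit.

    Assumes equal group sizes m, mse is the ANOVA mean squared error, and
    q_crit is the upper-alpha critical value of q for n = len(means) groups.
    """
    se = math.sqrt(mse / m)  # standard error of a single group mean
    results = []
    for i, j in itertools.combinations(range(len(means)), 2):
        q_ij = abs(means[i] - means[j]) / se
        results.append((i, j, q_ij, q_ij > q_crit))
    return results


# Hypothetical example: 3 groups of size m = 3, so nu = 6.
# 4.34 is approximately the tabulated 5% critical value for n = 3, nu = 6.
for i, j, q_ij, significant in tukey_pairwise([2.0, 3.0, 8.0], mse=1.0, m=3, q_crit=4.34):
    print(f"groups {i} and {j}: q = {q_ij:.2f}, significant = {significant}")
```

In this example groups 0 and 1 (q ≈ 1.73) are not distinguished, while group 2 differs from both of the others.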

When only the equality of two group means is in question (i.e. whether μ1 = μ2), a test based on the studentized range is equivalent to a two-sided Student's t test that uses the same variance estimate. With more means, the studentized range distribution takes into account how many means are under consideration, and the critical value is adjusted accordingly: the more means, the larger the critical value. This makes sense, since the more means there are, the greater the probability that at least some differences between pairs of means will be large purely by chance.
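For two means the relationship to Student's t can be made exact. Under the definition given in the Description section (x1, x2 independent N(0, 1), and νs² ~ χ²ν independent of both), a short derivation gives the following, where t_{α/2,ν} denotes the upper α/2 point of Student's t distribution with ν degrees of freedom:

```latex
% n = 2: x_1, x_2 \sim N(0,1) independent; \nu s^2 \sim \chi^2_\nu independent of both.
\frac{x_1 - x_2}{\sqrt{2}} \sim N(0,1)
\quad\Longrightarrow\quad
t = \frac{x_1 - x_2}{\sqrt{2}\, s} \sim t_\nu ,
\qquad
q_{2,\nu} = \frac{\max\{x_1, x_2\} - \min\{x_1, x_2\}}{s} = \frac{|x_1 - x_2|}{s} = \sqrt{2}\,|t| .

\Pr\left(q_{2,\nu} > \sqrt{2}\, t_{\alpha/2,\nu}\right) = \Pr\left(|t| > t_{\alpha/2,\nu}\right) = \alpha
\quad\Longrightarrow\quad
q_{\alpha;\,2,\nu} = \sqrt{2}\, t_{\alpha/2,\nu} .
```

For example, with ν = 12, t_{0.025,12} ≈ 2.179, so the 5% critical value of q for two means is about 1.414 × 2.179 ≈ 3.08.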

Studentized data

Generally, the term studentized means that the variable's scale has been adjusted by dividing by an estimate of a population standard deviation (see also studentized residual). That this estimate is a sample standard deviation rather than the population standard deviation, and therefore varies from one random sample to the next, is essential to the definition and the distribution of studentized data. The variability of the sample standard deviation adds uncertainty to the computed values, which complicates the problem of finding the probability distribution of any studentized statistic.
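The extra uncertainty introduced by studentizing can be seen in a small simulation. As an illustrative sketch (it uses the simplest studentized statistic, a single sample mean, rather than the range itself; the function name, sample size and seed are arbitrary), the code below draws samples of size m from N(0, 1) and counts how often the mean exceeds 1.96 standard errors when the scale is the known σ versus the sample standard deviation s.

```python
import math
import random


def tail_rates(m=5, draws=50_000, cutoff=1.96, seed=2):
    """Fraction of |mean / scale| > cutoff for samples of size m from N(0, 1).

    The scale is either the true standard error 1 / sqrt(m) (sigma known) or
    the studentized s / sqrt(m), where s is the sample standard deviation.
    """
    rng = random.Random(seed)
    known_hits = 0
    studentized_hits = 0
    for _ in range(draws):
        sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
        mean = sum(sample) / m
        s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (m - 1))
        if abs(mean) * math.sqrt(m) > cutoff:  # scale by sigma / sqrt(m), sigma = 1
            known_hits += 1
        if abs(mean) * math.sqrt(m) / s > cutoff:  # scale by s / sqrt(m)
            studentized_hits += 1
    return known_hits / draws, studentized_hits / draws


known_rate, studentized_rate = tail_rates()
print(f"known sigma: {known_rate:.3f}, studentized: {studentized_rate:.3f}")
```

With m = 5 the known-σ rate should be close to 0.05, while the studentized rate should be near 0.12, the two-sided tail probability of Student's t with 4 degrees of freedom beyond 1.96.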



Notes

  1. Student (1927). "Errors of routine analysis". Biometrika. 19 (1/2): 151–164. doi:10.2307/2332181. JSTOR 2332181.
  2. Newman D. (1939). "The Distribution of Range in Samples from a Normal Population Expressed in Terms of an Independent Estimate of Standard Deviation". Biometrika. 31 (1–2): 20–30. doi:10.1093/biomet/31.1-2.20.
  3. Keuls M. (1952). "The Use of the 'Studentized Range' in Connection with an Analysis of Variance". Euphytica. 1 (2): 112–122. doi:10.1007/bf01908269.
  4. John A. Rafter (2002). "Multiple Comparison Methods for Means". SIAM Review. 44 (2): 259–278. Bibcode:2002SIAMR..44..259R. CiteSeerX 10.1.1.132.2976. doi:10.1137/s0036144501357233.
  5. Pearson & Hartley (1970, Section 14.2)

References

  • Pearson, E.S.; Hartley, H.O. (1970) Biometrika Tables for Statisticians, Volume 1, 3rd Edition, Cambridge University Press. ISBN 0-521-05920-8

Further reading

  • John Neter, Michael H. Kutner, Christopher J. Nachtsheim, William Wasserman (1996) Applied Linear Statistical Models, fourth edition, McGraw-Hill, page 726.
  • John A. Rice (1995) Mathematical Statistics and Data Analysis, second edition, Duxbury Press, pages 451–452.
  • Douglas C. Montgomery (2013) Design and Analysis of Experiments, eighth edition, Wiley, page 98.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.