Welch's t-test

In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test which is used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's t-test[1] that is more reliable when the two samples have unequal variances and/or unequal sample sizes.[2][3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Welch's t-test has been less popular than Student's t-test[2] and may be less familiar to readers, so a more informative name is "Welch's unequal variances t-test", or "unequal variances t-test" for brevity.[3]

Assumptions

Student's t-test assumes that the sample means being compared for the two populations are normally distributed and that the populations have equal variances. Welch's t-test is designed for unequal population variances, but the assumption of normality is maintained.[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.

Calculations

Welch's t-test defines the statistic t by the following formula:

t\quad =\quad {\;{\overline {X}}_{1}-{\overline {X}}_{2}\; \over {\sqrt {\;{s_{1}^{2} \over N_{1}}\;+\;{s_{2}^{2} \over N_{2}}\quad }}}\,

where ${\overline {X}}_{j}$, $s_{j}$ and $N_{j}$ are the $j$-th sample mean, sample standard deviation and sample size, respectively, for $j\in \{1,2\}$. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
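The statistic can be computed directly from the raw samples. The following is a minimal Python sketch. It assumes the samples are plain sequences of numbers, and the function name `welch_t` is illustrative rather than taken from any library. The standard library's `statistics.variance` divides by $N_{j}-1$, which gives the $s_{j}^{2}$ the formula requires.

```python
import math
import statistics


def welch_t(x1, x2):
    """Welch's t statistic for two independent samples (illustrative helper)."""
    n1, n2 = len(x1), len(x2)
    # Sample means and unbiased (N - 1) sample variances s_j^2
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Unpooled standard error: sqrt(s1^2/N1 + s2^2/N2)
    se = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / se


print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```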

The degrees of freedom $\nu$ associated with this variance estimate are approximated using the Welch–Satterthwaite equation:

\nu \quad \approx \quad {{\left(\;{s_{1}^{2} \over N_{1}}\;+\;{s_{2}^{2} \over N_{2}}\;\right)^{2}} \over {\quad {s_{1}^{4} \over N_{1}^{2}\nu _{1}}\;+\;{s_{2}^{4} \over N_{2}^{2}\nu _{2}}\quad }}

Here $\nu _{1}=N_{1}-1$ is the number of degrees of freedom associated with the first variance estimate, and $\nu _{2}=N_{2}-1$ is the number associated with the second.
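The Welch–Satterthwaite equation can be written as a small Python function of the summary statistics. This is a sketch with a hypothetical function name. It uses $s_{j}^{4}/(N_{j}^{2}\nu_{j}) = (s_{j}^{2}/N_{j})^{2}/\nu_{j}$ to avoid repeating terms.

```python
def welch_satterthwaite_df(v1, n1, v2, n2):
    """Approximate degrees of freedom for Welch's t-test (illustrative helper).

    v1, v2 are the sample variances s_j^2; n1, n2 are the sample sizes N_j.
    """
    a = v1 / n1  # s_1^2 / N_1
    b = v2 / n2  # s_2^2 / N_2
    # (a + b)^2 / (a^2 / nu_1 + b^2 / nu_2), with nu_j = N_j - 1
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Summary statistics of Example 2 in the Examples section
print(welch_satterthwaite_df(9.0, 10, 0.9, 20))
```

The result always lies between $\min(\nu_{1},\nu_{2})$ and $\nu_{1}+\nu_{2}=N_{1}+N_{2}-2$. So Welch's test never uses more degrees of freedom than Student's pooled test. When the two sample variances and sample sizes are equal, it reaches that upper bound.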

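Under the null hypothesis, the statistic t approximately follows a t-distribution with $\nu$ degrees of freedom. This is because the Welch–Satterthwaite equation approximates the distribution of the variance estimate in the denominator by a scaled chi-square distribution. The approximation works better when both $N_{1}$ and $N_{2}$ are larger than 5.[4][5]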
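A short moment-matching argument shows where the Welch–Satterthwaite equation comes from. This is a sketch that assumes normal populations with (unknown) variances $\sigma_{j}^{2}$, so that $\nu_{j}s_{j}^{2}/\sigma_{j}^{2}$ follows a chi-square distribution with $\nu_{j}$ degrees of freedom. The variance estimate $V$ in the denominator of t is approximated by a scaled chi-square variable with the same mean and variance:

```latex
\begin{aligned}
V &= \frac{s_{1}^{2}}{N_{1}} + \frac{s_{2}^{2}}{N_{2}}, \qquad
\mathbb{E}[V] = \frac{\sigma_{1}^{2}}{N_{1}} + \frac{\sigma_{2}^{2}}{N_{2}}, \qquad
\operatorname{Var}(s_{j}^{2}) = \frac{2\sigma_{j}^{4}}{\nu_{j}} \\
\operatorname{Var}(V) &= \frac{2\sigma_{1}^{4}}{N_{1}^{2}\nu_{1}} + \frac{2\sigma_{2}^{4}}{N_{2}^{2}\nu_{2}} \\
V &\approx \frac{\mathbb{E}[V]}{\nu}\,\chi^{2}_{\nu}
\quad\Longrightarrow\quad
\operatorname{Var}(V) \approx \frac{2\,\mathbb{E}[V]^{2}}{\nu} \\
\nu &\approx \frac{\left(\frac{\sigma_{1}^{2}}{N_{1}} + \frac{\sigma_{2}^{2}}{N_{2}}\right)^{2}}
{\frac{\sigma_{1}^{4}}{N_{1}^{2}\nu_{1}} + \frac{\sigma_{2}^{4}}{N_{2}^{2}\nu_{2}}}
\end{aligned}
```

Replacing each unknown $\sigma_{j}^{2}$ by its estimate $s_{j}^{2}$ gives the Welch–Satterthwaite equation stated above.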

Statistical test

Once t and $\nu$ have been computed, these statistics can be used with the t-distribution to test one of two possible null hypotheses:

that the two population means are equal, in which a two-tailed test is applied; or
that one of the population means is greater than or equal to the other, in which a one-tailed test is applied.

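The approximate degrees of freedom are sometimes rounded down to the nearest integer, for example when printed t-tables are used. Statistical software commonly uses the non-integer value directly, as in the table in the Examples section.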
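Turning t and $\nu$ into p-values requires the cumulative distribution function of the t-distribution for a possibly non-integer $\nu$. Statistical libraries provide this directly. The sketch below instead integrates the density numerically with Simpson's rule, so that it runs with only the Python standard library. The function names and the `steps` default are illustrative assumptions. The lower tail corresponds to the alternative $\mu_{1}<\mu_{2}$, the upper tail to $\mu_{1}>\mu_{2}$, and the two-tailed value to $\mu_{1}\neq\mu_{2}$.

```python
import math


def t_pdf(x, nu):
    """Density of the t-distribution with nu (possibly non-integer) degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T <= x), integrating the density on [0, |x|] with Simpson's rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3  # P(0 <= T <= |x|), using symmetry about 0
    return 0.5 + area if x >= 0 else 0.5 - area


def welch_p_values(t, nu):
    """Return (two-tailed, lower-tail, upper-tail) p-values for statistic t."""
    lower = t_cdf(t, nu)  # alternative: mu1 < mu2
    upper = 1.0 - lower   # alternative: mu1 > mu2
    return 2.0 * min(lower, upper), lower, upper


print(welch_p_values(-2.46, 25.0))
```

For the Welch statistic of Example 1 in the Examples section ($t=-2.46$, $\nu=25.0$), this gives a two-tailed p-value close to the tabulated 0.021.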

Advantages and limitations

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, its power comes close to that of Student's t-test even when the population variances are equal and the sample sizes are balanced.[2] Welch's t-test can be generalized to more than two samples,[6] and the resulting test is more robust than one-way analysis of variance (ANOVA).
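The k-sample generalization can be sketched as follows. This uses the form in which Welch's (1951) heteroscedastic F statistic[6] is usually stated in textbooks; the sketch is written from that standard statement, not transcribed from the paper. The function name `welch_anova` is hypothetical. It weights each group mean by its precision $w_{j}=N_{j}/s_{j}^{2}$. For $k=2$ it reduces to the square of Welch's t, and its second degrees-of-freedom parameter reduces to the Welch–Satterthwaite $\nu$.

```python
import statistics


def welch_anova(groups):
    """Welch's heteroscedastic one-way F test, as commonly stated.

    Returns (F, df1, df2). Every group needs at least two observations
    and a non-zero sample variance.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    # Precision weights w_j = N_j / s_j^2
    w = [nj / statistics.variance(g) for nj, g in zip(n, groups)]
    total_w = sum(w)
    weighted_mean = sum(wj * mj for wj, mj in zip(w, means)) / total_w
    # Weighted between-group variability
    between = sum(wj * (mj - weighted_mean) ** 2 for wj, mj in zip(w, means)) / (k - 1)
    # Correction term reflecting how unequal the weights are
    lam = sum((1 - wj / total_w) ** 2 / (nj - 1) for wj, nj in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


print(welch_anova([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 5, 4, 6, 7]]))
```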

Pre-testing for equal variances and then choosing between Student's t-test and Welch's t-test is not recommended.[7] Rather, Welch's t-test can be applied directly, without substantial disadvantages relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions when sample sizes are large,[8] but its reliability decreases for skewed distributions with smaller samples.[9]

Examples

The following three examples compare Welch's t-test and Student's t-test. The samples were drawn from normal distributions using the R programming language.

For all three examples, the population means were $\mu _{1}=20$ and $\mu _{2}=22$.

The first example is for equal variances ($\sigma _{1}^{2}=\sigma _{2}^{2}=4$) and equal sample sizes ($N_{1}=N_{2}=15$). Let $A_{1}$ and $A_{2}$ denote two random samples:

A_{1}=\{27.5,21.0,19.0,23.6,17.0,17.9,16.9,20.1,21.9,22.6,23.1,19.6,19.0,21.7,21.4\}

A_{2}=\{27.1,22.0,20.8,23.4,23.4,23.5,25.8,22.0,24.8,20.2,21.9,22.1,22.9,20.5,24.4\}

The second example is for unequal variances ($\sigma _{1}^{2}=16$, $\sigma _{2}^{2}=1$) and unequal sample sizes ($N_{1}=10$, $N_{2}=20$). The smaller sample has the larger variance:

{\begin{aligned}A_{1}&=\{17.2,20.9,22.6,18.1,21.7,21.4,23.5,24.2,14.7,21.8\}\\A_{2}&=\{21.5,22.8,21.0,23.0,21.6,23.6,22.5,20.7,23.4,21.8,20.7,21.7,21.5,22.5,23.6,21.5,22.5,23.5,21.5,21.8\}\end{aligned}}

The third example is for unequal variances ($\sigma _{1}^{2}=1$, $\sigma _{2}^{2}=16$) and unequal sample sizes ($N_{1}=10$, $N_{2}=20$). The larger sample has the larger variance:

{\begin{aligned}A_{1}&=\{19.8,20.4,19.6,17.8,18.5,18.9,18.3,18.9,19.5,22.0\}\\A_{2}&=\{28.2,26.6,20.1,23.3,25.2,22.1,17.7,27.6,20.6,13.7,23.2,17.5,20.6,18.0,23.9,21.6,24.3,20.4,24.0,13.2\}\end{aligned}}

Reference p-values were obtained by simulating the distributions of the t statistics for the null hypothesis of equal population means ( $\mu _{1}-\mu _{2}=0$ ). Results are summarised in the table below, with two-tailed p-values:

	Sample A1			Sample A2			Student's t-test				Welch's t-test
Example	$N_{1}$	${\overline {X}}_{1}$	$s_{1}^{2}$	$N_{2}$	${\overline {X}}_{2}$	$s_{2}^{2}$	$t$	$\nu$	$P$	$P_{\mathrm {sim} }$	$t$	$\nu$	$P$	$P_{\mathrm {sim} }$
1	15	20.8	7.9	15	23.0	3.8	−2.46	28	0.021	0.021	−2.46	25.0	0.021	0.017
2	10	20.6	9.0	20	22.1	0.9	−2.10	28	0.045	0.150	−1.57	9.9	0.149	0.144
3	10	19.4	1.4	20	21.6	17.1	−1.64	28	0.110	0.036	−2.22	24.5	0.036	0.042
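The t and $\nu$ columns of the table can be recomputed from the printed samples. This is a minimal sketch using only the Python standard library; `EXAMPLES` and `summarize` are hypothetical names. The printed sample values are rounded to one decimal, so the last digit may differ from the table. With these values, Student's t for Example 3 comes out at about −1.65 rather than −1.64, while the Welch columns agree to the precision shown.

```python
import math
import statistics

# Samples A1 and A2 of Examples 1-3, copied from the text above
EXAMPLES = {
    1: ([27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4],
        [27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4]),
    2: ([17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8],
        [21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8,
         20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8]),
    3: ([19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0],
        [28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
         23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 24.0, 13.2]),
}


def summarize(x1, x2):
    """Student's and Welch's t statistics and degrees of freedom for two samples."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Student's t: pooled variance, N1 + N2 - 2 degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_student = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Welch's t: unpooled standard error, Welch-Satterthwaite degrees of freedom
    a, b = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(a + b)
    nu_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"t_student": t_student, "nu_student": n1 + n2 - 2,
            "t_welch": t_welch, "nu_welch": nu_welch}


for example, (x1, x2) in EXAMPLES.items():
    r = summarize(x1, x2)
    print(example, f"Student t={r['t_student']:.2f} nu={r['nu_student']}",
          f"Welch t={r['t_welch']:.2f} nu={r['nu_welch']:.1f}")
```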

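In Example 1, where the population variances and the sample sizes are equal, Welch's t-test and Student's t-test gave the same t statistic and, to the precision shown, the same p-value. With equal sample sizes the two t statistics always coincide, because the pooled and unpooled standard errors are then identical; only the degrees of freedom differ (28 versus 25.0). However, even when data are sampled from populations with identical variances, the sample variances will differ, as will the results of the two t-tests. So with actual data, the two tests will almost always give somewhat different results.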

For unequal variances, Student's t-test gave a p-value well below the simulated reference when the smaller sample had the larger variance (Example 2: 0.045 versus 0.150). It gave a p-value well above the reference when the larger sample had the larger variance (Example 3: 0.110 versus 0.036). In both cases, Welch's t-test gave p-values close to the simulated p-values.
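The simulated reference p-values come from simulating the null distributions of the two t statistics. A minimal sketch of that idea for Example 2 follows. It assumes normal populations with the Example 2 variances and sample sizes, a common mean of 0 (both statistics are unaffected by a shared shift), and the observed t values from the table. The helper names, seed and number of replications are arbitrary choices. The table's own values were produced separately in R, so results should agree only up to Monte Carlo error of a few thousandths.

```python
import math
import random


def _mean_var(x):
    """Sample mean and unbiased (N - 1) sample variance."""
    n = len(x)
    m = sum(x) / n
    return m, sum((xi - m) ** 2 for xi in x) / (n - 1)


def welch_t(x1, x2):
    (m1, v1), (m2, v2) = _mean_var(x1), _mean_var(x2)
    return (m1 - m2) / math.sqrt(v1 / len(x1) + v2 / len(x2))


def student_t(x1, x2):
    (m1, v1), (m2, v2) = _mean_var(x1), _mean_var(x2)
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def simulated_p(t_obs, stat, n1, var1, n2, var2, reps=10_000, seed=1):
    """Two-tailed Monte Carlo p-value under H0: mu1 = mu2 with normal populations."""
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    hits = 0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, sd1) for _ in range(n1)]
        x2 = [rng.gauss(0.0, sd2) for _ in range(n2)]
        # Count simulated statistics at least as extreme as the observed one
        if abs(stat(x1, x2)) >= abs(t_obs):
            hits += 1
    return hits / reps


# Example 2: sigma1^2 = 16, N1 = 10; sigma2^2 = 1, N2 = 20
p_student = simulated_p(-2.10, student_t, 10, 16.0, 20, 1.0)
p_welch = simulated_p(-1.57, welch_t, 10, 16.0, 20, 1.0)
print(f"Student P_sim ~ {p_student:.3f}, Welch P_sim ~ {p_welch:.3f}")
```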

Software implementations

Language/Program	Function	Notes
LibreOffice	`TTEST(Data1; Data2; Mode; Type)`	Type = 3 selects the unequal variances test; see [10]
MATLAB	`ttest2(data1, data2, 'Vartype', 'unequal')`	See [11]
Microsoft Excel pre 2010	`TTEST(array1, array2, tails, type)`	type = 3 selects the unequal variances test; see [12]
Microsoft Excel 2010 and later	`T.TEST(array1, array2, tails, type)`	type = 3 selects the unequal variances test; see [13]
Minitab	Accessed through the menu; see [14]	See [15]
SAS (Software)	default output from `proc ttest` (labeled "Satterthwaite")
Python	`scipy.stats.ttest_ind(a, b, equal_var=False)`	See [16]
R	`t.test(data1, data2, alternative="two.sided", var.equal=FALSE)`	See [17]
Haskell	`Statistics.Test.StudentT.welchTTest SamplesDiffer data1 data2`	See [18]
JMP	`Oneway( Y( YColumn), X( XColumn), Unequal Variances( 1 ) );`	See [19]
Julia	`UnequalVarianceTTest(data1, data2)`	See [20]
Stata	`ttest varname1 == varname2, welch`	See [21]
Google Sheets	`TTEST(range1, range2, tails, type)`	type = 3 selects the unequal variances test; see [22]
GraphPad Prism	A choice in the t test dialog
IBM SPSS Statistics	An option in the menu	See [23]; cf. [24]
GNU Octave	`welch_test(x, y)`	See [25]

References

[1] Welch, B. L. (1947). "The generalization of 'Student's' problem when several different population variances are involved". Biometrika. 34 (1–2): 28–35. doi:10.1093/biomet/34.1-2.28. MR 0019277. PMID 20287819.
[2] Ruxton, G. D. (2006). "The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test". Behavioral Ecology. 17 (4): 688–690. doi:10.1093/beheco/ark016.
[3] Derrick, B; Toher, D; White, P (2016). "Why Welch's test is Type I error robust" (PDF). The Quantitative Methods for Psychology. 12 (1): 30–38. doi:10.20982/tqmp.12.1.p030.
[4] The Satterthwaite Formula for Degrees of Freedom in the Two-Sample t-Test (page 7).
[5] Yates, Moore, and Starnes, The Practice of Statistics, 3rd ed., p. 792. Copyright 2008 by W.H. Freeman and Company, 41 Madison Avenue, New York, NY 10010.
[6] Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach". Biometrika. 38 (3/4): 330–336. doi:10.2307/2332579. JSTOR 2332579.
[7] Zimmerman, D. W. (2004). "A note on preliminary tests of equality of variances". British Journal of Mathematical and Statistical Psychology. 57: 173–181. doi:10.1348/000711004849222.
[8] Fagerland, M. W. (2012). "t-tests, non-parametric tests, and large studies—a paradox of statistical practice?". BMC Medical Research Methodology. 12: 78. doi:10.1186/1471-2288-12-78. PMC 3445820. PMID 22697476.
[9] Fagerland, M. W.; Sandvik, L. (2009). "Performance of five two-sample location tests for skewed distributions with unequal variances". Contemporary Clinical Trials. 30 (5): 490–496. doi:10.1016/j.cct.2009.06.007.
[10] https://help.libreoffice.org/Calc/Statistical_Functions_Part_Five#TTEST
[11] http://uk.mathworks.com/help/stats/ttest2.html
[12] http://office.microsoft.com/en-us/excel-help/ttest-HP005209325.aspx
[13] http://office.microsoft.com/en-us/excel-help/t-test-function-HA102753135.aspx
[14] Example of 2-Sample t - Minitab. Official documentation for Minitab version 18. Accessed 2019-01-22.
[15] Select the analysis options for 2-Sample t - Minitab. Official documentation for Minitab version 18. Accessed 2019-01-22.
[16] http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
[17] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html
[18] http://hackage.haskell.org/package/statistics-0.15.0.0/docs/Statistics-Test-StudentT.html
[19] https://www.jmp.com/support/help/
[20] http://hypothesistestsjl.readthedocs.org/en/latest/index.html
[21] http://www.stata.com/help.cgi?ttest
[22] https://support.google.com/docs/answer/6055837?hl=en
[23] Jeremy Miles <https://stats.stackexchange.com/users/17072/jeremy-miles>, Unequal variances t-test or U Mann-Whitney test?, URL (version: 2014-04-11): https://stats.stackexchange.com/q/93475
[24] Official documentation for SPSS Statistics version 24. Accessed 2019-01-22.
[25] https://octave.sourceforge.io/statistics/function/welch_test.html

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
