Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e. it is a paired difference test). It can be used as an alternative to the paired Student's t-test (also known as "t-test for matched pairs" or "t-test for dependent samples") when the within-pair differences cannot be assumed to be normally distributed.[1] It can also be used to determine whether two dependent samples were selected from populations having the same distribution.

History

The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples (Wilcoxon, 1945).[2] The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics.[3] Siegel used the symbol $T$ for a value related to, but not the same as, $W$. In consequence, the test is sometimes referred to as the Wilcoxon T test, and the test statistic is reported as a value of $T$.

Assumptions

Data are paired and come from the same population.
Each pair is chosen randomly and independently.
The data are measured on at least an interval scale when, as is usual, within-pair differences are calculated to perform the test (though it does suffice that within-pair comparisons are on an ordinal scale).

Test procedure

Let $N$ be the sample size, i.e., the number of pairs. Thus, there are a total of $2N$ data points. For pairs $i=1,...,N$, let $x_{1,i}$ and $x_{2,i}$ denote the measurements.

$H_{0}$: the difference between the pairs follows a symmetric distribution around zero.

$H_{1}$: the difference between the pairs does not follow a symmetric distribution around zero.

For $i=1,...,N$, calculate $|x_{2,i}-x_{1,i}|$ and $\operatorname {sgn}(x_{2,i}-x_{1,i})$, where $\operatorname {sgn}$ is the sign function.
Exclude pairs with $|x_{2,i}-x_{1,i}|=0$. Let $N_{r}$ be the reduced sample size.
Order the remaining $N_{r}$ pairs from smallest absolute difference to largest absolute difference, $|x_{2,i}-x_{1,i}|$.
Rank the pairs, starting with the pair with the smallest non-zero absolute difference as 1. Ties receive a rank equal to the average of the ranks they span. Let $R_{i}$ denote the rank.
Calculate the test statistic $W$:
$W=\sum _{i=1}^{N_{r}}[\operatorname {sgn}(x_{2,i}-x_{1,i})\cdot R_{i}]$, the sum of the signed ranks.
Under the null hypothesis, $W$ follows a specific distribution with no simple expression. This distribution has an expected value of 0 and a variance of ${\frac {N_{r}(N_{r}+1)(2N_{r}+1)}{6}}$.
$W$ can be compared to a critical value from a reference table.[4]
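The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `average_ranks` and `signed_rank_statistic` are hypothetical, and the sample data in the usage lines are made up for the demonstration.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def signed_rank_statistic(x1, x2):
    """Return (W, N_r) for paired measurements x1 and x2."""
    diffs = [b - a for a, b in zip(x1, x2)]
    nonzero = [d for d in diffs if d != 0]            # exclude zero differences
    ranks = average_ranks([abs(d) for d in nonzero])  # rank the absolute differences
    w = sum((1 if d > 0 else -1) * r for d, r in zip(nonzero, ranks))
    return w, len(nonzero)


# Hypothetical paired data (assumed for illustration only).
before = [10, 12, 9, 11]
after = [13, 11, 9, 15]
print(signed_rank_statistic(before, after))  # (4.0, 3)
```

The pair with a zero difference is dropped before ranking, so the reduced sample size here is 3 rather than 4.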

The two-sided test rejects $H_{0}$ if $|W|>W_{critical,N_{r}}$.
As $N_{r}$ increases, the sampling distribution of $W$ converges to a normal distribution. Thus, for $N_{r}\geq 20$, a z-score can be calculated as $z={\frac {W}{\sigma _{W}}}$, where $\sigma _{W}={\sqrt {\frac {N_{r}(N_{r}+1)(2N_{r}+1)}{6}}}$.

To perform a two-sided test, reject $H_{0}$ if $z_{critical}<|z|$.

Alternatively, one-sided tests can be performed with either the exact or the approximate distribution. p-values can also be calculated.
For $N_{r}<20$ the exact distribution needs to be used.
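Both routes to a p-value can be sketched in Python. The function names below are hypothetical. The exact version assumes that, under $H_{0}$, each rank is equally likely to carry a plus or a minus sign, and it simply enumerates all $2^{N_{r}}$ sign assignments, which is only practical for small $N_{r}$.

```python
import math
from itertools import product


def z_score(w, n_r):
    """Normal-approximation z-score for W, using the variance formula above."""
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    return w / sigma_w


def two_sided_p_normal(w, n_r):
    """Two-sided p-value from the standard normal distribution."""
    z = abs(z_score(w, n_r))
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def two_sided_p_exact(w, ranks):
    """Two-sided p-value by enumerating every assignment of signs to the ranks."""
    at_least_as_extreme = 0
    total = 0
    for signs in product((-1, 1), repeat=len(ranks)):
        total += 1
        w_candidate = sum(s * r for s, r in zip(signs, ranks))
        if abs(w_candidate) >= abs(w):
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical inputs (assumed for illustration only).
print(z_score(100, 20))
print(two_sided_p_exact(6, [1, 2, 3]))  # 0.25: only +++ and --- reach |W| = 6
```

The exact value need not agree with a p-value reported by a particular software package, which may use the normal approximation, a continuity correction, or a different treatment of ties.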

Example

$i$	$x_{2,i}$	$x_{1,i}$	$\operatorname {sgn}(x_{2,i}-x_{1,i})$	${\text{abs}}(x_{2,i}-x_{1,i})$
1	125	110	1	15
2	115	122	–1	7
3	130	125	1	5
4	140	120	1	20
5	140	140		0
6	115	124	–1	9
7	140	123	1	17
8	125	137	–1	12
9	140	135	1	5
10	135	145	–1	10

Ordered by absolute difference:

$i$	$x_{2,i}$	$x_{1,i}$	$\operatorname {sgn}(x_{2,i}-x_{1,i})$	${\text{abs}}(x_{2,i}-x_{1,i})$	$R_{i}$	$\operatorname {sgn} \cdot R_{i}$
5	140	140		0
3	130	125	1	5	1.5	1.5
9	140	135	1	5	1.5	1.5
2	115	122	–1	7	3	–3
6	115	124	–1	9	4	–4
10	135	145	–1	10	5	–5
8	125	137	–1	12	6	–6
1	125	110	1	15	7	7
7	140	123	1	17	8	8
4	140	120	1	20	9	9

$\operatorname {sgn}$ is the sign function, ${\text{abs}}$ is the absolute value, and $R_{i}$ is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.

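$W=1.5+1.5-3-4-5-6+7+8+9=9$

$|W|=9<W_{crit(\alpha =0.05,\ 9{\text{, two-sided}})}=35$, where 35 is the critical value on the scale of $W$. It corresponds to the tabulated critical value of 5 for the smaller of the two rank sums,[5] because the total rank sum is 45 and $45-2\cdot 5=35$.

$\therefore {\text{failed to reject }}H_{0}$ that the two medians are the same.

The p-value for this result is 0.6113.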

Historical T statistic

In historical sources a different statistic, denoted by Siegel as the T statistic, was used. The T statistic is the smaller of the two sums of ranks of given sign; in the example, therefore, T would equal 3+4+5+6=18. Low values of T are required for significance. T is easier to calculate by hand than W and the test is equivalent to the two-sided test described above; however, the distribution of the statistic under $H_{0}$ has to be adjusted.

$T=18>T_{crit(\alpha =0.05,\ 9{\text{, two-sided}})}=5$

$\therefore {\text{failed to reject }}H_{0}$ that the two medians are the same.
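The relationship between $T$ and $W$ can be checked numerically. The sketch below uses the signed ranks from the worked example; the helper name `t_statistic` is hypothetical. Because the two same-sign rank sums add up to the total rank sum $S$, the magnitude of $W$ can be recovered from $T$ as $|W|=S-2T$.

```python
def t_statistic(signed_ranks):
    """Return (T, W): the smaller same-sign rank sum and the signed-rank sum."""
    positive_sum = sum(r for r in signed_ranks if r > 0)
    negative_sum = -sum(r for r in signed_ranks if r < 0)
    t = min(positive_sum, negative_sum)
    w = positive_sum - negative_sum
    return t, w


# Signed ranks taken from the worked example above.
signed_ranks = [1.5, 1.5, -3, -4, -5, -6, 7, 8, 9]
t, w = t_statistic(signed_ranks)
total = sum(abs(r) for r in signed_ranks)  # S, the total rank sum
print(t, w, total - 2 * t)                 # T, W, and |W| recovered from T
```

A small $T$ therefore corresponds to a large $|W|$, which is why low values of $T$ indicate significance.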

Note: Critical T values ( $T_{crit}$ ) by values of $N_{r}$ can be found in appendices of statistics textbooks, for example in Table B-3 of Nonparametric Statistics: A Step-by-Step Approach, 2nd Edition by Dale I. Foreman and Gregory W. Corder (https://www.oreilly.com/library/view/nonparametric-statistics-a/9781118840429/bapp02.xhtml).

Limitation

As demonstrated in the example, when the difference within a pair is zero, that pair is discarded. This is of particular concern if the samples are taken from a discrete distribution, where zero differences are common. In these scenarios the modification to the Wilcoxon test by Pratt (1959) provides an alternative which incorporates the zero differences.[6][7] This modification is more robust for data on an ordinal scale.[7]
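A minimal sketch of the idea follows, assuming the usual reading of Pratt's treatment: zero differences take part in the ranking, but contribute nothing to the sum because their sign is zero. The function name `pratt_signed_rank` is hypothetical, and critical values for the modified statistic are not covered here.

```python
def pratt_signed_rank(x1, x2):
    """Signed-rank sum in which zero differences are ranked but not summed."""
    diffs = [b - a for a, b in zip(x1, x2)]
    abs_diffs = [abs(d) for d in diffs]
    # Average ranks over ALL pairs, zero differences included.
    ranks = []
    for a in abs_diffs:
        smaller = sum(1 for other in abs_diffs if other < a)
        equal = sum(1 for other in abs_diffs if other == a)
        ranks.append(smaller + (equal + 1) / 2)
    total = 0.0
    for d, r in zip(diffs, ranks):
        sign = (d > 0) - (d < 0)  # -1, 0, or +1
        total += sign * r         # zero differences add nothing
    return total


# Data from the worked example above.
x2 = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
x1 = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]
print(pratt_signed_rank(x1, x2))
```

Compared with the standard procedure, every non-zero difference is shifted up by one rank in this data set, because the single zero difference occupies rank 1.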

Effect size

To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.

If the test statistic W is reported, the rank correlation r is equal to the test statistic W divided by the total rank sum S, or r = W/S.[8] Using the above example, the test statistic is W = 9. The sample size of 9 has a total rank sum of S = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so r = 0.20.

If the test statistic T is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.[8] To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 - 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
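Both calculations can be written as short Python functions. The names are hypothetical, and the total rank sum is computed as $S=N_{r}(N_{r}+1)/2$, which also holds when tied values receive average ranks.

```python
def rank_biserial_from_w(w, n_r):
    """Rank correlation r = W / S, where S is the total rank sum."""
    total = n_r * (n_r + 1) / 2
    return w / total


def rank_biserial_from_t(t, n_r):
    """Simple difference formula: difference between the two rank-sum proportions."""
    total = n_r * (n_r + 1) / 2
    larger_sum = total - t  # the remaining rank sum
    return larger_sum / total - t / total


# Values from the worked example: W = 9, T = 18, N_r = 9.
print(rank_biserial_from_w(9, 9))
print(rank_biserial_from_t(18, 9))
```

Since T is always the smaller of the two rank sums, the second function returns only the magnitude of r; the sign has to be taken from the direction of the larger rank sum.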

Software implementations

R includes an implementation of the test as wilcox.test(x,y, paired=TRUE), where x and y are vectors of equal length.[9]
ALGLIB includes an implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
SciPy includes an implementation of the Wilcoxon signed-rank test in Python.
Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications.
MATLAB implements this test as the "Wilcoxon signed rank test" in the signrank function. The call [p,h] = signrank(x,y) returns the p-value and a logical value indicating the test decision. The result h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
The Julia HypothesisTests package includes the Wilcoxon signed-rank test as "pvalue(SignedRankTest(x, y))".

References

[1] "Paired t–test - Handbook of Biological Statistics". www.biostathandbook.com. Retrieved 2019-11-18.
[2] Wilcoxon, Frank (Dec 1945). "Individual comparisons by ranking methods" (PDF). Biometrics Bulletin. 1 (6): 80–83. doi:10.2307/3001968. hdl:10338.dmlcz/135688. JSTOR 3001968.
[3] Siegel, Sidney (1956). Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83.
[4] Lowry, Richard. "Concepts & Applications of Inferential Statistics". Retrieved 5 November 2018.
[5] "Wilcoxon Signed-Ranks Table | Real Statistics Using Excel". Retrieved 2020-08-10.
[6] Pratt, J (1959). "Remarks on zeros and ties in the Wilcoxon signed rank procedures". Journal of the American Statistical Association. 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
[7] Derrick, B; White, P (2017). "Comparing Two Samples from an Individual Likert Question". International Journal of Mathematics and Statistics. 18 (3): 1–13.
[8] Kerby, Dave S. (2014). "The simple difference formula: An approach to teaching nonparametric correlation". Comprehensive Psychology. 3: 11.IT.3.1. doi:10.2466/11.IT.3.1.
[9] Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 978-0-387-79053-4.

External links

Wilcoxon Signed-Rank Test in R
Example of using the Wilcoxon signed-rank test
An online version of the test
A table of critical values for the Wilcoxon signed-rank test
Brief guide by experimental psychologist Karl L. Wuensch - Nonparametric effect size estimators (Copyright 2015 by Karl L. Wuensch)
Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, volume 3, article 1. doi:10.2466/11.IT.3.1.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
