D'Agostino's K-squared test

In statistics, D’Agostino’s K² test, named for Ralph D’Agostino, is a goodness-of-fit measure of departure from normality; that is, the test aims to establish whether or not a given sample comes from a normally distributed population. The test is based on transformations of the sample kurtosis and skewness, and has power only against alternatives in which the distribution is skewed and/or kurtic.

Skewness and kurtosis

In the following, { x_i } denotes a sample of n observations, g₁ and g₂ are the sample skewness and (excess) kurtosis, m_j is the j-th sample central moment, and x̄ is the sample mean. In the literature on normality testing, the skewness and kurtosis are frequently denoted √β₁ and β₂ respectively. Such notation can be inconvenient since, for example, √β₁ can be a negative quantity.

The sample skewness and kurtosis are defined as

{\begin{aligned}&g_{1}={\frac {m_{3}}{m_{2}^{3/2}}}={\frac {{\frac {1}{n}}\sum _{i=1}^{n}\left(x_{i}-{\bar {x}}\right)^{3}}{\left({\frac {1}{n}}\sum _{i=1}^{n}\left(x_{i}-{\bar {x}}\right)^{2}\right)^{3/2}}}\ ,\\&g_{2}={\frac {m_{4}}{m_{2}^{2}}}-3={\frac {{\frac {1}{n}}\sum _{i=1}^{n}\left(x_{i}-{\bar {x}}\right)^{4}}{\left({\frac {1}{n}}\sum _{i=1}^{n}\left(x_{i}-{\bar {x}}\right)^{2}\right)^{2}}}-3\ .\end{aligned}}
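As a minimal sketch of these definitions, the following Python function computes g₁ and g₂ directly from the sample central moments. The function name `sample_skewness_kurtosis` is an illustrative choice, not part of any library, and the sample is assumed to be non-constant so that m₂ > 0.

```python
def sample_skewness_kurtosis(xs):
    """Return (g1, g2): sample skewness and excess kurtosis of xs."""
    n = len(xs)
    mean = sum(xs) / n
    # j-th sample central moments m_j, each with divisor n (not n - 1)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2


# Small right-skewed sample: m2 = 10, m3 = 36, m4 = 278.8
g1, g2 = sample_skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 10.0])
print(g1, g2)
```

For this sample the moments can be checked by hand: the mean is 4, so g₁ = 36 / 10^{3/2} and g₂ = 278.8 / 100 − 3.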

These quantities consistently estimate the theoretical skewness and kurtosis of the distribution, respectively. Moreover, if the sample indeed comes from a normal population, then the exact finite sample distributions of the skewness and kurtosis can themselves be analysed in terms of their means μ₁, variances μ₂, skewnesses γ₁, and kurtoses γ₂. This has been done by Pearson (1931), who derived the following expressions:

{\begin{aligned}&\mu _{1}(g_{1})=0,\\&\mu _{2}(g_{1})={\frac {6(n-2)}{(n+1)(n+3)}},\\&\gamma _{1}(g_{1})\equiv {\frac {\mu _{3}(g_{1})}{\mu _{2}(g_{1})^{3/2}}}=0,\\&\gamma _{2}(g_{1})\equiv {\frac {\mu _{4}(g_{1})}{\mu _{2}(g_{1})^{2}}}-3={\frac {36(n-7)(n^{2}+2n-5)}{(n-2)(n+5)(n+7)(n+9)}}.\end{aligned}}

and

{\begin{aligned}&\mu _{1}(g_{2})=-{\frac {6}{n+1}},\\&\mu _{2}(g_{2})={\frac {24n(n-2)(n-3)}{(n+1)^{2}(n+3)(n+5)}},\\&\gamma _{1}(g_{2})\equiv {\frac {\mu _{3}(g_{2})}{\mu _{2}(g_{2})^{3/2}}}={\frac {6(n^{2}-5n+2)}{(n+7)(n+9)}}{\sqrt {\frac {6(n+3)(n+5)}{n(n-2)(n-3)}}},\\&\gamma _{2}(g_{2})\equiv {\frac {\mu _{4}(g_{2})}{\mu _{2}(g_{2})^{2}}}-3={\frac {36(15n^{6}-36n^{5}-628n^{4}+982n^{3}+5777n^{2}-6402n+900)}{n(n-3)(n-2)(n+7)(n+9)(n+11)(n+13)}}.\end{aligned}}

For example, for a sample of size n = 1000 drawn from a normally distributed population, the sample skewness has mean 0 and standard deviation (SD) of about 0.08, and the sample kurtosis has mean of about 0 (−0.006) and SD of about 0.15.
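The n = 1000 figures can be reproduced from Pearson's expressions. The sketch below evaluates the mean, variance, and the one shape parameter of each statistic that the later transformations need; the function names are hypothetical and chosen only for this illustration.

```python
import math


def skewness_null_moments(n):
    """Mean, variance, and excess kurtosis of g1 under normality."""
    mu1 = 0.0
    mu2 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = (36.0 * (n - 7) * (n * n + 2 * n - 5)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    return mu1, mu2, gamma2


def kurtosis_null_moments(n):
    """Mean, variance, and skewness of g2 under normality."""
    mu1 = -6.0 / (n + 1)
    mu2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    gamma1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    return mu1, mu2, gamma1


_, var_g1, _ = skewness_null_moments(1000)
mean_g2, var_g2, _ = kurtosis_null_moments(1000)
sd_g1 = math.sqrt(var_g1)  # standard deviation of the sample skewness
sd_g2 = math.sqrt(var_g2)  # standard deviation of the sample kurtosis
print(sd_g1, mean_g2, sd_g2)
```

The printed standard deviations round to the 0.08 and 0.15 quoted in the text, and the mean of g₂ is −6/1001, which is close to zero.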

Transformed sample skewness and kurtosis

The sample skewness g₁ and kurtosis g₂ are both asymptotically normal. However, their convergence to the limiting distribution is slow, especially for g₂. For example, even with n = 5000 observations, Pearson's expressions give the sample kurtosis g₂ a skewness of approximately 0.2 and an excess kurtosis of approximately 0.1, which is not negligible. To remedy this, it has been suggested to transform g₁ and g₂ so that their distributions are as close to standard normal as possible.

In particular, D’Agostino (1970) suggested the following transformation for sample skewness:

Z_{1}(g_{1})=\delta \operatorname {asinh} \left({\frac {g_{1}}{\alpha {\sqrt {\mu _{2}}}}}\right),

where constants α and δ are computed as

{\begin{aligned}&W^{2}={\sqrt {2\gamma _{2}+4}}-1,\\&\delta =1/{\sqrt {\ln W}},\\&\alpha ^{2}=2/(W^{2}-1),\end{aligned}}

and where μ₂ = μ₂(g₁) is the variance of g₁ and γ₂ = γ₂(g₁) is its kurtosis, both given by the expressions in the previous section.
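The transformation can be sketched in Python as follows. The function name `skewness_z` is an assumption made for illustration. The guard on n is also an addition of this sketch: from the expression for γ₂(g₁), n ≤ 7 gives γ₂ ≤ 0, hence W² ≤ 1 and ln W ≤ 0, so δ is undefined.

```python
import math


def skewness_z(g1, n):
    """Transform sample skewness g1 (sample size n) to an approximate N(0, 1) score."""
    if n < 8:
        # Guard added in this sketch: ln W is not positive for n <= 7.
        raise ValueError("this sketch requires n >= 8")
    mu2 = 6.0 * (n - 2) / ((n + 1) * (n + 3))             # variance of g1
    gamma2 = (36.0 * (n - 7) * (n * n + 2 * n - 5)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))  # excess kurtosis of g1
    w2 = math.sqrt(2.0 * gamma2 + 4.0) - 1.0              # W^2
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))           # ln W = (1/2) ln W^2
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(g1 / (alpha * math.sqrt(mu2)))


z = skewness_z(0.5, 50)
print(z)
```

For large n the ratio δ/α approaches 1 and the asinh argument becomes small, so Z₁ is close to the simple standardized value g₁/√μ₂.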

Similarly, Anscombe & Glynn (1983) suggested a transformation for g₂, which works reasonably well for sample sizes of 20 or greater:

Z_{2}(g_{2})={\sqrt {\frac {9A}{2}}}\left\{1-{\frac {2}{9A}}-\left({\frac {1-2/A}{1+{\frac {g_{2}-\mu _{1}}{\sqrt {\mu _{2}}}}{\sqrt {2/(A-4)}}}}\right)^{\!1/3}\right\},

where

A=6+{\frac {8}{\gamma _{1}}}\left({\frac {2}{\gamma _{1}}}+{\sqrt {1+4/\gamma _{1}^{2}}}\right),

and μ₁ = μ₁(g₂), μ₂ = μ₂(g₂), γ₁ = γ₁(g₂) are the quantities computed by Pearson.
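A corresponding sketch for the kurtosis transformation is given below. The name `kurtosis_z` is hypothetical. One detail is an assumption of this sketch rather than something stated above: when the quantity under the cube root is negative (possible for strongly negative g₂), the real cube root is taken so that the result stays real.

```python
import math


def kurtosis_z(g2, n):
    """Transform sample excess kurtosis g2 (sample size n) to an approximate N(0, 1) score."""
    mu1 = -6.0 / (n + 1)                                   # mean of g2
    mu2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    gamma1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / gamma1 * (2.0 / gamma1 + math.sqrt(1.0 + 4.0 / gamma1 ** 2))
    x = (g2 - mu1) / math.sqrt(mu2)                        # standardized g2
    ratio = (1.0 - 2.0 / a) / (1.0 + x * math.sqrt(2.0 / (a - 4.0)))
    # Assumption of this sketch: use the real cube root for a negative ratio.
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    return math.sqrt(4.5 * a) * (1.0 - 2.0 / (9.0 * a) - cube_root)


z_low = kurtosis_z(-0.5, 50)
z_mid = kurtosis_z(0.0, 50)
z_high = kurtosis_z(1.0, 50)
print(z_low, z_mid, z_high)
```

Because the null distribution of g₂ is right-skewed, a value of g₂ equal to its mean μ₁ maps to a slightly positive Z₂ rather than to exactly zero.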

Omnibus K² statistic

Statistics Z₁ and Z₂ can be combined to produce an omnibus test, able to detect deviations from normality due to either skewness or kurtosis (D’Agostino, Belanger & D’Agostino 1990):

K^{2}=Z_{1}(g_{1})^{2}+Z_{2}(g_{2})^{2}\,

If the null hypothesis of normality is true, then K² is approximately χ²-distributed with 2 degrees of freedom.
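Putting the pieces together, the whole test can be sketched in a single self-contained function. The name `k_squared_test`, the n ≥ 8 guard, and the real cube root for a negative argument are assumptions of this sketch. The p-value uses the fact that the χ² distribution with 2 degrees of freedom has survival function exp(−x/2). The two demonstration samples are deterministic quantile grids, not real data.

```python
import math
from statistics import NormalDist


def k_squared_test(xs):
    """Return (K2, p_value) for the sample xs, using the formulas in this article."""
    n = len(xs)
    if n < 8:
        raise ValueError("this sketch requires n >= 8")
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** j for x in xs) / n for j in (2, 3, 4))
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0

    # Z1: transformed skewness
    var_g1 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    kurt_g1 = (36.0 * (n - 7) * (n * n + 2 * n - 5)
               / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = math.sqrt(2.0 * kurt_g1 + 4.0) - 1.0
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z1 = delta * math.asinh(g1 / (alpha * math.sqrt(var_g1)))

    # Z2: transformed kurtosis
    mean_g2 = -6.0 / (n + 1)
    var_g2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    skew_g2 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
               * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / skew_g2 * (2.0 / skew_g2 + math.sqrt(1.0 + 4.0 / skew_g2 ** 2))
    x = (g2 - mean_g2) / math.sqrt(var_g2)
    ratio = (1.0 - 2.0 / a) / (1.0 + x * math.sqrt(2.0 / (a - 4.0)))
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    z2 = math.sqrt(4.5 * a) * (1.0 - 2.0 / (9.0 * a) - cube_root)

    k2 = z1 ** 2 + z2 ** 2
    return k2, math.exp(-k2 / 2.0)  # chi-squared(2) survival function


# Deterministic demonstration samples built from quantile grids
probs = [(i + 0.5) / 100 for i in range(100)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]  # normal quantiles
skewed = [-math.log(1.0 - p) for p in probs]            # exponential quantiles
k2_normal, p_normal = k_squared_test(normal_like)
k2_skewed, p_skewed = k_squared_test(skewed)
print(k2_normal, p_normal)
print(k2_skewed, p_skewed)
```

The normal-quantile sample should give a small K² and a large p-value, while the exponential-quantile sample is strongly right-skewed and should be rejected decisively.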

Note that the statistics g₁ and g₂ are uncorrelated but not independent. Their transforms Z₁ and Z₂ are therefore also dependent (Shenton & Bowman 1977), which makes the validity of the χ² approximation questionable. Simulations show that under the null hypothesis the K² test statistic has the following characteristics:

	expected value	standard deviation	95% quantile
n = 20	1.971	2.339	6.373
n = 50	2.017	2.308	6.339
n = 100	2.026	2.267	6.271
n = 250	2.012	2.174	6.129
n = 500	2.009	2.113	6.063
n = 1000	2.000	2.062	6.038
χ²(2) distribution	2.000	2.000	5.991
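The reference row for the χ²(2) distribution can be checked in closed form. With 2 degrees of freedom the cumulative distribution function is 1 − exp(−x/2), so the quantile function is −2 ln(1 − p). The short sketch below (the helper name is an illustrative choice) reproduces that row.

```python
import math


def chi2_2df_quantile(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom."""
    # Inverting the CDF 1 - exp(-x / 2) gives x = -2 ln(1 - p).
    return -2.0 * math.log(1.0 - p)


df = 2
mean = df                  # mean equals the degrees of freedom
sd = math.sqrt(2 * df)     # variance equals twice the degrees of freedom
q95 = chi2_2df_quantile(0.95)
print(mean, sd, round(q95, 3))  # prints: 2 2.0 5.991
```

Comparing these values with the simulated rows shows that the simulated 95% quantiles all lie above 5.991 and move towards it as n grows.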

References

Anscombe, F.J.; Glynn, William J. (1983). "Distribution of the kurtosis statistic b₂ for normal samples". Biometrika. 70 (1): 227–234. doi:10.1093/biomet/70.1.227. JSTOR 2335960.
D’Agostino, Ralph B. (1970). "Transformation to normality of the null distribution of g₁". Biometrika. 57 (3): 679–681. doi:10.1093/biomet/57.3.679. JSTOR 2334794.
D’Agostino, Ralph B.; Belanger, Albert; D’Agostino, Ralph B., Jr. (1990). "A suggestion for using powerful and informative tests of normality". The American Statistician. 44 (4): 316–321. doi:10.2307/2684359. JSTOR 2684359.
Pearson, Egon S. (1931). "Note on tests for normality". Biometrika. 22 (3/4): 423–424. doi:10.1093/biomet/22.3-4.423. JSTOR 2332104.
Shenton, L.R.; Bowman, K.O. (1977). "A bivariate model for the distribution of √b₁ and b₂". Journal of the American Statistical Association. 72 (357): 206–211. doi:10.1080/01621459.1977.10479940. JSTOR 2286939.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
