Separation (statistics)

In statistics, separation is a phenomenon associated with models for dichotomous or categorical outcomes, including logistic and probit regression. Separation occurs if a predictor (or a linear combination of some subset of the predictors) is associated with only one outcome value whenever that predictor or combination is greater than some constant.

For example, suppose the predictor X is continuous and the outcome y = 1 for all observed x > 2. If the outcome values are perfectly determined by the predictor (e.g., y = 0 when x ≤ 2), then "complete separation" is said to occur. If instead there is some overlap (e.g., y = 0 when x < 2, but y takes both values 0 and 1 when x = 2), then "quasi-complete separation" occurs. A 2 × 2 table with an empty cell is an example of quasi-complete separation.
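The distinction can be illustrated with a minimal Python sketch for the simplest case of a single predictor and a 0/1 outcome. The function name classify_separation and the example data are hypothetical, chosen only for illustration; the sketch does not handle linear combinations of several predictors, which require a more general check.

```python
def classify_separation(xs, ys):
    """Classify separation for one predictor and a binary (0/1) outcome.

    Illustrative helper only: it looks for a single cut point on x and
    ignores linear combinations of several predictors.
    """
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        raise ValueError("both outcome values must be observed")
    if min(xs) == max(xs):
        return "none"  # a constant predictor cannot separate anything

    # Either outcome may sit on the high side of the cut point.
    pairs = ((x0, x1), (x1, x0))
    # Complete: a gap lies strictly between the two outcome groups.
    if any(max(low) < min(high) for low, high in pairs):
        return "complete"
    # Quasi-complete: the groups touch only at one boundary value of x.
    if any(max(low) == min(high) for low, high in pairs):
        return "quasi-complete"
    return "none"


# y = 0 for x <= 2 and y = 1 for x > 2
print(classify_separation([1, 2, 3, 4], [0, 0, 1, 1]))
# both outcomes are observed at x = 2
print(classify_separation([1, 2, 2, 3], [0, 0, 1, 1]))
# 2 x 2 table in which the cell (x = 0, y = 1) is empty
print(classify_separation([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))
```

With a binary predictor, the last call corresponds to the empty-cell 2 × 2 table: both outcome values occur only at x = 1, which is the boundary value.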

This form of the data is important because it causes problems in estimating the regression coefficients. Loosely, a parameter in the model "wants" to be infinite if complete separation is observed.[1] Under quasi-complete separation the likelihood likewise has no finite maximizer: it keeps increasing as at least one parameter grows without bound, so the maximum likelihood estimate does not exist.[2] Computer programs will often output an arbitrarily large parameter estimate with a very large standard error.[3] Methods to fit these models include exact logistic regression and Firth logistic regression, a bias-reduction method based on a penalized likelihood.[4]
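The behaviour of the estimate can be sketched numerically. The following Python example is a simplified illustration, not the procedure of any particular software package: it assumes a one-parameter model without intercept, logit P(y = 1) = b·x, fitted by plain gradient ascent to made-up, completely separated data. The function name fit_slope, the step size and the data are hypothetical. With firth=True the sketch adds Firth's penalty, half the logarithm of the Fisher information, to the log-likelihood.

```python
import math


def fit_slope(xs, ys, steps, firth=False, lr=0.1):
    """Gradient ascent for the one-parameter model logit P(y=1) = b * x.

    Illustrative sketch: no intercept, fixed step size, no convergence test.
    """
    b = 0.0
    for _ in range(steps):
        score = 0.0   # derivative of the log-likelihood with respect to b
        info = 0.0    # Fisher information: sum of x^2 * p * (1 - p)
        dinfo = 0.0   # derivative of the Fisher information with respect to b
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-b * x))
            w = p * (1.0 - p)
            score += (y - p) * x
            info += x * x * w
            dinfo += x ** 3 * w * (1.0 - 2.0 * p)
        if firth:
            # Firth's penalty adds 0.5 * log(info) to the log-likelihood,
            # so its derivative 0.5 * dinfo / info is added to the score.
            score += 0.5 * dinfo / info
        b += lr * score
    return b


xs = [-2.0, -1.0, 1.0, 2.0]  # made-up data, completely separated at x = 0
ys = [0, 0, 1, 1]
for steps in (100, 1000, 10000):
    print(steps, fit_slope(xs, ys, steps), fit_slope(xs, ys, steps, firth=True))
```

Under these assumptions the unpenalized slope keeps growing as the number of iterations increases, whereas the penalized slope settles at a finite value. The penalty term tends to minus infinity as the fitted probabilities approach 0 or 1, which is what keeps the penalized estimate finite.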

References

  1. Zeng, Guoping; Zeng, Emily (2019). "On the Relationship between Multicollinearity and Separation in Logistic Regression". Communications in Statistics. Simulation and Computation. doi:10.1080/03610918.2019.1589511.
  2. Albert, A.; Anderson, J. A. (1984). "On the Existence of Maximum Likelihood Estimates in Logistic Regression Models". Biometrika. 71 (1): 1–10. doi:10.1093/biomet/71.1.1.
  3. McCullough, B. D.; Vinod, H. D. (2003). "Verifying the Solution from a Nonlinear Solver: A Case Study". American Economic Review. 93 (3): 873–892. JSTOR 3132121.
  4. Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg (2018). "Separation in Logistic Regression: Causes, Consequences, and Control". American Journal of Epidemiology. 187 (4): 864–870. doi:10.1093/aje/kwx299.

Further reading

  • Davidson, Russell; MacKinnon, James G. (2004). Econometric Theory and Methods. New York: Oxford University Press. pp. 458–459. ISBN 978-0-19-512372-2.