Simpson's paradox
Simpson's paradox, or the Yule-Simpson effect, is a phenomenon noted in statistics, which states:
“The ratio of success of several sub-populations can be reversed in the ratio of success of the population as a whole.”
This effect arises in data from Bernoulli trials (outcomes that are either a success or a failure), and is seen especially in social science, medicine and baseball batting averages.
It is not this statement by Homer Simpson:
“To alcohol! The cause of, and solution to, all of life's problems.”
Example
The paradox can be demonstrated with the following example:
University of California, Berkeley Admissions and Gender Bias
According to a 1975 study by Bickel, Hammel and O'Connell[1], in the fall of 1973, 12,763 people applied to graduate school at the University of California, Berkeley (UC Berkeley). Of the applicants, 8,442 were male and 4,321 were female. 44% of the males were admitted, but only 35% of the females.
Considering only these statistics, one could conclude that admissions were substantially biased against female applicants. However, when the data is broken down by department, a different conclusion becomes visible. The following tables represent application and admission data for the 6 most popular departments at UC Berkeley:[note 1][note 2]
Total number of applicants per department, by sex
Sex | A | B | C | D | E | F |
---|---|---|---|---|---|---|
Male | 825 | 560 | 325 | 417 | 191 | 373 |
Female | 108 | 25 | 593 | 375 | 393 | 341 |
Total number of successful applicants per department, by sex (admission rate among applicants of that sex in that department in brackets)
Sex | A | B | C | D | E | F |
---|---|---|---|---|---|---|
Male | 512 (62%) | 353 (63%) | 120 (37%) | 138 (33%) | 53 (28%) | 22 (6%) |
Female | 89 (82%) | 17 (68%) | 202 (34%) | 131 (35%) | 94 (24%) | 24 (7%) |
The data above shows that, within the most popular departments, the difference in admission rates between the sexes is much smaller than the aggregate figures suggest. In four of the six departments (A, B, D and F), the admission rate was actually higher for female applicants.
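The comparison can be checked with a short script. This is a minimal sketch in Python that uses only the figures from the two tables above; the names `applied`, `admitted`, `department_rates` and `pooled_rate` are chosen here for illustration and do not come from the study.

```python
# Applicants and admits for departments A-F, copied from the tables above.
DEPARTMENTS = ["A", "B", "C", "D", "E", "F"]
applied = {
    "Male": [825, 560, 325, 417, 191, 373],
    "Female": [108, 25, 593, 375, 393, 341],
}
admitted = {
    "Male": [512, 353, 120, 138, 53, 22],
    "Female": [89, 17, 202, 131, 94, 24],
}


def department_rates(sex):
    """Admission rate in each department for one sex."""
    return [a / n for a, n in zip(admitted[sex], applied[sex])]


def pooled_rate(sex):
    """Admission rate for one sex over all six departments combined."""
    return sum(admitted[sex]) / sum(applied[sex])


male_rates = department_rates("Male")
female_rates = department_rates("Female")

# Departments in which women were admitted at a higher rate than men.
female_ahead = [
    dept
    for dept, m, f in zip(DEPARTMENTS, male_rates, female_rates)
    if f > m
]

if __name__ == "__main__":
    for dept, m, f in zip(DEPARTMENTS, male_rates, female_rates):
        print(f"{dept}: male {m:.0%}, female {f:.0%}")
    print("Female rate higher in:", ", ".join(female_ahead))
    print(f"Pooled: male {pooled_rate('Male'):.1%}, "
          f"female {pooled_rate('Female'):.1%}")
```

Women come out ahead in departments A, B, D and F, yet the rates pooled over these six departments are roughly 45% for men and 30% for women. Note that these pooled figures cover only the six departments in the tables, not the campus-wide 44% and 35%.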
The data also shows that 2,691 males and 1,835 females applied to these departments. Dividing these numbers by the total number of applicants of each sex (8,442 male and 4,321 female) shows that 32% of males applied to these six departments, compared with 42% of females. Bickel et al. explained that the aggregate statistic failed to account for the characteristics of the departments that female applicants tended to apply to: highly competitive, with low admission rates for both sexes. This is the essence of Simpson's paradox: though aggregate data may suggest one thing, data broken down by the relevant subgroup (here, department) can provide a different and more accurate picture.
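One way to see how much department choice matters is direct standardization: keep each sex's per-department admission rates, but weight them by the same spread of applicants across departments. The sketch below is an illustration only and is not the analysis Bickel et al. performed; it uses the six-department figures from the tables above, and the function names are hypothetical.

```python
# Applicants and admits for departments A-F, copied from the tables above.
applied = {
    "Male": [825, 560, 325, 417, 191, 373],
    "Female": [108, 25, 593, 375, 393, 341],
}
admitted = {
    "Male": [512, 353, 120, 138, 53, 22],
    "Female": [89, 17, 202, 131, 94, 24],
}


def pooled_rate(sex):
    """Admission rate for one sex over all six departments combined."""
    return sum(admitted[sex]) / sum(applied[sex])


def standardized_rate(sex, reference):
    """Rate `sex` would have if its applicants were spread across the
    departments in the same proportions as those of `reference`."""
    weights = applied[reference]
    rates = [a / n for a, n in zip(admitted[sex], applied[sex])]
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


if __name__ == "__main__":
    print(f"Male, pooled:                     {pooled_rate('Male'):.1%}")
    print(f"Female, pooled:                   {pooled_rate('Female'):.1%}")
    print(f"Female, male department mix:      "
          f"{standardized_rate('Female', 'Male'):.1%}")
```

With the male application pattern, the female rate over these six departments rises from about 30% to about 52%, above the male rate of about 45%. The gap in the pooled figures therefore comes from where applicants applied, not from lower per-department admission rates for women.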
Though not directly related to Simpson's paradox, the study drew several other conclusions which were more sociological than statistical. After accounting for other factors such as departmental bias and female preferences in department choice, female applicants were found to be slightly favored over male applicants. The study also suggested that female students had been socialized into their department preferences, having been encouraged to study fields which were crowded (and thus more competitive), lower paying, and less well funded.
Summary
To put it simply, Simpson's paradox demonstrates how data represented one way can lead to a certain set of conclusions, but represented in another, more complete way can lead to a different, often opposite set of conclusions. It also isn't really a paradox, and Edward Simpson, the guy it is named after, wasn't even the first to describe the phenomenon: Udny Yule got there decades earlier, hence "Yule-Simpson effect".
Notes
- The data is widely accessible through the UCBAdmissions R dataset.
- For confidentiality reasons, the university did not release the departments' names for Bickel's paper or the R dataset.
References
- Bickel, Hammel and O'Connell, Sex Bias in Graduate Admissions: Data from Berkeley. Science, Vol. 187, No. 4175, pp. 398-404 (1975). https://homepage.stat.uiowa.edu/~mbognar/1030/Bickel-Berkeley.pdf