Regression to the mean

Regression to the mean is a technical way of saying that extreme results tend to be followed by more ordinary ones. The sprinter who breaks the world record will probably run closer to their average time in the next race, and the medical treatment that achieves stunning results in its first trial will probably look less efficacious in the second. Specifically, it refers to the tendency of a random variable that is far from the norm on one measurement to come back closer to "normal" over repeated tests. On average, observations cluster around the mean[note 1] whether or not they follow a really unusual value; the effect is simply most obvious when a strange result (e.g. a hole-in-one in golf) is followed by something much more ordinary (like a double-bogey). Regression to the mean should not be confused with the Central Limit Theorem (CLT), a separate result which says that averages of large samples are approximately normally distributed even when the underlying data are not.

In medicine

See the main article on this topic: Spontaneous remission

Unfortunately, many of the effects claimed by alternative medicine can be explained simply as regression to the mean. When Aunt Jane's acne gets better after rubbing mint leaves on her face, that's "anecdotal evidence" based almost entirely on regression to the mean. Many symptoms come and go in an apparently random fashion when recorded objectively; headaches, for example, tend to disappear over time without the aid of any treatment. People seek treatment when their symptoms are particularly severe, that is, when they are at their peak. Regression to the mean therefore suggests that if symptoms are unusually severe this week, they will probably be less severe next week through random fluctuation alone. If treatment is only sought when symptoms are at their worst, there will almost always be a coincidental recovery, even if the treatment has no effectiveness whatsoever.
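The argument above can be checked with a small simulation. This is only a sketch under made-up assumptions: weekly symptom severity is drawn independently from a normal distribution (mean 5, standard deviation 2, arbitrary units), the patient "seeks treatment" in any week where severity exceeds 8, and the treatment does nothing. The function name and all the numbers are hypothetical.

```python
import random
import statistics


def useless_treatment(n_weeks=100_000, threshold=8.0, seed=1):
    """Mean severity in weeks when treatment is sought, and in the week after.

    Assumptions (illustrative only): severity is an independent normal draw
    each week, and the treatment has no effect at all.
    """
    rng = random.Random(seed)
    severity = [rng.gauss(5.0, 2.0) for _ in range(n_weeks)]

    sought, week_after = [], []
    for this_week, next_week in zip(severity, severity[1:]):
        if this_week > threshold:  # symptoms at their peak: seek treatment
            sought.append(this_week)
            week_after.append(next_week)

    return statistics.mean(sought), statistics.mean(week_after)


before, after = useless_treatment()
print(f"severity when treated: {before:.2f}, one week later: {after:.2f}")
```

Under these assumptions the "week after" average sits near the long-run mean of 5, so the do-nothing treatment appears to work every time.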

A placebo control group in controlled trials accounts for the effect of regression to the mean. Both groups, on average, experience the same tendency to regress to the mean. If the treatment group improves by a statistically significant margin more than the placebo group, the difference can be attributed with some certainty to the treatment itself rather than to the placebo effect or regression to the mean.
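A rough sketch of why the control group helps, again with invented numbers: patients enrol only when a noisy severity reading is at least 8, are randomised to placebo or treatment, and the treatment is assumed to lower severity by exactly 1 point. Both groups "improve" through regression to the mean alone; the gap between them is what estimates the real effect. The function name, the severity model and `true_effect` are all hypothetical.

```python
import random
import statistics


def placebo_controlled_trial(n_enrolled=10_000, true_effect=1.0, seed=2):
    """Return mean improvement in the (placebo, treatment) groups."""
    rng = random.Random(seed)
    improvement = {"placebo": [], "treatment": []}
    enrolled = 0
    while enrolled < n_enrolled:
        usual = rng.gauss(5.0, 1.0)                 # patient's usual severity
        at_enrolment = usual + rng.gauss(0.0, 2.0)  # noisy reading on the day
        if at_enrolment < 8.0:
            continue                                # only severe cases enrol
        enrolled += 1
        group = "treatment" if rng.random() < 0.5 else "placebo"
        effect = true_effect if group == "treatment" else 0.0
        at_follow_up = usual + rng.gauss(0.0, 2.0) - effect
        improvement[group].append(at_enrolment - at_follow_up)
    return (statistics.mean(improvement["placebo"]),
            statistics.mean(improvement["treatment"]))


placebo_gain, treatment_gain = placebo_controlled_trial()
print(f"placebo: {placebo_gain:.2f}, treatment: {treatment_gain:.2f}, "
      f"difference: {treatment_gain - placebo_gain:.2f}")
```

The placebo group's improvement is pure regression to the mean in this model (there is no simulated placebo effect), and subtracting it from the treatment group's improvement recovers something close to the assumed effect.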

Other examples

Testing

For example, if a researcher gave a large group of people a test and selected the top-performing 5%, these people would be likely to score worse, on average, if re-tested. Similarly, the bottom 5% would be likely to score better on a retest.[note 2] In either case, the extremes of the distribution are likely to "regress to the mean" due to simple luck and natural random variation in the results.
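The test-and-retest scenario can be sketched as follows. The model is an assumption made for illustration: each score is a stable "ability" plus independent luck on the day, both normally distributed. The function name and parameter values are hypothetical.

```python
import random
import statistics


def retest_extremes(n_people=10_000, fraction=0.05, seed=0):
    """Mean (first, second) scores for the bottom and top slices of test 1."""
    rng = random.Random(seed)
    ability = [rng.gauss(100.0, 10.0) for _ in range(n_people)]
    first = [a + rng.gauss(0.0, 10.0) for a in ability]   # ability + luck
    second = [a + rng.gauss(0.0, 10.0) for a in ability]  # same ability, new luck

    ranked = sorted(range(n_people), key=lambda i: first[i])
    k = int(n_people * fraction)
    groups = {"bottom": ranked[:k], "top": ranked[-k:]}

    return {
        name: (statistics.mean([first[i] for i in members]),
               statistics.mean([second[i] for i in members]))
        for name, members in groups.items()
    }


results = retest_extremes()
for name, (test_1, test_2) in results.items():
    print(f"{name} 5%: first test {test_1:.1f}, retest {test_2:.1f}")
```

In this model the top group still beats the overall average on the retest, just by less: the selected people really are better than average, but part of their first score was luck.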

Sports

One way of thinking about "regression to the mean" is in terms of sports performance. To win a football championship, for example, it is not enough to be a good team; one needs to be both good and lucky. The team at the top of the standings in mid-season is likely to have been both good and lucky to that point, but cannot count on staying lucky for the rest of the season. For this reason, the team at the top of the standings at midseason is more likely to drop than to remain at the top, and more likely to remain at the top than to improve (how does one improve from "the top," anyway?).

This observation has been tagged the "Sports Illustrated Jinx". The jinx states that a player or team featured on the cover of a sports magazine such as SI is likely to have a disappointing year the following season (or even a disappointing game the following week). But if you think about it, a player is only likely to make the cover once, and for some surprisingly good performance: something truly spectacular that requires not only their superlative skill, but also lots of luck to beat the superlative skill of their competitors. Athletes on the cover of Sports Illustrated are likely to be at the very top of their game, and at the top, the most likely direction to move next is down. The next year, although the player may still be just as skilled, they will probably not be as lucky, and will post results closer to "typical".
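The "good and lucky" argument can be illustrated with a toy league. This is a deliberately crude sketch: each team has a fixed per-game win probability (its "skill", drawn between 0.35 and 0.65), games are treated as independent coin flips rather than matches between teams, and the season is split into two halves of 19 games. All names and numbers are hypothetical.

```python
import random
import statistics


def midseason_leader(n_teams=20, games_per_half=19, n_seasons=2_000, seed=3):
    """Average wins of the midseason leader in each half of the season."""
    rng = random.Random(seed)

    def wins(win_probability):
        # Count wins over one half-season of independent games.
        return sum(rng.random() < win_probability for _ in range(games_per_half))

    first_half, second_half = [], []
    for _ in range(n_seasons):
        skill = [rng.uniform(0.35, 0.65) for _ in range(n_teams)]
        half_1 = [wins(p) for p in skill]
        half_2 = [wins(p) for p in skill]  # same skill, fresh luck
        leader = max(range(n_teams), key=lambda i: half_1[i])
        first_half.append(half_1[leader])
        second_half.append(half_2[leader])

    return statistics.mean(first_half), statistics.mean(second_half)


leader_first, leader_second = midseason_leader()
print(f"midseason leader: {leader_first:.1f} wins, then {leader_second:.1f} wins")
```

The leader's skill does not change between halves in this model; only the luck is redrawn, and that alone is enough to pull the second-half record back toward the pack.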

Traffic cameras

A good example of how regression to the mean can seem to prove the effectiveness of almost any intervention is the installation of road traffic safety measures such as speed cameras, a very common device in the UK. A flukey cluster of motor vehicle collisions one year seems to show that a stretch of road is becoming more dangerous, and it is suggested that a speed camera be installed. In the year after the camera is installed, the number of collisions is roughly average again, and the camera gets the credit for a drop that would probably have happened anyway.
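A sketch of the speed camera scenario, under assumptions chosen for illustration: every site has the same underlying collision rate (an average of 5 per year, Poisson-distributed), a camera goes up wherever a year sees 10 or more collisions, and the camera has no effect whatsoever. The function names and numbers are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def camera_sites(n_sites=20_000, yearly_mean=5.0, trigger=10, seed=4):
    """Mean collisions at camera sites in the trigger year and the year after."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n_sites):
        year_one = poisson(rng, yearly_mean)
        year_two = poisson(rng, yearly_mean)  # camera assumed to do nothing
        if year_one >= trigger:               # bad year: install a camera
            before.append(year_one)
            after.append(year_two)
    return statistics.mean(before), statistics.mean(after)


collisions_before, collisions_after = camera_sites()
print(f"before camera: {collisions_before:.1f}, after: {collisions_after:.1f}")
```

Because sites were selected for an unusually bad year, collisions roughly halve afterwards in this model even though nothing about the road changed.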

Notes

  1. Do the math using calculus, expected values and other headache-inducing tools of statisticians. Note that with skewed (non-normal) distributions, the observations will tend to fall around the median, as the mean is skewed by outliers.
  2. The effect would be much more pronounced if test-takers answered randomly. If you have a graphing calculator on hand and want to see how well you'd probably do by guessing on a multiple choice test with four possible answers, do the following (we're assuming you have a TI-84 calculator):
    • Press the 2nd key and press the vars key to open the DISTR menu
    • Press the alpha key and press the math key to select the binomial probability function (binompdf)
    • Put in, say, 25 for the trials. This represents 25 questions on the test
    • Put in .25 for p, the probability of success. Because we have a four-choice test (A, B, C, and D) and you're answering randomly, you have a 1/4 or .25 chance of getting each answer right.
    • Put in 25 for the X value. This will calculate the chance of you getting all the questions right. Changing the X value changes the number of correct answers whose probability is calculated. If I were to put 3 as the X value, I'd be calculating the probability that I get exactly three questions right.
    Now that you've done this, it's time to cry that if your life depended on you getting all 25 right, your chances of doing so are 8.8817842E-16, or approximately zero.
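For readers without a TI-84, the same binomial probability can be computed in a few lines of Python. This is a minimal sketch; `binomial_pmf` is a name made up here, not a library function.

```python
from math import comb


def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 25 questions, four choices each, answering at random.
all_right = binomial_pmf(25, 0.25, 25)
three_right = binomial_pmf(25, 0.25, 3)
print(f"all 25 right: {all_right:.7e}")
print(f"exactly 3 right: {three_right:.4f}")
```

The first value is 0.25 to the 25th power, about 8.88E-16, matching the calculator result above.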
This article is issued from Rationalwiki. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.