Bonferroni's principle

Bonferroni's principle is an informal presentation of a statistical theorem. It states that if your method of finding significant items returns significantly more items than you would expect to find in the actual population, you can assume most of the items it finds are bogus.[1] In other words, an algorithm or method that appears useful for finding a particular set of data returns more false positives the further the portion of the data it flags exceeds the portion that actually belongs in that category.

Informal example

Assume you are trying to identify people who are cheating on their spouses within a certain population, and you know that 5% of the population cheat on their spouses. Suppose you decide that people who claim to go out with coworkers more than three times a month are most likely cheating, but then discover that 20% of the population qualifies under your method. Even in the very best case, where every actual cheater is among those flagged, only one quarter of the people you identify (5% out of 20%) can be cheaters. Furthermore, if there are any false negatives (cheaters who aren't identified as cheaters), an even higher percentage of the "cheaters" identified with the system are false positives.
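The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name best_case_precision and its recall parameter (the fraction of actual cheaters the method catches) are assumptions introduced here, not part of the cited source.

```python
def best_case_precision(base_rate: float, flagged_rate: float, recall: float = 1.0) -> float:
    """Largest possible fraction of flagged people who are true positives.

    base_rate    -- fraction of the population that truly belongs to the category
    flagged_rate -- fraction of the population the method flags
    recall       -- assumed fraction of true members the method catches
                    (1.0 is the best case: no false negatives)
    """
    if not 0.0 < flagged_rate <= 1.0:
        raise ValueError("flagged_rate must be in (0, 1]")
    if not 0.0 <= base_rate <= 1.0 or not 0.0 <= recall <= 1.0:
        raise ValueError("base_rate and recall must be in [0, 1]")
    # True positives cannot exceed the members actually caught,
    # and precision cannot exceed 100%.
    return min(1.0, base_rate * recall / flagged_rate)


# The example from the text: 5% cheat, 20% are flagged.
print(best_case_precision(0.05, 0.20))              # best case, no false negatives
print(best_case_precision(0.05, 0.20, recall=0.8))  # some cheaters are missed
```

With no false negatives the bound is one quarter; lowering the assumed recall to 0.8 drops it to roughly one fifth, which is the "even higher percentage of false positives" described above.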

Uses

Applying Bonferroni's principle to an algorithm or system for identifying or classifying data gives you an upper bound on the accuracy of your method. If you match significantly more or significantly less data than you should expect, then even in the best case you have too many false positives or false negatives, respectively. This is not to say that the algorithm is correct whenever it matches a number relatively close to what you would expect: you could match exactly the right number of items in the data set and still be matching the wrong items. This is why the principle only gives an estimate of the best-case scenario.
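As a rough sketch of this bound, the following Python compares the number of items a method matches with the number expected to exist. The function name minimum_errors and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def minimum_errors(expected_positives: int, matched: int) -> Tuple[int, int]:
    """Return (minimum false positives, minimum false negatives).

    These are lower bounds only: the real error counts can be higher,
    because the method may be matching the wrong items.
    """
    if expected_positives < 0 or matched < 0:
        raise ValueError("counts must be non-negative")
    # Every match beyond the expected count must be a false positive.
    min_false_positives = max(0, matched - expected_positives)
    # Every expected item beyond the matched count must have been missed.
    min_false_negatives = max(0, expected_positives - matched)
    return min_false_positives, min_false_negatives


# Hypothetical population with 50 true items.
print(minimum_errors(50, 200))  # matches too many: (150, 0)
print(minimum_errors(50, 20))   # matches too few:  (0, 30)
print(minimum_errors(50, 50))   # matches the right count: (0, 0)
```

The last call returns (0, 0), but that only means the counts do not rule the method out; all 50 matches could still be the wrong items.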

The principle is especially useful in debunking individuals who use cold reading techniques. You may think they are using some sort of psychic power to accurately identify a single person, but if it turns out that 90% of the audience can identify with something they are saying and it's likely there's at least one person who can identify with most of what they are saying in every audience, their powers become much less impressive.
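To see why at least one matching audience member is likely, here is a minimal sketch under strong simplifying assumptions that are not in the source: each statement applies to any given person independently with the same probability, and audience members are independent of one another. The function names and the numbers in the usage lines are hypothetical.

```python
from math import comb


def prob_person_matches(p: float, statements: int, needed: int) -> float:
    """Probability that one person identifies with at least `needed`
    of `statements` statements, each applying with probability p
    (assumed independent)."""
    return sum(
        comb(statements, i) * p**i * (1 - p) ** (statements - i)
        for i in range(needed, statements + 1)
    )


def prob_any_match(p: float, statements: int, needed: int, audience: int) -> float:
    """Probability that at least one of `audience` people matches."""
    q = prob_person_matches(p, statements, needed)
    return 1 - (1 - q) ** audience


# Hypothetical reading: 10 vague statements, each fitting 30% of people,
# and a "hit" means someone identifies with at least 6 of them.
for size in (10, 100, 1000):
    print(size, round(prob_any_match(0.3, 10, 6, size), 3))
```

Under these assumptions, the chance that a given individual matches is small, but the chance that someone in the room matches grows quickly with the size of the audience.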

References

  1. Rajaraman A, Leskovec J, Ullman J. Mining of Massive Datasets, Version 1.3.
This article is issued from Rationalwiki. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.