
This answer mentions Bayesian poisoning in passing and I've read the wikipedia page but don't feel I've fully grasped it.

The first case, where a spammer sends a spam with a payload (link, malicious file, etc) and includes lots of non-spammy "safe" words seems obvious enough. The aim is to bring up the rating of that individual email so that spam filters might class it as "not spam".

The second case is more subtle and (to me) confusing:

Spammers also hope to cause the spam filter to have a higher false positive rate by turning previously innocent words into spammy words in the Bayesian database (statistical type I errors) because a user who trains their spam filter on a poisoned message will be indicating to the filter that the words added by the spammer are a good indication of spam.

How does this help the spammer? Sure, false positives (if I've understood correctly that this means legitimate emails wrongly classed as spam) are annoying, but they would have to be very common to disable spam filters entirely. It doesn't seem like this would change the rating of real spammy words, or does it just affect their relative rating?

Finally, does this, or any other, approach help an individual spammer with a particular few spam words they'd like to sneak through the filters, or would it potentially help all spammers?

Could someone provide or link to an example-based explanation?

James Bradbury
    This is too broad. What is it you do not understand? What is the context of your question? – GdD Nov 19 '14 at 10:15
  • Fair point, more detail added. – James Bradbury Nov 19 '14 at 10:42
    Not long enough for an answer, but some filters use relative "spamminess" of words, so that good words becoming more "spammy" implicitly makes genuinely problematic ones have less of an impact. – Vality Nov 20 '14 at 03:01

3 Answers


There's a good paper on this, a bachelor thesis titled The Effects of Different Bayesian Poison Methods on the Quality of the Bayesian Spam Filter 'SpamBayes' by Martijn Sprengers.

I'll try to give a TL;DR:

Bayesian spam filters try to decide whether an email is spam by looking at the words it contains. The filter reviews the words present in normal and spam email and updates a score for each word. These per-word scores are then combined into an overall score for the email, which is used to decide whether it is spam.

Words are re-scored over time, so if "Viagra" appears in several normal emails, its score will drop. Spammers abuse this by generating email containing many low-scoring words commonly found in legitimate mail, plus a single bad word. Because the overall score of the email is considered good, "Viagra" gets a lower score over time, making it look like a legitimate word and letting spam email pass through the filters.
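To make that dilution concrete, here is a minimal sketch. It is not a real filter: the per-word scores are invented, and they are combined by simple averaging rather than a proper Bayesian formula, but it shows how padding a spam with low-scoring words drags the overall score down:

```python
# Hypothetical learned "spamminess" scores, roughly P(spam | word)
scores = {
    "viagra": 0.99, "winner": 0.95,
    "meeting": 0.05, "report": 0.05, "lunch": 0.05, "thanks": 0.05,
}

def spam_score(words, default=0.5):
    """Average the per-word scores; unknown words count as neutral (0.5)."""
    vals = [scores.get(w, default) for w in words]
    return sum(vals) / len(vals)

plain_spam = ["viagra", "winner"]
poisoned   = plain_spam + ["meeting", "report", "lunch", "thanks"]

print(round(spam_score(plain_spam), 2))  # 0.97 -- clearly spam
print(round(spam_score(poisoned), 2))    # 0.36 -- the padded words drag it below a 0.5 threshold
```

A real filter combines word probabilities less naively, but the same dilution pressure applies: every "hammy" token pulls the combined verdict toward ham.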

There are three attacks the paper discusses:

Random Words: This attack method is based on the research by Gregory et al. [6]. It can be seen as a weak statistical attack, because it uses purely randomized data to add to the spam e-mails.

Common Words: This attack method is based on the research by Stern et al. [7]. They added common English words to spam e-mails in order to confuse the spam filter. This attack can be seen as stronger statistical attack than the Random Words method, because the data used is less random and it contains words that are more likely to be in e-mails than the words added with the previous attack.

Ham Phrases: This attack is developed in this research and tested against the other two. It is based on a huge collection of ham e-mails. From that collection, only the ham e-mails with the lowest combined probability are used as poison. The ham e-mail is then added at the end of the original spam e-mail. Most people read downwards, so the effectiveness of the message is maintained. This is also a strong statistical attack, maybe even stronger than the Common Words attack, because the words are even less randomized.
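The "Ham Phrases" selection step could be sketched like this. The ham pool and its spam probabilities are made up for illustration; the real attack scores a large ham corpus and appends the least spammy messages:

```python
# Hypothetical ham e-mails with the spam probability a filter assigns them
ham_pool = {
    "Lunch at noon? Bring the quarterly report.": 0.04,
    "Meeting moved to Tuesday, thanks.": 0.02,
    "Your invoice is attached.": 0.30,
}

def best_poison(pool, n=1):
    # Pick the n ham e-mails with the lowest combined spam probability
    return sorted(pool, key=pool.get)[:n]

spam_body = "Cheap pills, click now!"
# Append the most ham-like text after the pitch, so readers still see the pitch first
poisoned = spam_body + "\n\n" + "\n".join(best_poison(ham_pool, n=2))
print(poisoned)
```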

Highlights from the paper's conclusion:

From a spammer’s point of view, the ‘HamPhrases’ technique seems to work best. It does decrease the performance of the spam filter. … The ‘Random’ and ‘Common Words’ techniques seem to score worse from a spammers point of view. … When we train the spam filter on those poison methods, the performance gets even better than normal. …

However, the HamPhrases method used in this research is a little bit cheating. This is because both ham and spam e-mails that the spam filter uses for testing and training are available for the algorithm. Real spammers do not have the ham e-mails of real users.

Adam Katz
Lucas Kauffman
    This is a good answer to the question *title* ("How does Bayesian poisoning work?"), but not a good answer to the question *body*, which asks about why a spammer would want to increase a spam-filter's false *positive* rate. That's not your fault -- the question body didn't clarify that at the time that you posted your answer -- but it makes the answer much less useful, since the question body is now a very interesting one and the only answer here is one that doesn't address what makes it interesting. Do you think you can add something to cover that? – ruakh Nov 19 '14 at 17:22
  • Spammers are interested in increasing false positive rates because the standard reaction to that is to raise thresholds, which will lower the spam catch rate. – Adam Katz Mar 26 '15 at 23:01
  • I just skimmed the paper. I'll add the paper's conclusion to the answer above but reserve my criticisms for this comment. In order to control the experiment, **Sprengers disabled header tokenization, which is known to severely handicap Bayes** (Graham's usage of Bayes wasn't the first; previous attempts had ignored headers and concluded that Bayes wasn't useful for anti-spam). With just the body, poisoning should be significantly easier. (This is, in part, why spam from webmail providers is so much harder to catch.) – Adam Katz Mar 27 '15 at 00:18

Lucas Kauffman's answer explains the how very well; as for the why:

If users fail to receive important emails and it turns out those emails got caught in the spam filter, they'll get angry at their admin. False positives can have a very high cost.

When a lot of users get angry, the admin is likely to make the spam filter more forgiving, which will probably let more spam through. That is good for the spammers.

Murphy
    +1. In other words, increasing the rate of false positives can have the ultimate effect of increasing the rate of false negatives, via human factors. It's like social engineering, only without having to do social engineering! – ruakh Nov 19 '14 at 23:30

I have a great example of a spam message with Bayesian poisoning in an old blog post.

Bayesian spam filters basically keep track of each word used in each message. When a message is marked as spam, the filter treats the words in the message as representative of spam. By using this information, the filter can determine with good accuracy whether a particular message is spam or not.

However, the fact that Bayesian filters use the words in each message to determine whether a message is spam makes it susceptible to techniques that circumvent this process.

A spam message can insert nonsense words, break the words apart in a human-readable (but not machine-readable) fashion (e.g. insert "invisible" small letters between each letter in the spammy word), use accent marks or HTML entities to make it harder to distinguish by filters, or use HTML forms in place of links. This is essentially what Bayesian poisoning is, and all of these techniques are demonstrated and explained in my blog post.

In particular, the "nonsense words" can be carefully chosen to be those commonly found in normal messages. A user marking a spam message containing these words as spam is essentially telling the filter to treat them as an indication of spam. With enough such messages, the filter will think that these words represent spam and begin to mark legitimate messages containing these words as such.
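This training feedback loop, where marking poisoned spam as spam makes innocent words "spammy", can be sketched with simple per-word counts (the words and counts are hypothetical):

```python
from collections import Counter

spam_counts, ham_counts = Counter(), Counter()

def train(words, is_spam):
    """Record each word under the class the user assigned to the message."""
    (spam_counts if is_spam else ham_counts).update(words)

def spamminess(word):
    s, h = spam_counts[word], ham_counts[word]
    return s / (s + h) if s + h else 0.5  # unseen words are neutral

# One legitimate email teaches the filter that "smiling" is hammy...
train(["smiling", "lunch"], is_spam=False)
# ...but marking nine poisoned spams containing "smiling" as spam
# pushes its spamminess to 9 / (9 + 1) = 0.9
for _ in range(9):
    train(["viagra", "smiling"], is_spam=True)

print(round(spamminess("smiling"), 2))  # 0.9 -- a formerly innocent word now signals spam
```

Once enough legitimate messages contain words the filter has learned this way, they start scoring as spam: exactly the false positives the question asks about.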

The first image in the blog post demonstrates how this is done:

[Image: spam message viewed in the Firefox page inspector, showing the "nonsense" words]

Although the full sentences don't make a lot of sense, they look somewhat coherent. "Smiling at that", "God knew he waited", and "Behind the bed" are all phrases and words which can appear in normal messages. If these kinds of phrases appear often enough in spam messages and the user marks them as spam, the filter could end up thinking legitimate messages with these phrases are spam.

bwDraco