Is this some kind of Bayesian poisoning?

Question

So I've been reading my spam lately, and today I received this:

To: my actual email, but with a name found here: tel2name.com (only hit on Google)

Subject: one seemingly random excerpt of a web page found here: (only hit on Google) http://firstcommercialcreditcorp.com/commercial-property-values-increase/

citing "RCA", which in this case refers to: rca nalytics (dot com)

The body itself is an excerpt from this page: http://www.gutenberg.org/files/11226/11226-h/11226-h.htm, one sentence talking about: Gadsden Purchase, filibustering, and Washington.

What might the goal of this be? I don't see what profit they can gain from sending it, because it doesn't look like Bayesian poisoning to me.

Also, the body is encoded with Content-Transfer-Encoding: base64.

Adam Katz · Accepted Answer · 2015-03-28T06:16:18.900

Without more information (consider pasting the full email to a pastebin or gist), I can't conclusively tell you what that message is, but from your description, yes, that spam is likely an attempt at poisoning.

When you see lots of gibberish or quotes that are nonsensical or out of context, you are seeing either a hash buster or Bayesian poisoning.

Fuzzy hashing systems like Razor measure various characteristics of messages and roll each measurement up into a small string ("fuzzy hash") that can then be compared to fuzzy hashes of known spam. Matches are assumed to be in the same spam campaign.

Hash busters try to change enough content to prevent a spam campaign's fuzzy hashes from matching. This technique can work, but some fuzzy hash algorithms are specifically designed to be robust to it, ignoring large swaths of message bodies and even using hashes composed of heuristics indicative of hash busters.
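To make the difference between an exact hash and a fuzzy one concrete, here is a minimal sketch. It is not Razor's actual algorithm; it swaps in a simple stand-in (Jaccard similarity over three-word shingles), and the sample messages and the names `shingles` and `similarity` are made up for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b):
    """Jaccard similarity of the two messages' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical spam campaign text and a variant carrying a short hash buster.
campaign = ("Buy cheap pills online today, no prescription needed, "
            "fast discreet shipping worldwide")
variant = campaign + " qzx vlorp"

# An exact digest changes completely when any byte changes.
exact_match = (hashlib.sha256(campaign.encode()).hexdigest()
               == hashlib.sha256(variant.encode()).hexdigest())

print("exact digests match:", exact_match)
print("shingle similarity:", round(similarity(campaign, variant), 2))
```

The exact digests differ, while the shingle similarity stays high, so the two messages would still be grouped together. A naive measure like this one degrades as the spammer adds more padding, which is why the more robust algorithms ignore large parts of the body.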

Bayesian spam detection calculates probabilities of spam versus legitimate mail ("ham") based on observed frequencies of each word in ham and spam (e.g. "v1agra" is very spammy but "Niagara" is very hammy). These are then combined into a spamminess probability for the entire message, which is deemed to be spam at a certain threshold.
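As a rough illustration of that calculation, the sketch below trains on a handful of invented messages and combines per-word probabilities through summed log-odds. The class name `ToyBayes`, the training data, equal priors, and add-one smoothing are all assumptions made for this example; real filters differ in tokenization, smoothing, and how they pick which words to combine.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased set of word tokens; each word counts once per message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.totals[label] += 1

    def word_spamminess(self, word):
        # Add-one smoothing so unseen words never give a probability of 0 or 1.
        p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
        p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, text):
        # Combine per-word probabilities by summing their log-odds.
        log_odds = 0.0
        for word in tokens(text):
            p = self.word_spamminess(word)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


bayes = ToyBayes()
bayes.train("cheap v1agra pills online", "spam")
bayes.train("v1agra discount buy now", "spam")
bayes.train("photos from our Niagara trip", "ham")
bayes.train("meeting notes from Monday", "ham")

print(bayes.word_spamminess("v1agra"), bayes.word_spamminess("niagara"))
print(bayes.spam_probability("buy v1agra online"))
print(bayes.spam_probability("Niagara trip photos"))
```

A message would then be flagged as spam when `spam_probability` exceeds whatever threshold the filter is configured with.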

Bayesian poisoning tries to add content that is plausible to appear in ham. The theory is that the spamminess probability would get lowered by the inclusion of a whole bunch of hammy words. Luckily, this is not actually the case. Because Bayes is constantly retrained on uncaught spam, it will learn that these distractions are irrelevant and it will place more emphasis on the remaining content seen only in spam. In most cases, it will actually learn that Bayesian poisoning is itself a sign of spam.
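The retraining effect can be seen in a toy model. The sketch below is a simplified word-frequency scorer with invented ham and spam corpora, not any particular filter's implementation. It scores a poisoned message before and after the filter has been trained on a few poisoned spams that slipped through.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased set of word tokens; each word counts once per message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def spam_probability(text, spam_docs, ham_docs):
    """Toy naive Bayes score with equal priors and add-one smoothing."""
    spam_counts = Counter(w for d in spam_docs for w in tokens(d))
    ham_counts = Counter(w for d in ham_docs for w in tokens(d))
    log_odds = 0.0
    for w in tokens(text):
        p_spam = (spam_counts[w] + 1) / (len(spam_docs) + 2)
        p_ham = (ham_counts[w] + 1) / (len(ham_docs) + 2)
        log_odds += math.log(p_spam) - math.log(p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training corpora.
ham = [
    "the gadsden purchase treaty was signed in washington",
    "washington debated the purchase for months",
    "lunch on friday after the meeting",
]
spam = [
    "cheap v1agra pills online",
    "buy v1agra pills now",
]

plain = "cheap v1agra pills"
poisoned = plain + " gadsden purchase washington"

plain_score = spam_probability(plain, spam, ham)
before = spam_probability(poisoned, spam, ham)

# The user reports poisoned spam that got through, and the filter retrains on it.
retrained_spam = spam + [
    poisoned,
    "buy v1agra now washington gadsden purchase",
    "v1agra pills online purchase washington filibustering",
]
after = spam_probability(poisoned, retrained_spam, ham)

print("plain spam:", round(plain_score, 3))
print("poisoned, before retraining:", round(before, 3))
print("poisoned, after retraining:", round(after, 3))
```

In this toy setup the hammy words pull the score down at first. Once they have been seen in reported spam they become roughly neutral, and the score is driven again by the words that appear only in spam.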

While Bayesian poisoning can be effective at hash busting, it is counter-productive at poisoning Bayesian filters; one of the best anti-spam tools for fighting both hash busters and Bayesian poisoning is: Bayesian spam detection.

Learn more about Bayesian poisoning

[This answer](https://security.stackexchange.com/questions/73181/how-does-bayesian-poisoning-work/73183#73183) to "How does Bayesian poisoning work" cites a research paper investigating poisoning techniques. The paper concludes that the only way to successfully poison Bayes is to paste the victim's past nonspam mail bodies into the spam, which is quite impractical. — Adam Katz, Mar 27 '15 at 00:53
