Are there anti-spam SMTP proxies (like ASSP or qpsmtpd) or e-mail spam filtering solutions in general (like SpamAssassin) that create, for example, block filters based on objects in RIR databases? If so, which objects (inetnum? route?) do they use, and how?

Martin

2 Answers

Not sure if this helps, but I have used RBLs on my mail servers for a long time and they do a pretty decent job (they take care of about 90-95% of the spam).

They work much better than SpamAssassin (and without the hassle). Still, I use SpamAssassin for the rest of the spam.
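
For readers who have not used an RBL before, here is a minimal Python sketch of how a lookup is usually done: the IPv4 octets are reversed, the list's zone is appended, and the name is resolved; any answer means "listed". The zone `dnsbl.example.org` and both function names are placeholders I made up for illustration, not a real list or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any resolver failure) is treated as "not listed".
        return False


# Hypothetical zone name; substitute the list you actually use.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

In practice the mail server or SpamAssassin does this lookup for you; the sketch only shows what happens underneath.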

Now, about your specific question: If I understand correctly, what you want is to block complete segments automatically, right?

I think the whois inetnum and route objects are the same thing, just in different representations. Don't take my word as the holy truth, as I don't have much experience with those values, but that is what I believe.
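
On the "different representations" point: in RIPE-style whois output an inetnum is written as a first-last address range, while a route is written as a CIDR prefix (and also carries an origin AS). A rough Python sketch of turning the range form into prefixes that a block filter could use follows; the function name is made up for illustration and the addresses are documentation ranges, not real whois data.

```python
import ipaddress
from typing import List


def inetnum_to_cidrs(inetnum: str) -> List[str]:
    """Convert an inetnum-style 'first - last' range into covering CIDR prefixes."""
    first_text, last_text = (part.strip() for part in inetnum.split("-"))
    first = ipaddress.IPv4Address(first_text)
    last = ipaddress.IPv4Address(last_text)
    # A range that is not aligned on a prefix boundary yields several networks.
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


print(inetnum_to_cidrs("192.0.2.0 - 192.0.2.255"))        # ['192.0.2.0/24']
print(inetnum_to_cidrs("198.51.100.0 - 198.51.100.191"))  # ['198.51.100.0/25', '198.51.100.128/26']
```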

I think this question and this one may give you some clue on how to achieve what you want.

While I see an advantage in blocking entire segments, I don't think you will gain much from it. Spam comes from many different segments, so blocking one segment will only stop a small share of the IPs that actually send spam.

One problem I see is that you may over-block IPs.

Here is a real-life example: here in Japan, the company NTT runs a mail service under the "ocn.ne.jp" domain. It has several SMTP servers in the same segment (randomly assigned to their customers). Sometimes one of those servers gets blacklisted (RBL) and my servers block emails coming from it, occasionally for hours, because some account was misused. But that doesn't mean I should block the whole segment. Doing so would be a big problem, as the service is commonly used here in Japan.

OCN is not the only case; I have seen the same with Yahoo servers and others.

lepe

Personally, I think you're better off keying on a few countries and languages that you are certain you do not want and then penalize them in SpamAssassin using RelayCountry, TextCat, etc.

 

I've experimented with RIR data in SpamAssassin rules in the past. My conclusion was that there was nothing terribly useful even as features within a machine learning environment.

The criteria are a little stale (I'm not updating the CIDRs that RIRs trade and this is just IPv4), but this should be roughly representative:

  S/O   Flow%  RIR
0.282  50.052  ARIN
0.785  26.186  RIPE
0.845  16.274  APNIC
0.129   9.983  Legacy Class A
0.915   1.348  LACNIC
0.763   0.744  AFRINIC

("S/O" is relative precision using a balanced sample of spam and legit mail. It very roughly correlates with spam probabilities. "Flow%" is the percent of all traffic flow that I saw in this sample period (and it includes some overlap). The lower the Flow%, the less you should trust the S/O – i.e. don't block Latin America.)
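
For anyone unfamiliar with the column, here is a small Python sketch of how such a ratio can be computed, assuming S/O means what it usually does in SpamAssassin rule statistics: the spam hit rate divided by the sum of the spam and ham hit rates, so that each class counts equally regardless of corpus size. The function name and the counts are invented for illustration and are not taken from the table above.

```python
def spam_over_overall(spam_hits: int, spam_total: int,
                      ham_hits: int, ham_total: int) -> float:
    """Balanced precision: spam hit rate over (spam hit rate + ham hit rate)."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        raise ValueError("rule never hit; S/O is undefined")
    return spam_rate / (spam_rate + ham_rate)


# Made-up counts: the rule hits 750 of 1000 spam and 250 of 1000 ham messages.
print(spam_over_overall(750, 1000, 250, 1000))  # 0.75
```

A value near 0.5 means the feature says nothing either way, which is why a high S/O on a tiny Flow% deserves little trust.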

This obviously reflects my sampling, which is a very small subset of the data I have available. As you can see, I have far more data from North America and Europe than I do from Latin America or Africa. This does not necessarily reflect real life (or even my own data set; I randomly downsampled!).

If you know your communications channels very intimately, you can consider something like this, but it's way too broad to safely deploy.

Since my data is so stale, I'm not going to dump all of it here, but as an example, here's my definition of the Legacy Class A space (the ^ should remove the overlap issue experienced above):

header  __RCVD_VIA_LEGACY  X-Spam-Relays-External =~ /^\[ ip=(?:[689]|2(?:[025689]|1[45]?)|1[12356789]|3[023458]?|5[1234567]|4[0478]?)\b/

If I recall correctly, I just went to the website of each of the five Regional Internet Registries and found their announced blocks. I generated each RIR's regex using Regexp::Assemble (which fails to create character class ranges). This uses a special SpamAssassin pseudo-header for a cleaner definition.
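
To sanity-check which addresses the rule above would fire on, the same pattern can be tried outside SpamAssassin. This is a rough Python sketch: the regex is copied verbatim from the rule, while the helper name and the sample relay strings are made up and only approximate the pseudo-header's format.

```python
import re

# Pattern copied from the __RCVD_VIA_LEGACY rule; Python's re accepts it unchanged.
LEGACY_RE = re.compile(
    r"^\[ ip=(?:[689]|2(?:[025689]|1[45]?)|1[12356789]|3[023458]?"
    r"|5[1234567]|4[0478]?)\b"
)


def via_legacy(relays_external: str) -> bool:
    """True if the first external relay's first octet is in the listed set."""
    return LEGACY_RE.search(relays_external) is not None


# Hypothetical, simplified pseudo-header values.
samples = [
    "[ ip=17.1.2.3 rdns=mail.example.com helo=mail.example.com ]",
    "[ ip=172.16.0.1 rdns=mail.example.com helo=mail.example.com ]",
    "[ ip=192.0.2.1 helo=a.example ] [ ip=17.1.2.3 helo=b.example ]",
]
for sample in samples:
    print(via_legacy(sample), sample)
```

The trailing `\b` is what keeps `17` from matching `172`, and the `^` means only the first relay in the pseudo-header is tested, so the third sample does not match even though a later hop is in the set.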

 

There are other very useful aspects of whois for spam detection, but there's a major hurdle to overcome: it's way too much data to do anything with on a local deployment. You need major cloud services populating databases in close to real time in order to catch things like hailstorm (side note: my team did this).

Another service relying on whois data is the Day Old Bread List, which merely lists any domain that is 0-5 days old on the assumption that it's worth penalizing email sending from (or linking to) such domains.

Adam Katz