This is a Canonical Question about Fighting Spam.

There are so many techniques and so much to know about fighting spam. What widely used techniques and technologies are available to Administrators, Domain Owners, and End Users to help keep the junk out of our inboxes?

We're looking for an answer that covers different tech from various angles. The accepted answer should include a variety of technologies (e.g. SPF/SenderID, DomainKeys/DKIM, Graylisting, DNS RBLs, Reputation Services, Filtering Software [SpamAssassin, etc.]); best practices (e.g. mail on Port 25 should never be allowed to relay, Port 587 should be used for submission, etc.); terminology (e.g. Open Relay, Backscatter, MSA/MTA/MUA, Spam/Ham); and possibly other techniques.

Chris S

6 Answers

To defeat your enemy, you must know your enemy.

What is spam?

For our purposes, spam is any unsolicited bulk electronic message. Spam these days is intended to lure unsuspecting users into visiting a (usually shady) web site where they will be asked to buy products, or have malware delivered to their computers, or both. Some spam will deliver malware directly.

It may surprise you to learn that the first spam was sent in 1864. It was an advertisement for dental services, sent via Western Union telegram. The word itself is a reference to a scene in Monty Python's Flying Circus.

Spam, in this case, does not refer to mailing list traffic a user subscribed to, even if they changed their minds later (or forgot about it) but have not actually unsubscribed yet.

Why is spam a problem?

Spam is a problem because it works for the spammers. Spam typically generates more than enough sales (or malware delivery, or both) to cover the costs -- to the spammer -- of sending it. The spammer does not consider the costs to the recipient, you and your users. Even when a tiny minority of users receiving spam respond to it, it's enough.

So you get to pay the bills for bandwidth, servers, and administrator time to deal with incoming spam.

We block spam for these reasons: we don't want to see it, to reduce our costs of handling email, and to make spamming more expensive for the spammers.

How does spam work?

Spam typically is delivered in different ways from normal, legitimate email.

Spammers almost always want to obscure the origin of the email, so a typical spam will contain fake header information. The From: address is usually fake. Some spam includes fake Received: lines in an attempt to disguise the trail. A lot of spam is delivered via open SMTP relays, open proxy servers and botnets. All of these methods make it more difficult to determine who originated the spam.

Once in the user's inbox, the purpose of the spam is to entice the user to visit the advertised web site. There, the user will be enticed to make a purchase, or the site will attempt to install malware on the user's computer, or both. Or, the spam will ask the user to open an attachment which contains malware.

How do I stop spam?

As a system administrator of a mail server, you will configure your mail server and domain to make it more difficult for spammers to deliver their spam to your users.

I will be covering issues specifically focused on spam and may skip over things not directly related to spam (such as encryption).

Don't run an open relay

The big mail server sin is to run an open relay, an SMTP server which will accept mail for any destination and deliver it onward. Spammers love open relays because they virtually guarantee delivery. They take on the load of delivering messages (and retrying!) while the spammer does something else. They make spamming cheap.

Open relays also contribute to the problem of backscatter. These are messages which were accepted by the relay but then found to be undeliverable. The open relay will then send a bounce message, containing a copy of the spam, to the From: address. Since that address is usually forged, the bounce lands in the mailbox of someone who never sent the original message.

  • Configure your mail server to accept incoming mail on port 25 only for your own domain(s). For most mail servers, this is the default behavior, but you at least need to tell the mail server what your domains are.
  • Test your system by sending your SMTP server a mail from outside your network where both the From: and To: addresses are not within your domain. The message should be rejected. (Or, use an online service like MX Toolbox to perform the test, but be aware that some online services will submit your IP address to blacklists if your mail server fails the test.)

Reject anything that looks too suspicious

Various misconfigurations and errors can be a tip-off that an incoming message is likely to be spam or otherwise illegitimate.

  • Mark as spam or reject messages for which the IP address has no reverse DNS (PTR record). Treat the lack of a PTR record more harshly for IPv4 connections than for IPv6 connections, as many IPv6 addresses do not yet have reverse DNS, and may not for several years, until DNS server software is better able to handle these potentially very large zones.
  • Reject messages for which the domain name in the sender or recipient addresses does not exist.
  • Reject messages which do not use fully qualified domain names for the sender or recipient domains, unless they originate within your domain and are meant to be delivered within your domain (e.g. monitoring services).
  • Reject connections where the other end does not send a HELO/EHLO.
  • Reject connections where the HELO/EHLO is:
    • not a fully qualified domain name and not an IP address
    • blatantly wrong (e.g. your own IP address space)
  • Reject connections which use pipelining without being authorized to do so.
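
As a rough illustration of the HELO/EHLO checks in the list above, here is a minimal Python sketch. It is not how any particular mail server implements them; the `OWN_NETWORKS` range and the verdict strings are made up for the example, and a real MTA would apply these rules through its own configuration.

```python
import ipaddress
import re

# Hypothetical example network; substitute your own address space.
OWN_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# One DNS label: letters, digits, hyphens, no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def _parse_ip(helo):
    """Return an address object for a bare IP or an SMTP literal such as
    [192.0.2.1] or [IPv6:2001:db8::1]; return None for anything else."""
    text = helo.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
        if text.lower().startswith("ipv6:"):
            text = text[5:]
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None


def is_fqdn(name):
    """True if the name has at least two syntactically valid labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def helo_verdict(helo):
    """Apply the HELO/EHLO checks from the list above to one greeting."""
    if not helo or not helo.strip():
        return "reject: no HELO/EHLO name"
    ip = _parse_ip(helo)
    if ip is not None:
        if any(ip in net for net in OWN_NETWORKS):
            return "reject: HELO claims our own address space"
        return "accept"
    if is_fqdn(helo.strip()):
        return "accept"
    return "reject: HELO is neither an FQDN nor an IP address"
```

For example, `helo_verdict("localhost")` is rejected because it is not fully qualified, while `helo_verdict("mail.example.com")` is accepted.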

Authenticate your users

Mail arriving at your servers should be thought of in terms of inbound mail and outbound mail. Inbound mail is any mail arriving at your SMTP server which is ultimately destined for your domain; outbound mail is any mail arriving at your SMTP server which will be transferred elsewhere before being delivered (e.g. it's going to another domain). Inbound mail can be handled by your spam filters, and may come from anywhere but must always be destined for your users. This mail can't be authenticated, because it is not possible to give credentials to every site which might send you mail.

Outbound mail, that is, mail which will be relayed, must be authenticated. This is the case whether it comes from the Internet or from inside your network (though you should restrict the IP address ranges allowed to use your mailserver if operationally possible); this is because spambots might be running inside your network. So, configure your SMTP server such that mail bound for other networks will be dropped (relay access will be denied) unless that mail is authenticated. Better still, use separate mail servers for inbound and outbound mail, allow no relaying at all for the inbound ones, and allow no unauthenticated access to the outbound ones.

If your software allows this, you should also filter messages according to the authenticated user; if the from address of the mail does not match the user who authenticated, it should be rejected. Do not silently update the from address; the user should be aware of the configuration error.

You should also log the username which is used to send mail, or add an identifying header to it. This way, if abuse does occur, you have evidence and know which account was used to do it. This allows you to isolate compromised accounts and problem users, and is especially valuable for shared hosting providers.
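
The inbound/outbound rules in this section boil down to a small decision table. The Python sketch below is only meant to make that logic concrete; `LOCAL_DOMAINS`, `ALLOWED_SENDERS` and the reply strings are hypothetical, and in practice your SMTP server expresses the same policy in its own configuration.

```python
from typing import Optional

# Hypothetical values for illustration only.
LOCAL_DOMAINS = {"example.com"}
# Login name -> envelope sender addresses that login is allowed to use.
ALLOWED_SENDERS = {"alice": {"alice@example.com"}}


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def smtp_policy(auth_user: Optional[str], mail_from: str, rcpt_to: str) -> str:
    """Decide what to do with one MAIL FROM / RCPT TO pair."""
    inbound = domain_of(rcpt_to) in LOCAL_DOMAINS
    if auth_user is None:
        # Unauthenticated clients may only deliver to our own domains.
        if inbound:
            return "accept: inbound, hand to spam filtering"
        return "reject: relay access denied"
    # Authenticated users may only send as addresses they own.
    if mail_from.lower() not in ALLOWED_SENDERS.get(auth_user, set()):
        return "reject: sender address does not match authenticated user"
    return "accept: log auth_user alongside the message"
```

Note that the mismatch case rejects rather than rewriting the sender, in line with the advice above not to update the from address silently.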

Filter traffic

You want to be certain that mail leaving your network is actually being sent by your (authenticated) users, not by bots or people from outside. The specifics of how you do this depend on exactly what kind of system you are administering.

Generally, blocking egress traffic on ports 25, 465, and 587 (SMTP, SMTP/SSL, and Submission) for everything but your outbound mailservers is a good idea if you are a corporate network. This is so that malware-running bots on your network cannot send spam from your network either to open relays on the Internet or directly to the final MTA for an address.
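
In practice this is a firewall rule, not application code, but the rule itself is simple enough to sketch in Python. The mail server address below is a made-up example.

```python
import ipaddress

# Ports named in the paragraph above: SMTP, SMTP over SSL, Submission.
SMTP_PORTS = {25, 465, 587}
# Hypothetical address of the only host allowed to send mail out.
OUTBOUND_MAIL_SERVERS = {ipaddress.ip_address("10.0.0.25")}


def egress_allowed(src_ip: str, dst_port: int) -> bool:
    """Return True if this outbound connection should pass the mail rule."""
    if dst_port not in SMTP_PORTS:
        return True  # not mail traffic; other firewall rules apply
    return ipaddress.ip_address(src_ip) in OUTBOUND_MAIL_SERVERS
```

A workstation at 10.0.0.99 trying to reach port 25 on the Internet would be blocked, while the same connection from 10.0.0.25 would pass.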

Hotspots are a special case because legitimate mail from them originates from many different domains, but (because of SPF, among other things) a "forced" mailserver is inappropriate and users should be using their own domain's SMTP server to submit mail. This case is much harder, but using a specific public IP or IP range for Internet traffic from these hosts (to protect your site's reputation), throttling SMTP traffic, and deep packet inspection are solutions to consider.

Historically, spambots have issued spam mainly on port 25, but nothing prevents them from using port 587 for the same purpose, so changing the port used for inbound mail is of dubious value. However, using port 587 for mail submission is recommended by RFC 2476 (since superseded by RFC 6409), and allows for a separation between mail submission (to the first MTA) and mail transfer (between MTAs) where that is not obvious from network topology; if you require such separation, you should do this.

If you are an ISP, VPS host, colocation provider, or similar, or are providing a hotspot for use by visitors, blocking egress SMTP traffic can be problematic for users who are sending mail using their own domains. In all cases except a public hotspot, you should require users who need outbound SMTP access because they are running a mailserver to specifically request it. Let them know that abuse complaints will ultimately result in that access being terminated to protect your reputation.

Dynamic IPs, and those used for virtual desktop infrastructure, should never have outbound SMTP access except to the specific mailserver those nodes are expected to use. These types of IPs should also appear on blacklists and you should not attempt to build reputation for them. This is because they are extremely unlikely to be running a legitimate MTA.

Consider using SpamAssassin

SpamAssassin is a mail filter which can be used to identify spam based on the message headers and content. It uses a rules-based scoring system to determine the likelihood that a message is spam. The higher the score, the more likely the message is spam.
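
To make the scoring idea concrete, here is a toy rules-based scorer in Python. It is a sketch of the general technique only: the three rules, their names and their scores are invented for this example and are not real SpamAssassin rules, and the threshold of 5.0 is an assumption chosen to resemble a typical default.

```python
import re
from email.message import EmailMessage


def body_text(msg: EmailMessage) -> str:
    """Return the plain-text body, or an empty string if there is none."""
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


# Hypothetical rules: (name, score, test). Each test takes the message.
RULES = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: (m["Subject"] or "").isupper()),
    ("MENTIONS_PILLS", 2.5,
     lambda m: re.search(r"v[i1|]agra", body_text(m), re.I) is not None),
    ("MISSING_DATE", 1.0, lambda m: m["Date"] is None),
]
THRESHOLD = 5.0  # assumed cut-off for this sketch


def score_message(msg: EmailMessage):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, points) for name, points, test in RULES if test(msg)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def tag_message(msg: EmailMessage) -> EmailMessage:
    """Add headers that a mail client or delivery agent can filter on."""
    total, _ = score_message(msg)
    del msg["X-Spam-Flag"]   # deleting a missing header is a no-op
    del msg["X-Spam-Score"]
    msg["X-Spam-Flag"] = "YES" if total >= THRESHOLD else "NO"
    msg["X-Spam-Score"] = f"{total:.1f}"
    return msg
```

The message is tagged rather than rejected, so the final decision (Inbox or Junk) can be made further down the line.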

SpamAssassin also has a Bayesian engine which can analyze spam and ham (legitimate email) samples fed back into it.

Best practice for SpamAssassin is not to reject the mail, but to put it in a Junk or Spam folder. MUAs (mail user agents) such as Outlook and Thunderbird can be set up to recognize the headers that SpamAssassin adds to email messages and to file them appropriately. False positives can and do happen, and while they're rare, when it happens to the CEO, you will hear about it. That conversation will go much better if the message was simply delivered to the Junk folder rather than rejected outright.

SpamAssassin is almost one-of-a-kind, though a few alternatives exist.

  • Install SpamAssassin and configure automatic update for its rules using sa-update.
  • Consider using custom rules where appropriate.
  • Consider setting up Bayesian filtering.

Consider using DNS-based blackhole lists and reputation services

DNSBLs (formerly known as RBLs, or realtime blackhole lists) provide lists of IP addresses associated with spam or other malicious activity. These are run by independent third parties based on their own criteria, so research carefully whether the listing and delisting criteria used by a DNSBL are compatible with your organization's need to receive email. For instance, a few DNSBLs have draconian delisting policies which make it very difficult for someone who was accidentally listed to be removed. Others automatically delist after the IP address has not sent spam for a period of time, which is safer. Most DNSBLs are free to use.

Reputation services are similar, but claim to provide better results by analyzing more data relevant to any given IP address. Most reputation services require a subscription payment or hardware purchase or both.

There are dozens of DNSBLs and reputation services available, though some of the better known and useful ones I use and recommend are:

Conservative lists:

Aggressive lists:

As mentioned before, many dozens of others are available and may suit your needs. One of my favorite tricks is to look up the IP address which delivered a spam that got through against multiple DNSBLs to see which of them would have rejected it.

  • For each DNSBL and reputation service, examine its policies for listing and delisting of IP addresses and determine whether these are compatible with your organization's needs.
  • Add the DNSBL to your SMTP server when you have decided it is appropriate to use that service.
  • Consider assigning each DNSBL a score and configuring it into SpamAssassin rather than your SMTP server. This reduces the impact of a false positive; such a message would be delivered (possibly to Junk/Spam) instead of bounced. The tradeoff is that you will deliver a lot of spam.
  • Or, reject outright when the IP address is on one of the more conservative lists, and configure the more aggressive lists in SpamAssassin.
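
If you want to try the "look up the delivering IP against multiple DNSBLs" trick by hand, it helps to know how a DNSBL query is formed: the IP address is reversed and prepended to the list's zone, and an answer means "listed" while a name-not-found error means "not listed". A minimal Python sketch follows; the zone `dnsbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for this IP in the given DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # IPv4: reverse the four octets.
        labels = reversed(str(addr).split("."))
    else:
        # IPv6: reverse the 32 hex nibbles of the fully expanded address.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """True if the DNSBL returns any address record for this IP.

    The resolver is injectable so the function can be tried without
    touching the network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror (name not found) is a subclass
        return False
    return True
```

For example, `dnsbl_query_name("192.0.2.1", "dnsbl.example.org")` gives `1.2.0.192.dnsbl.example.org`. Running `is_listed` over a handful of zones for one offending IP shows which lists would have caught it.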

Use SPF

SPF (Sender Policy Framework; RFC 4408, since obsoleted by RFC 7208, with failure reporting described in RFC 6652) is a means to prevent email address spoofing by declaring which Internet hosts are authorized to deliver mail for a given domain name.

  • Configure your DNS to declare an SPF record with your authorized outgoing mail servers and -all to reject all others.
  • Configure your mail server to check the SPF records of incoming mail, if they exist, and reject mail which fails SPF validation. Skip this check if the domain does not have SPF records.
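
To show what a record with `-all` actually means to a receiving server, here is a deliberately simplified SPF evaluator in Python. It only understands the `ip4`, `ip6` and `all` mechanisms; anything that needs DNS (`a`, `mx`, `include`, `redirect` and so on) is skipped, so treat it as an illustration of the matching order and not as a usable SPF checker. The record and addresses are made-up examples.

```python
import ipaddress

# Qualifier prefix -> SPF result when the mechanism matches.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str) -> str:
    """Evaluate a TXT record against the connecting IP, left to right."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no SPF record: skip the check
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism == "all":
            return RESULTS[qualifier]
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return RESULTS[qualifier]
        # Mechanisms that need DNS lookups are ignored in this sketch.
    return "neutral"  # nothing matched and there was no "all"
```

With the record `v=spf1 ip4:192.0.2.0/24 -all`, a connection from 192.0.2.10 evaluates to `pass` and one from 198.51.100.5 to `fail`; replace `-all` with `~all` and the second case becomes `softfail`.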

Investigate DKIM

DKIM (DomainKeys Identified Mail; RFC 6376) is a method of embedding digital signatures in mail messages which can be verified using public keys published in the DNS. It is patent-encumbered in the US, which has slowed its adoption. DKIM signatures can also break if a message is modified in transit (e.g. SMTP servers occasionally may repack MIME messages).

  • Consider signing your outgoing mail with DKIM signatures, but be aware that the signatures may not always verify correctly even on legitimate mail.

Consider using greylisting

Greylisting is a technique where the SMTP server issues a temporary rejection for an incoming message, rather than a permanent rejection. When the delivery is retried in a few minutes or hours, the SMTP server will then accept the message.

Greylisting can stop some spam software which is not robust enough to differentiate between temporary and permanent rejections, but does not help with spam that was sent to an open relay or with more robust spam software. It also introduces delivery delays which users may not always tolerate.

  • Consider using greylisting only in extreme cases, since it is highly disruptive to legitimate email traffic.
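
The mechanism is easier to weigh up once you see how little there is to it. Below is a minimal in-memory sketch in Python, keyed on the common (client IP, sender, recipient) triplet. The 300-second delay and the reply texts are assumptions for the example; real implementations also persist and expire their state, which is left out here.

```python
import time


class Greylist:
    """Temporarily reject the first delivery attempt for each triplet."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait (assumed)
        self.clock = clock          # injectable so it can be tested
        self.first_seen = {}

    def check(self, client_ip, mail_from, rcpt_to):
        """Return the SMTP reply to give for this delivery attempt."""
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # Temporary (4xx) failure: a real MTA will queue and retry.
            return "451 4.7.1 Greylisted, please try again later"
        return "250 OK"
```

A sender that retries after the delay gets through; software that treats the 451 as final, or never retries, does not. The delay seen by legitimate mail is whatever the sending server's retry interval happens to be.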

Consider using nolisting

Nolisting is a method of configuring your MX records such that the highest priority (lowest preference number) record does not have a running SMTP server. This relies on the fact that a lot of spam software will only try the first MX record, while legitimate SMTP servers try all MX records in ascending order of preference. Some spam software also attempts to send directly to the lowest priority (highest preference number) MX record in violation of RFC 5321, so that could also be set to an IP address without an SMTP server. This is reported to be safe, though as with anything, you should test carefully first.

  • Consider setting your highest-priority MX record to point to a host which does not answer on port 25.
  • Consider setting your lowest-priority MX record to point to a host which does not answer on port 25.
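
The following Python sketch models why nolisting works, under the simplifying assumption that a sender is either well behaved (tries every MX in ascending preference order) or lazy (tries only the first). The host names and preference values are hypothetical.

```python
def delivery_order(mx_records):
    """Order (preference, host) pairs the way a compliant sender tries them."""
    return [host for _, host in sorted(mx_records)]


def deliver(mx_records, listening, try_all=True):
    """Return the host that accepted the connection, or None if none did.

    listening is the set of hosts that actually answer on port 25.
    try_all=False models spam software that gives up after the first MX.
    """
    candidates = delivery_order(mx_records)
    if not try_all:
        candidates = candidates[:1]
    for host in candidates:
        if host in listening:
            return host
    return None


# Hypothetical nolisting setup: only the middle MX runs an SMTP server.
MX = [(10, "mx0.example.com"), (20, "mx1.example.com"), (30, "mx2.example.com")]
LISTENING = {"mx1.example.com"}
```

Here `deliver(MX, LISTENING)` reaches `mx1.example.com`, while `deliver(MX, LISTENING, try_all=False)` returns `None` because the lazy sender never gets past the dead first entry.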

Consider a spam filtering appliance

Place a spam filtering appliance such as Cisco IronPort or Barracuda Spam & Virus Firewall (or other similar appliances) in front of your existing SMTP server to take much of the work out of reducing the spam you receive. These appliances are pre-configured with DNSBLs, reputation services, Bayesian filters and the other features I've covered, and are updated regularly by their manufacturers.

  • Research spam filtering appliance hardware and subscription costs.

Consider hosted email services

If it's all too much for you (or your overworked IT staff) you can always have a third party service provider handle your email for you. Services such as Google's Postini, Symantec MessageLabs Email Security (or others) will filter messages for you. Some of these services can also handle regulatory and legal requirements.

  • Research hosted email service subscription costs.

What guidance should sysadmins give to end users regarding fighting spam?

The absolute #1 thing that end users should do to fight spam is:

  • DO NOT RESPOND TO THE SPAM.

    If it looks funny, don't click the website link and don't open the attachment. No matter how attractive the offer seems. That Viagra isn't that cheap, you aren't really going to get naked pictures of anybody, and there is no $15 million in Nigeria or elsewhere except for the money taken from people who did respond to the spam.

  • If you see a spam message, mark it as Junk or Spam depending on your mail client.

  • DO NOT mark a message as Junk/Spam if you actually signed up to receive the messages and just want to stop receiving them. Instead, unsubscribe from the mailing list using the unsubscribe method provided.

  • Check your Junk/Spam folder regularly to see if any legitimate messages ended up there. Mark these as Not Junk/Not Spam and add the sender to your contacts to prevent their messages from being marked as spam in the future.

Michael Hampton
  • (Note: I'm not the one who downvoted your answer so I can't explain it) Globally, I like your answer but I would suggest you review the paragraph "Reject anything that looks too suspicious". What you suggest typically has no effect on the amount of spam you filter but causes lots of problems for (legitimate) end users. – Stephane Aug 23 '12 at 14:17
  • @Stephane Which one? There's nothing there that has any impact on legitimate mail. – Michael Hampton Aug 23 '12 at 15:05
  • @MichaelHampton: UCEPROTECT is a shady organization. – InternetSeriousBusiness Aug 23 '12 at 20:30
  • @InternetSeriousBusiness So I've heard. Nevertheless the service is useful. I've updated the answer a bit. – Michael Hampton Aug 23 '12 at 20:34
  • @MichaelHampton The most problematic one is the first one, TBH. It doesn't help at all preventing spam because the PTR record is typically not controlled by the end user and can be set to anything or not at all. It's just arbitrary. The others makes a bit of sense, but have little or no impact when combined with the other measures. – Stephane Aug 24 '12 at 09:40
  • @Stephane If you can't have the PTR record set/changed, then you aren't in control of the IP address. There's nothing wrong with rejecting mail based on this. – Michael Hampton Aug 24 '12 at 18:21
  • @MichaelHampton One example on the PTR record. AT&T in the US has an [especially terrible reverse PTR record policy](http://planner.bus.att.com/tab010c.pdf) that requires them to manage at least one forward DNS domain for the associated IP block. They do not use stub PTR records, so I see a ton of sites that use AT&T and don't have any reverse record in place. Blocking based on that is a little harsh because of situations like this. I'm working with an established company now who needs this functionality from AT&T. The process is going to take 3 weeks, but they still need to get mail out... – ewwhite Aug 25 '12 at 17:06
  • @ewwhite That's pretty draconian, and 3 weeks is pretty ridiculous. But rejecting mail when there's no PTR record is quite common, so I'm sure they're having all sorts of issues. – Michael Hampton Aug 25 '12 at 19:02
  • The rejection is common but I maintain that it is both useless and unnecessary. In fact, I've run a quick check on my own spam statistics and it turns out that the share of spam coming from IPs without reverse DNS is under 5%, and that seems to be pretty much the same share as what I see across overall SMTP connections. Hence my conclusion: it's a pointless restriction. – Stephane Aug 27 '12 at 12:10
  • 5% is not pointless. That's a few hundred spam messages here every day, and that's on my personal vanity domain. – Michael Hampton Aug 27 '12 at 19:01
  • I should also note that on my domain which has been collecting spammers for nearly 10 years, blocking connections without a PTR record eliminates 75% of incoming spam. If it _were_ 5%, then it might be reasonable to drop it, but out here in the real world it's extremely useful. – Michael Hampton Mar 11 '13 at 08:41
  • @Stephane There's a pretty big distinction between "have reverse DNS" and has **correct** reverse DNS. Also, I've been running mail servers for over a decade and can count on one hand the number of times a legitimate e-mail was blocked because of rDNS issues. "Correct" being that the IP reverses to the claimed name, though I really shouldn't have to explain that to an e-mail administrator... – Chris S Sep 20 '13 at 14:57
  • @ChrisS I stand by my words: the distinction is arbitrary and ineffective to filter out spam. Whether it's "correct" (whatever you mean by that: people obviously have different expectations that are beyond the strict "it exists") or not, it doesn't provide effective filtering (ineffective) and isn't mandated by RFCs (it's arbitrary). When you do that, you just hope that everyone legitimate works in an environment similar to yours and that this improves the likelihood they are "legit". – Stephane Sep 20 '13 at 15:05
  • What evidence do you have to back your claim that it is ineffective? My logs show that it is overwhelmingly effective in pre-screening e-mail. A number of other people I know have similar experiences. – Chris S Sep 20 '13 at 16:00
  • "To defeat your enemy, you must know your enemy." - this is asking for a Sun Tzu citation! *"It is said that if you know your enemies and know yourself, you will not be imperiled in a hundred battles; if you do not know your enemies but do know yourself, you will win one and lose one; if you do not know your enemies nor yourself, you will be imperiled in every single battle."* – ivan_pozdeev Jul 03 '15 at 12:29
  • I checked PTR record-based blocking rules against my mailbox, and there are legitimate e-mails that would have been rejected because of the lack of a reverse DNS record. There are admittedly very few of them (just a few senders in a mailbox with 40 000 emails), but there is important stuff there. In particular, if I rejected hosts without any PTR records, an email with the results of my visa application to a particular Southeast Asian country wouldn't have come through. A safer solution is to reject the email if the sending host has a PTR record that doesn't pass the FCrDNS verification. – michau Aug 16 '19 at 14:15

I've managed over 100 separate mail environments over the years and have used numerous processes to reduce or help eliminate spam.

Technology has evolved over time, so this answer will walk through some of the things I've tried in the past and detail the current state of affairs.

A few thoughts about protection...

  • You want to protect port 25 of your incoming mail server from being an open relay, where anyone can send mail through your infrastructure. This is independent of the particular mail server technology you may be using. Remote users should use an alternate submission port and some form of required authentication for relaying mail. Port 587 or port 465 are the common alternatives to 25.
  • Encryption is also a plus. A lot of mail traffic is sent in cleartext. We're at the point now where most mail systems can support some form of encryption; some even expect it.
  • These are more proactive approaches to preventing your mail site from being classified as a spam source...

With regard to incoming spam...

  • Greylisting was an interesting approach for a short period of time. Force a temporary reject/delay in hopes that a spammer would disconnect and avoid exposure or the time and resources needed to requeue messages. This had the effect of unpredictable delays in mail delivery, didn't work well with mail from large server farms and spammers eventually developed workarounds. The worst impact was breaking the user expectation of speedy mail delivery.
  • Multiple MX relays still need protection. Some spammers would try sending to a backup or lower-priority MX in hopes that it had less robust filtering.
  • Realtime Black(hole) Lists (RBL/DNSBL) - These reference centrally-maintained databases to verify whether a sending server is listed. Heavy reliance on RBLs comes with caveats. Some are not as reputable as others. The offerings from Spamhaus have always been good for me. Others, like SORBS, have a poor approach to listing IPs, and often block legitimate email. It's been likened to an extortion plot in some cases, because delisting often involves $$$.
  • Sender Policy Framework (SPF) - Basically a means of ensuring that a given host is authorized to send mail for a particular domain, as defined by a DNS TXT record. It's good practice to create SPF records for your outgoing mail, but bad practice to require it from the servers sending to you.
  • Domain Keys - Not in widespread use... yet.
  • Bounce suppression - Prevent invalid mail from being returned to its source. Some spammers would try to see which addresses were live/valid by analyzing backscatter to create a map of usable addresses.
  • Reverse DNS/PTR checks - Check that a sending server has a valid reverse PTR record. This does not need to match the originating domain, as it's possible to have a many-to-one mapping of domains to a host. But it's good to determine ownership of an IP space and to determine whether the originating server is part of a dynamic IP block (e.g. home broadband - read: compromised spambots).
  • Content filtering - (unreliable) - Trying to counter permutations of "(Viagra, v|agra, viagra, vilgra.)" is time-consuming for the administrator and doesn't scale in a larger environment.
  • Bayesian filtering - More advanced spam solutions allow global or per-user training of mail. The heuristics are worth reading up on, but the main point is that mail can be manually classified as good (Ham) or bad (Spam), and the resulting messages populate a Bayesian database that can be referenced to determine the categorization of future messages. Typically, this is associated with a spam score or weighting, and can be one of a handful of techniques used to determine whether a message should be delivered.
  • Rate controlling/throttling - Simple approach. Limit how many messages a given server can attempt to deliver within a certain period of time. Defer all messages over that threshold. This is usually configured on the mail server side.
  • Hosted and cloud filtering - Postini comes to mind, as that was a cloud solution before cloud was a buzzword. Now owned by Google, the strength of a hosted solution is that there are economies of scale inherent to processing the volume of mail that they encounter. Data analysis and simple geographic reach can help a hosted spam filtering solution adapt to trends. The execution is simple, though: 1) point your MX record to the hosted solution, 2) provide a post-filtering server delivery address, 3) profit.
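
Of the techniques in the list above, rate controlling is the simplest to sketch. The Python below is a minimal sliding-window limiter per sending IP; the limit of 100 messages per 60 seconds is an arbitrary assumption, and on a real system you would set this in the mail server or appliance and not write it yourself.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_messages per window (seconds) for each client."""

    def __init__(self, max_messages=100, window=60.0, clock=time.monotonic):
        self.max_messages = max_messages
        self.window = window
        self.clock = clock  # injectable so it can be tested
        self.history = defaultdict(deque)  # client IP -> accept timestamps

    def allow(self, client_ip):
        """Return True to accept now, False to defer with a 4xx reply."""
        now = self.clock()
        recent = self.history[client_ip]
        # Forget deliveries that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_messages:
            return False
        recent.append(now)
        return True
```

Deferred messages are not lost: a legitimate server queues them and retries once the window has moved on.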

My current approach:

I'm a strong advocate of appliance-based spam solutions. I want to reject at the perimeter of the network and save the CPU cycles at the mail server level. Using an appliance also provides some independence from the actual mail server (mail delivery agent) solution.

I recommend Barracuda Spam Filter appliances for a number of reasons. I've deployed several dozen units, and the web-driven interface, industry mindshare and set-and-forget appliance nature make it a winner. The backend technology incorporates many of the techniques listed above.

  • I block port 25 on my mail server's IP address and instead set the MX record for the domain to the Barracuda appliance's public-facing address - e.g. spam.domain.com. Port 25 on the appliance stays open for mail delivery.
  • The core is SpamAssassin-derived with a simple interface to a message log (and Bayesian database) that can be used to classify good mail from bad during an initial training period.
  • Barracuda leverages several RBLs by default, including those by Spamhaus.org, and their own BRBL reputation database. Note - the BRBL is usable for free as a standard RBL for other mail systems.
  • The Barracuda reputation database is compiled from live data, honeypots, large-scale analysis and any number of proprietary techniques. It has a registered whitelist and blocklist. High-volume and high-visibility mail senders often register with Barracuda for automatic whitelisting. Examples include Blackberry, Constant Contact, etc.
  • SPF checks can be enabled (I don't enable them, though).
  • There's an interface to review mail and redeliver from the appliance's mail cache as necessary. This is helpful in instances where a user was expecting a message that may not have passed all of the spam checks.
  • LDAP/Active Directory user verification helps accelerate the detection of invalid mail recipients. This saves bandwidth and prevents backscatter.
  • IP/sender address/domain/country-of-origin can all be configured. If I want to deny all mail from Italian domain suffixes, it's possible. If I want to prevent mail from a particular domain, it's easily configured. If I want to block a user's stalker from sending email to the user, it's doable (true story).
  • Barracuda provides a number of canned-reports and a good visual display of the appliance status and spam metrics.
  • I like having an appliance onsite to keep this processing in-house and possibly have a post-filter email journaling connection (in environments where mail retention is necessary).
  • Plus, the appliance can reside in a virtualized infrastructure.

[Image: Barracuda Spam & Virus Firewall 300 status console]


Newer approach:

I've been experimenting with Barracuda's Cloud-based Email Security Service over the past month. This is similar to other hosted solutions, but is well-suited to smaller sites, where an expensive appliance is cost-prohibitive. For a nominal yearly fee, this service provides about 85% of what the hardware appliance does. The service can also be run in tandem with an onsite appliance to reduce incoming bandwidth and provide another layer of security. It's also a nice buffer that can spool mail in the event of a server outage. The analytics are still useful, although not as detailed as a physical unit's.

[Image: Barracuda Cloud Email Security console]

All in all, I've tried many solutions, but given the scale of certain environments, and the increasing demands of the user base, I want the most elegant solution(s) available. Taking the multi-pronged approach and "rolling your own" is certainly possible, but I've done well with some basic security and good usage monitoring of the Barracuda device. Users are very happy with the result.

Note: Cisco Ironport is great as well... Just costlier.

ewwhite

Partly, I endorse what others have said; partly, I don't.

SpamAssassin

This works very well for me, but you need to spend some time training the Bayesian filter with both ham and spam.
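
For anyone wondering what "training" means here, the sketch below is a tiny naive Bayes classifier in Python. It is a simplified illustration of the general idea and not SpamAssassin's actual algorithm; the tokenizer, the smoothing and the training sentences are all choices made for this example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Learn which words appear in spam versus ham, then score new text."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        """Distinct lower-cased words in the text."""
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        """Estimate P(spam | words), using add-one smoothing throughout."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in self.tokens(text):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

With only spam samples the filter has no idea what your legitimate mail looks like, which is why feeding it both ham and spam matters.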

Greylisting

ewwhite may feel its day has come and gone, but I can't agree. One of my clients asked how effective my various filters were, so here are approximate stats for July 2012 for my personal mailserver:

  • 46000 messages attempted delivery
  • 1750 got through greylisting
  • 250 got through greylisting + trained spamassassin

So about 44000 never made it through the greylisting; if I'd not had greylisting, and had accepted all those, they'd have all needed spam filtering, all using CPU and memory, and indeed bandwidth.

Edit: since this answer seems to have been useful to some people, I thought I'd bring the statistics up-to-date. So I re-ran the analysis on the mail logs from Jan 2015, 2.5 years later.

  • 115,500 messages attempted delivery
  • 13,300 got through greylisting (and some basic sanity checks eg valid sender domain)
  • 8,500 got through greylisting + trained SpamAssassin

The numbers aren't directly comparable, because I no longer have a note of how I arrived at the 2012 figures, so I can't be sure the methodologies were identical. But I have confidence that I didn't have to run computationally-expensive spam filtering on an awful lot of content back then, and I still don't, because of greylisting.

SPF

This isn't really an anti-spam technique, but it can reduce the amount of backscatter you have to deal with if you're joe-jobbed. You should use it both in and out; that is, you should check the SPF record of the sender for incoming email, and accept or reject accordingly. You should also publish your own SPF records, fully listing all machines that are approved to send mail as you, and lock out all others with -all. SPF records that don't end in -all are completely useless.
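As a rough illustration of the publishing advice, here is a minimal Python sketch that reports which qualifier an SPF record puts on its `all` mechanism. The function names and the sample records (using the reserved `example.com` domain and a documentation IP range) are hypothetical, and fetching the TXT record from DNS is deliberately left out.

```python
from typing import Optional


def spf_all_qualifier(record: str) -> Optional[str]:
    """Return the qualifier on the 'all' mechanism ('-', '~', '?', '+'), or None.

    None means the text is not an SPF record or has no 'all' mechanism.
    """
    terms = record.strip().split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    for term in terms[1:]:
        t = term.lower()
        if t == "all":
            return "+"  # a bare 'all' defaults to the '+' (pass) qualifier
        if len(t) == 4 and t[0] in "+-~?" and t[1:] == "all":
            return t[0]
    return None


def spf_is_strict(record: str) -> bool:
    """True only when the record hard-fails everything not listed (-all)."""
    return spf_all_qualifier(record) == "-"
```

With these helpers, `spf_is_strict("v=spf1 mx ip4:192.0.2.10 -all")` is true, while a record ending in `~all` or with no `all` at all is flagged as not strict.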

Blackhole lists

RBLs are problematic, since one can get onto them through no fault of one's own, and they can be hard to get off. Nevertheless, they have a legitimate use in spam-fighting, but I would strongly suggest that no RBL should ever be used as a bright-line test for mail acceptance. The way SpamAssassin handles RBLs - by using many, each of which contributes towards a total score, and it's this score that makes the accept/reject decision - is much better.
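To make the "many lists, one score" idea concrete, here is a small Python sketch. It is not SpamAssassin's code: the zone names, weights, and the `is_listed` callback are all hypothetical. In a real setup `is_listed` would perform a DNS A lookup on the query name and treat any answer as "listed".

```python
import ipaddress
from typing import Callable, Dict


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Build the DNSBL lookup name: IPv4 octets reversed, then the list's zone."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def rbl_score(ipv4: str,
              weights: Dict[str, float],
              is_listed: Callable[[str], bool]) -> float:
    """Sum the weights of every list that has an entry for this address."""
    return sum(weight for zone, weight in weights.items()
               if is_listed(dnsbl_query_name(ipv4, zone)))
```

The point of the design is that a hit on any single list only nudges the total; the accept/reject decision is made against a threshold on the combined score together with the other spam tests.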

Dropbox

I don't mean the commercial service. I mean that my mail server has one address which cuts through all my greylisting and spam-filtering, but which, instead of delivering to anyone's INBOX, delivers to a world-writeable folder in /var, which is automatically pruned nightly of any emails over 14 days old.
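The nightly pruning could be sketched along these lines in Python. This assumes one message per file (Maildir-style), which the answer does not actually state, and the function name is hypothetical; the real folder path under /var is not given, so none is filled in here.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_DAYS = 14  # retention period taken from the description above


def prune_old_mail(folder: Path,
                   max_age_days: int = MAX_AGE_DAYS,
                   now: Optional[float] = None) -> List[Path]:
    """Delete regular files in `folder` last modified more than `max_age_days` ago.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for entry in folder.iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry)
    return removed
```

Run once a night from cron (or a systemd timer) against the dropbox folder, this keeps the folder from growing without anyone having to tidy it by hand.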

I encourage all users to take advantage of it when eg filling out email forms that require a validatable email address, where you're going to receive one email that you need to keep, but from whom you never wish to hear again, or when buying from online vendors who will likely sell and/or spam their address (particularly those outside the reach of European privacy laws). Instead of giving her real address, a user can give the dropbox address, and look in the dropbox only when she expects something from a correspondent (usually a machine). When it arrives, she can pick it out and save it in her proper mail collection. No user need look in the dropbox at any other time.

MadHatter
  • 1
    I really like the dropbox address idea. – blalor Jul 11 '15 at 02:55
  • Greylisting is a "selfish" solution; it delays lots of legitimate mail, and as more and more mail servers deploy it, more and more spammers will ensure their spam is robust to it. In the end, we lose out. I would recommend greylisting for small deployments and would *strongly* recommend against it for larger deployments. Consider [tarpitting](https://en.wikipedia.org/wiki/Tarpit_%28networking%29) instead. [Milter-greylist](http://hcpnet.free.fr/milter-greylist/) can do either. – Adam Katz Jul 30 '15 at 22:33
  • 1
    @AdamKatz that is certainly a point of view. I'm not sure how spammers are supposed to make their spam robust to greylisting without abandoning fire-and-forget spam, in which case, job done - as opposed to defeating tarpitting, which only requires a small code improvement in the zombies. But I don't agree with you about selfishness. When the tradeoff is explained (if you want real-time email for irregular correspondents, mail and comms budget increases twenty-fold), most much prefer the delay. – MadHatter Jul 31 '15 at 07:06
  • @AdamKatz note also that my "dropbox", above, doesn't get hit by greylisting. So any user needing desperately to receive pre-arranged email in a timely manner has an automatic workaround - they know to give the "immediate" address, and keep an eye on the dropbox until the particular item is received. – MadHatter Jul 31 '15 at 07:08
  • Yes, the risk is indeed spammers using real mail infrastructure rather than "fire-and-forget," and that risk is very real. You might be surprised at how many users expect email to be near-realtime, and most MTAs wait [15+ minutes](https://en.wikipedia.org/wiki/Greylisting#Disadvantages) before their first resend attempt. Your dropbox idea is great, and I did something similar when I maintained a server with greylisting (each user had a secret bypass-greylisting address, though it was no [mailinator](https://www.mailinator.com/), whereas your dropbox is just like mailinator). – Adam Katz Jul 31 '15 at 16:27
  • Tarpitting scales well because it slows down an entire spam campaign without risking the loss from non-greylisting-compliant ham relays or risking the 15+ minutes (a study in 2009 concluded that [many spam bots drop within 100 seconds](http://www.mailchannels.com/blog/2009/03/spammers-continue-to-suffer-from-premature-disconnection/)). Spam bots can only keep so many connections open at a given time, so this limits their ability to spam. It also slows down snowshoe spam (assuming it survives the wait) and gives the spam filters a chance to catch up. – Adam Katz Jul 31 '15 at 16:46
  • 1
    @AdamKatz since my greylisting insists on a 10 minute gap between first and successful delivery attempts, the 15+ minute pause is no major hardship. As for user expectations, those can (and of course should) be managed, just like any other. The rest of your argument is much more convincing - perhaps you could add your own answer, introducing some concrete figures about tarpitting's effectiveness in your deployments? We can theorise about expected relative effectiveness forever, but data are much more enlightening - [nullius in verba](https://en.wikipedia.org/wiki/Nullius_in_verba)! – MadHatter Aug 01 '15 at 05:36
  • I unfortunately lack sharable data beyond the link I've already provided. – Adam Katz Aug 01 '15 at 20:48
14

I am using a number of techniques which reduce spam to acceptable levels.

Delay accepting connections from incorrectly configured servers. A majority of the Spam I receive is from Spambots running on malware-infected systems. Almost all of these do not pass rDNS validation. Delaying for 30 seconds or so before each response causes most Spambots to give up before they have delivered their message. Applying this only to servers which fail rDNS avoids penalizing properly configured servers. Some incorrectly configured legitimate bulk or automated senders get penalized, but do deliver with minimal delay.
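One way to picture the rDNS test and the selective delay is the following Python sketch. It assumes "rDNS validation" means forward-confirmed reverse DNS (the PTR name must resolve back to the connecting address), which is my reading rather than something stated above. The function names are hypothetical, and the lookups are injectable so the logic can be exercised without touching DNS.

```python
import socket
from typing import Callable, List, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup; returns None when the address has no reverse name."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(name: str) -> List[str]:
    """Addresses the name resolves to; empty list on failure."""
    try:
        return [info[4][0] for info in socket.getaddrinfo(name, None)]
    except OSError:
        return []


def passes_rdns(ip: str,
                reverse: Callable[[str], Optional[str]] = reverse_lookup,
                forward: Callable[[str], List[str]] = forward_lookup) -> bool:
    """True when the PTR name exists and resolves back to the same address."""
    name = reverse(ip)
    return name is not None and ip in forward(name)


def response_delay(ip: str,
                   penalty_seconds: float = 30.0,
                   reverse: Callable[[str], Optional[str]] = reverse_lookup,
                   forward: Callable[[str], List[str]] = forward_lookup) -> float:
    """Seconds to wait before each SMTP response: zero for validated hosts."""
    return 0.0 if passes_rdns(ip, reverse, forward) else penalty_seconds
```

The mail server would sleep for the returned number of seconds before each reply, so well-configured senders see no delay at all. The comparison is a plain string match, so IPv6 addresses would need normalizing first.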

Configuring SPF for all your domains protects your domains. Most sub-domains should not be used to send email. The main exception is MX domains which must be able to send mail on their own. A number of legitimate senders delegate bulk and automated mail to servers that are not permitted by their policy. Deferring rather than rejecting based on SPF allows them to fix their SPF configuration, or you to whitelist them.

Requiring a FQDN (Fully Qualified Domain Name) in the HELO/EHLO command. Spam often uses an unqualified hostname, address literals, IP addresses, or an invalid TLD (Top Level Domain). Unfortunately, some legitimate senders use invalid TLDs, so it may be more appropriate to defer in this case. This can require monitoring and whitelisting to let the mail through.
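A syntactic version of this HELO/EHLO check might look like the Python sketch below. The function name and the reason strings are hypothetical. It deliberately does not check whether the TLD actually exists, since that needs an up-to-date TLD list.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens; 1-63 chars; no leading/trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def classify_helo(argument: str) -> str:
    """Return 'ok' or a short reason the HELO/EHLO argument is not an FQDN."""
    name = argument.strip()
    if name.startswith("[") and name.endswith("]"):
        return "address-literal"
    try:
        ipaddress.ip_address(name)
        return "bare-ip"
    except ValueError:
        pass
    labels = name.rstrip(".").split(".")
    if len(labels) < 2:
        return "unqualified"
    if not all(_LABEL.match(label) for label in labels):
        return "malformed"
    if labels[-1].isdigit():
        return "malformed"  # an all-numeric last label cannot be a TLD
    return "ok"
```

A policy layer can then map each reason to an action, for example rejecting `bare-ip` outright but only deferring on softer failures, in line with the caution about legitimate but misconfigured senders.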

DKIM helps with non-repudiation, but is otherwise not highly useful. My experience is that Spam is not likely to be signed. Ham is more likely to be signed so it has some value in Spam scoring. A number of legitimate senders don't publish their public keys, or otherwise improperly configure their system.

Greylisting is helpful for servers which show some signs of misconfiguration. Servers that are properly configured will get through eventually, so I tend to exclude them from greylisting. It is useful to greylist freemailers as they do tend to be used occasionally for Spam. The delay gives some of the Spam filter inputs time to catch the Spammer. It also tends to deflect Spambots as they usually don't retry.
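For readers who have not met greylisting before, the core mechanism can be sketched in a few lines of Python. This is a simplified illustration, not any particular implementation: the class name is hypothetical, the 600-second default is an assumption (a ten-minute gap is one setting mentioned elsewhere in this thread), and a real deployment would also expire old entries and whitelist known-good senders.

```python
from typing import Dict, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    """Defer the first delivery attempt for each triplet; accept later retries."""

    def __init__(self, min_delay: float = 600.0) -> None:
        self.min_delay = min_delay
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Return 'defer' (answer with a 4xx temporary failure) or 'accept'."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)  # remember the first attempt
        if now - first >= self.min_delay:
            return "accept"
        return "defer"
```

A real MTA queues the message and retries later, at which point the triplet is old enough to be accepted. A fire-and-forget Spambot never comes back, so its message is never delivered.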

Blacklists and Whitelists can help as well.

  • I have found Spamhaus to be a reliable blacklist.
  • Auto WhiteListing in the Spam filter helps smooth out the rating of frequent senders that are occasionally Spamish, or Spammers who are occasionally Hamish.
  • I find dnswl.org's whitelist useful as well.

Spam filtering software is reasonably good at finding Spam, although some will get through. It can be tricky getting false negatives down to a reasonable level without increasing false positives too much. I find SpamAssassin catches most of the Spam that reaches it. I've added a few custom rules that fit my needs.

Postmasters should configure the required abuse and postmaster addresses. Acknowledge the feedback you get to these addresses and act on it. This allows others to help you ensure your server is properly configured and not originating Spam.

If you are a developer, use the existing email services rather than setting up your own server. It is my experience that servers set up for automated mail senders are likely to be incorrectly configured. Review the RFCs and send properly formatted email from a legitimate address in your domain.

End users can do a number of things to help reduce Spam:

  • Don't open it. Flag it as Spam or Delete it.
  • Ensure your system is secure and malware free.
  • Monitor your network usage, especially when you aren't using your system. If it generates a lot of network traffic when you aren't using it, it may be sending spam.
  • Turn off your computer when you aren't using it. (It won't be able to generate Spam if it's turned off.)

Domain owners / ISPs can help by limiting Internet access on port 25 (SMTP) to official e-mail servers. This will limit the ability of Spambots to send to the Internet. It also helps when dynamic addresses return names which do not pass rDNS validation. Even better is to verify that the PTR records for mail servers do pass rDNS validation. (Check for typographical errors when configuring PTR records for your clients.)

I have started classifying email in three categories:

  • Ham (almost always from properly configured servers, properly formatted, and commonly personal e-mail.)
  • Spam (Mostly from Spambots, but a certain percentage is from freemailers or other senders with improperly configured servers.)
  • Bacn; could be Ham or Spam (Includes a lot of mail from mailing lists and automated systems. Ham usually ends up here because of DNS and/or server misconfiguration.)
BillThor
  • [**Bacn**](http://www.podcamppittsburgh.com/2007/08/podcamp-pittsburgh-2-cooks-up-bacn/) (note the missing `o`) is a standardized term referring to "mail you want, but not right now." Another category for mail is [**Graymail**](https://en.wikipedia.org/wiki/Graymail_%28email%29), which is bulk mail that is not technically spam and could be unwanted by some of its recipients yet wanted by others. – Adam Katz Jul 30 '15 at 22:29
6

The SINGLE most effective solution I have seen is to use one of the external mail filtering services.

I have experience with the following services at current clients. I am sure there are others. Each of these has done an excellent job in my experience. The cost is reasonable for all three.

  • Postini from Google
  • MXLogic from McAfee
  • SecureTide from AppRiver

The services have several huge advantages over local solutions.

  1. They stop most (>99%) of the spam BEFORE it hits your internet connection and your email server. Given the volume of spam, this is a lot of data not on your bandwidth and not on your server. I have implemented one of these services a dozen times and every one resulted in a noticeable performance improvement to the email server.

  2. They also do anti-virus filtering, typically both directions. This mitigates the need to have a "mail anti-virus" solution on your server, and also keeps the virii completely

They also do a great job at blocking spam. In 2 years working at a company using MXLogic, I have never had a false positive, and can count the legit spam messages that got through on one hand.

tomjedrz
  • 2
    +1 for recognizing the benefit of hosted solutions and the uptime/scale and reduced traffic benefits. The only issue I find is a lack of customization and response in some cases (from the perspective of someone who has to SEND to domains protected by those services). Also, some firms have security/compliance reasons for not being able to use external filtering. – ewwhite Aug 25 '12 at 22:42
5

No two mail environments are the same. So building an effective solution will require a lot of trial and error around the many different techniques available because the content of email, traffic, software, networks, senders, recipients and a lot more will all vary hugely across different environments.

However I find the following block lists (RBLs) to be well suited for general filtering:

As already stated, SpamAssassin is a great solution when configured correctly. Just make sure to install as many of the add-on Perl modules from CPAN as possible, as well as Razor, Pyzor and DCC. Postfix works very well with SpamAssassin, and it's a lot easier to manage and configure than Exim, for example.

Finally, blocking clients at the IP level using fail2ban and iptables or similar for short periods of time (say one day to a week) after certain events, such as triggering a hit on an RBL for abusive behavior, can also be very effective. Why waste resources talking to a known virus-infected host, right?
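The short-lived ban idea can be sketched in Python as below. This is not how fail2ban is implemented (it watches logs and manages firewall rules). It only illustrates the expiry logic, with a hypothetical class name and a one-day default taken from the lower end of the range suggested above.

```python
from typing import Dict


class TempBanList:
    """Remember offending client IPs for a limited time, then forget them."""

    def __init__(self, ban_seconds: float = 86400.0) -> None:
        self.ban_seconds = ban_seconds
        self._expires: Dict[str, float] = {}

    def ban(self, ip: str, now: float) -> None:
        """Start (or restart) a ban, e.g. after the client hits an RBL."""
        self._expires[ip] = now + self.ban_seconds

    def is_banned(self, ip: str, now: float) -> bool:
        """True while the ban is active; expired entries are dropped."""
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[ip]
            return False
        return True
```

Keeping the ban short matters because infected machines get cleaned up and dynamic addresses get reassigned, so a permanent block would eventually hit innocent senders.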

Cristian Ciupitu
Fat Finger