
This is sort of a general question about training spamassassin. I have a newly set up mailserver which filters incoming mail through spamassassin. I recently got a flight reservation flagged as spam (score 5) and would like to tell spamassassin it's not spam. (Perhaps doing this would also re-send the mail without the modified spamassassin headers?)

I've tried searching around and am only finding stuff about either getting spamassassin to flag messages as spam (and not about fixing false positives), or for people writing emails - how not to be flagged as spam.

So in regards to giving spamassassin feedback on wrong calls:

  1. Is there a way to do this from within an email client (for example, Thunderbird)?

  2. Is there a way to do this via the command-line on the mail server?

I'd like to make the process as fluid as possible, but whatever gets the job done.

Details from SpamAssassin regarding the email:

 0.0 FSL_HELO_NON_FQDN_1    No description available.
 0.6 HK_RANDOM_ENVFROM      Envelope sender username looks random
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no trust [82.150.225.129 listed in list.dnswl.org]
-0.0 RCVD_IN_MSPIKE_H3      RBL: Good reputation (+3) [82.150.225.129 listed in wl.mailspike.net]
 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different
 1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 1.6 SUBJ_ALL_CAPS          Subject is all capitals
 1.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.7 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
 0.0 T_REMOTE_IMAGE         Message contains an external image

Clearly the main culprits are the all-caps subject line SUBJ_ALL_CAPS and the MIME_HTML_ONLY (I guess, no text alternative).

The email was for a flight booking confirmation and the subject looked like this:

 Subject: JENNINGS/NICHOLAS KOSSOW MR 24 JAN MOF DPS

Headers:

X-Envelope-From: <tdsfndprd@amadeus.com>
X-Envelope-To: <nick@xxx.xxx>
Received: from mail1.amadeus.net (unknown)
    by 147-49-15-51.rev.cloud.scaleway.com(Postfix 3.1.0/8.13.0) with SMTP id unknown
    Fri, 20 Jan 2017 07:55:10 +0000
    (envelope-from <tdsfndprd@amadeus.com>
Received: from obeap115 (nat-dns-mnp.amadeus.net [82.150.225.129])
    by mail1.amadeus.net (Postfix) with ESMTP id 3F7A9200042
    for <nick@xxx.xxx>; Fri, 20 Jan 2017 07:55:10 +0000 (GMT)
From: eticket@garuda-indonesia.com
TO: NICK@XXX.XXXX
Message-ID: <CTS/GA/C50D54421A07/1@tds.amadeus.com>
FND-Request-ID: <CTS/GA/C50D54421A07/1@tds.amadeus.com>
Job-ID: 1
Subject: JENNINGS/NICHOLAS KOSSOW MR 24 JAN MOF DPS
Date: Fri, 20 Jan 2017 07:55:09 +0000
Content-Type: multipart/mixed; 
    boundary="----=_Part_191904_1900935199.1484898909762"
MIME-Version: 1.0
Nick Jennings
  • The first step has to be asking **why** it was flagged as spam. Without that we can make general suggestions, show you how to tweak the Bayesian engine for ham, but we can't help specifically address the rule(s) that misfired. Please add to your question the first few lines of the email body **and all the headers**. – MadHatter Jan 20 '17 at 08:19
  • @MadHatter thanks for the suggestions, updated the question accordingly. – Nick Jennings Jan 20 '17 at 08:31
  • You don't say what Bayesian score it got. Are you using the Bayesian engine at all? – MadHatter Jan 20 '17 at 09:02
  • The score was 5.0 .. it's inserted into the subject: *****SPAM 5.0 ***** – Nick Jennings Jan 20 '17 at 09:51
  • That's the total SA score, not the contributing Bayesian score. But don't worry, from the rest of what you wrote, it looks like your Bayesian engine isn't firing because you've never trained it, and you intend to address that! – MadHatter Jan 20 '17 at 09:57
  • You ask how to integrate your e-mail client into spam learning. This highly depends on the IMAP server involved, if any. Could you add that information? – Jonas Schäfer Jan 20 '17 at 13:28
  • @JonasWielicki ummm, does it? – MadHatter Jan 20 '17 at 13:50
  • @MadHatter I think so. If you want to make a nice integration (beyond nightly cronjobs and having to copy ham mails in specific directories), you will need to get the IMAP server to play along. In fact, I have spent the last few weeks to put together a nice-ish solution for dovecot. (Of course, if your IMAP server supports IMAPSieve you can do this server-agnostic, but unfortunately that’s not the case for the dovecot version I’m using.) – Jonas Schäfer Jan 20 '17 at 14:05
  • @JonasWielicki as far as I can tell from what the OP has written, (s)he's using dovecot and postfix on ubuntu (not sure which version). If you think this can all be done more elegantly, you should definitely consider writing up an answer; but copy-and-cron works fine for me! – MadHatter Jan 20 '17 at 14:14

2 Answers


There is both specific and general advice that may be useful in this case.

Specific

The underlying problem here is that Garuda Airlines, bless their little cotton socks, are sending confirmation emails that bear many of the hallmarks of spam. The subject line is VERY SHOUTY, they send HTML-only emails which contain quite a lot of images and very little text, the envelope-sender (tdsfndprd@amadeus.com) is pretty clearly a machine-constructed nonce, and the email provider for their (outsourced) confirmation system (amadeus.com) has a useless SPF record (despite all our advice to the contrary, some people mistakenly think there is value in a record that lists some of their sending systems and ends in ~all).

There is not much you can do about most of this. If you want to be sure of these getting through, a line in your ~/.spamassassin/user_prefs that says whitelist_from *@amadeus.com will get these messages through to you. Going further and tampering with the weights of the rules that were triggered is probably a bad idea. The SpamAssassin (SA) ruleset is created by filtering a huge weight of spam, and working out what characteristics apply to most of it; you are likely to open your INBOX to a lot more than just Garuda confirmation emails by turning off those rules.
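
As a minimal sketch of the whitelist step above, the following adds the entry only if it is not already present. The helper name add_whitelist_entry is hypothetical (it is not a SpamAssassin tool), and the default path assumes the per-user prefs file mentioned above is the one your setup actually reads.

```shell
#!/bin/sh
# Sketch: add a whitelist_from entry to a SpamAssassin prefs file, once.
# add_whitelist_entry is a hypothetical helper, not part of SpamAssassin.
add_whitelist_entry() {
    pattern=$1
    prefs=${2:-"$HOME/.spamassassin/user_prefs"}
    line="whitelist_from $pattern"

    # Create the directory and file if this user has no prefs yet.
    mkdir -p "$(dirname "$prefs")"
    touch "$prefs"

    # -x matches the whole line, -F treats the "*" literally (no regex).
    if ! grep -qxF "$line" "$prefs"; then
        printf '%s\n' "$line" >> "$prefs"
    fi
}
```

Calling add_whitelist_entry '*@amadeus.com' would append the line from the answer; running spamassassin --lint afterwards is a reasonable way to check that the configuration still parses.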

General

This is exactly the sort of situation the Bayesian engine handles well. It is designed to filter out email that doesn't trigger the other rules but contains stuff you don't want to read, whilst helping through email that does trigger those rules but contains stuff you do want to read.

IIRC, the engine won't do anything if you're not training it. The easiest way to train it is to maintain two folders, called (say) spam and ham. Into spam you put copies of email that made it into your INBOX but you didn't want; into ham you put copies of emails that fell foul of SA but you did want, such as this confirmation email.

Then nightly (or so) you have a cron job that says

sa-learn --spam --mbox mail/spam
sa-learn --ham  --mbox mail/ham

modifying the paths accordingly. Over time, this will teach the engine what you do, and don't, like to read. Since a high Bayesian score can add +4.0 points to an email's SA score, while a low one can subtract 1.9, a well-trained engine can really help SA distinguish what you want to read from what you don't - but you have to put the effort in to teach it.
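
A hedged sketch of what such a nightly job could look like follows. The function name learn_folder and the SA_LEARN override are assumptions made for illustration; the only sa-learn options used are --spam, --ham and --mbox from the commands above. It passes --mbox only when the folder is a single file, because a folder stored as a directory of one-message files (Maildir-style) is not in mbox format; for a Maildir, pointing it at the cur/ subdirectory is the cautious choice.

```shell
#!/bin/sh
# Sketch of a training helper for a nightly cron job.
# SA_LEARN may be overridden for a dry run, e.g. SA_LEARN=echo.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_folder CLASS PATH   (CLASS is "spam" or "ham"; hypothetical helper)
learn_folder() {
    class=$1
    folder=$2

    case $class in
        spam|ham) ;;
        *) echo "usage: learn_folder spam|ham PATH" >&2; return 2 ;;
    esac

    if [ -d "$folder" ]; then
        # Directory of individual message files (e.g. a Maildir's cur/).
        "$SA_LEARN" "--$class" "$folder"
    elif [ -f "$folder" ]; then
        # Single file holding many messages: treat it as mbox.
        "$SA_LEARN" "--$class" --mbox "$folder"
    else
        echo "learn_folder: $folder not found" >&2
        return 1
    fi
}
```

A cron script would then call learn_folder spam mail/spam and learn_folder ham mail/ham, with the paths adjusted as above. Running sa-learn --dump magic afterwards shows the nspam and nham counters; with default settings the Bayesian rules are not applied until at least 200 messages of each kind have been learned.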

MadHatter
  • That sounds reasonable. I will give that spam/ham mailbox flow a try. Thanks! – Nick Jennings Jan 20 '17 at 09:54
  • "bless their little cotton socks" – Alex Reinking Jan 20 '17 at 22:48
  • @MadHatter following up on this. I tried dragging the SPAM email that Spamassassin altered into the Ham folder and when I ran the `sa-learn --ham ...` command, it says it found 0 emails to learn from: `Learned tokens from 0 message(s) (0 message(s) examined)` ... I tried catting the `.eml` attachment that SpamAsssasin put the original email in, into the Ham folder directly on the server, but still, says it finds 0 messages to process... – Nick Jennings Jan 22 '17 at 12:44
  • I should add I'm dealing with the original email as an attachment as per the `report_safe 1` setting. – Nick Jennings Jan 22 '17 at 13:13
  • @NickJennings then you will probably have to use a MIME-capable client to strip out the original email, to feed to `sa-learn`. If that's too much of a pain, turn off `report_safe`. It's a good idea to train the ham-learner with other things besides stuff that was mistakenly-identified as spam, since the Bayesian filter's assumptions are separate from SA's as a whole. I feed mine all the personal mail I receive, since that's the stuff I most want to read. – MadHatter Jan 22 '17 at 13:21

You seem to be using dovecot. I have spent a few weeks trying to figure out a smooth integration, which allows users to easily train the server-side spam filters without having to copy mails.

The key part is the Antispam Dovecot plugin. The antispam plugin triggers on move operations between three folder groups: trash, unsure and spam. Specifically, when a transition from anything (but spam) to spam is detected, a spam learning action is triggered and when a transition from spam to unsure is detected, a ham learning action is triggered.

It supports different training backends. A simple one is mailtrain, which simply executes a command and puts the mail on standard input. A configuration for that might look like this:

plugin {
   antispam_backend = mailtrain
   antispam_mail_sendmail = /usr/local/bin/sa-learn-stdin.sh
   antispam_mail_spam = spam
   antispam_mail_notspam = ham
   antispam_mail_sendmail_args = -L
   antispam_spam = Junk;INBOX.Junk
   antispam_trash = Trash;INBOX.Trash
   antispam_allow_append_to_spam = no
}

Together with /usr/local/bin/sa-learn-stdin.sh:

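#!/bin/bash
# Pass the antispam arguments (-L spam or -L ham) to spamc, which reads the
# message from standard input; append spamc's output to a log file.
/usr/bin/spamc "$@" >> /tmp/sa-learn-log
# Always exit 0 so that a learning failure is not reported back to dovecot.
exit 0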

The configuration says "To learn as spam, run /usr/local/bin/sa-learn-stdin.sh -L spam and to learn as ham, run /usr/local/bin/sa-learn-stdin.sh -L ham." The arguments are configured by antispam_mail_spam, antispam_mail_notspam and antispam_mail_sendmail_args.
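
The spamc -L route depends on spamd accepting learning requests (it has to be started with --allow-tell). Where that is not an option, a possible variant of the wrapper calls sa-learn directly. This is only a sketch: the function name train_from_stdin and the SA_LEARN override are hypothetical, and it assumes sa-learn reads the message from standard input when no file is named and that the script runs as the user whose Bayes database the filter consults.

```shell
#!/bin/sh
# Sketch: translate the "-L spam" / "-L ham" arguments that antispam passes
# into the matching sa-learn option. SA_LEARN can be overridden for testing.
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

# train_from_stdin -L spam|ham   (hypothetical helper)
train_from_stdin() {
    if [ "$1" != "-L" ]; then
        echo "train_from_stdin: expected -L spam|ham" >&2
        return 2
    fi
    case $2 in
        # The message itself arrives on standard input and is inherited
        # by sa-learn (assumption: no file argument means "read stdin").
        spam|ham) "$SA_LEARN" "--$2" ;;
        *) echo "train_from_stdin: unknown class '$2'" >&2; return 2 ;;
    esac
}
```

Used as the body of /usr/local/bin/sa-learn-stdin.sh, the script would end with train_from_stdin "$@" redirected to a log file, followed by exit 0 as in the original wrapper.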

This is already pretty nice. If you can configure your client to move mails you mark as spam into the spam folder, you get a fairly automatic integration between the client and the server. Likewise, if you configure the server to store mails classified as spam in the Spam folder on delivery (for example using Sieve), a message will be learnt as ham when the user moves it out of the Spam folder.


To improve the integration with Thunderbird and KMail, I wrote a patch for antispam, which unfortunately did not get any feedback from upstream; use at your own risk.

It adds a configuration option to antispam, which can simply be added to the plugin section in the dovecot configuration:

   antispam_spam_flags = "Junk;$JUNK"

(The quotes are important to prevent the $ from doing anything funny.)

With the patch, antispam will also trigger a learning action if a message gets a spam flag or loses all of its spam flags. Flags are an IMAP feature and are used by clients to store bits of information server-side. It turns out that Thunderbird and KMail use these flags to store the Junk/Spam status of messages.

The Junk flag is set by Thunderbird when you mark a message as junk. Likewise for the $JUNK flag in KMail. Thus, with this configuration, you can trigger server-side learning by flagging mail as Junk/NonJunk in Thunderbird or KMail.

Other clients, such as K9-Mail, still play along nicely, because there the default is to move junk into the Spam folder, which antispam will also trigger on.


You can implement the same functionality, I think, in IMAPSieve. This is on my TODO, but unfortunately I currently do not have a testing-ready environment with a sufficiently recent dovecot.

Jonas Schäfer