Dovecot SpamAssassin results different from standalone

Question

I'm trying to filter spam with smtpd, spampd, dovecot, and SpamAssassin, and the same spam email scores very differently depending on how it is scanned: the X-Spam-Report added on the spampd/dovecot delivery path shows a low score, while a standalone spamassassin run on the same message shows a high one.

I'm running on Fedora 25 x64. spampd is running on port 10029 and relays to 10030:

SPAMPD_OPTIONS="--a --L --maxsize=500 --host=127.0.0.1:10029 --relayhost=127.0.0.1:10030"
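If I read spampd's documentation correctly, -L is the short form of its --local-only switch, which turns off SpamAssassin's network-based tests (DNS blocklists, Razor2 and similar), and --L should be parsed as the same switch. Assuming that, the hypothetical helper below (not part of spampd) checks whether an option string like the one above enables it. When it does, a standalone run only matches spampd's conditions if it also uses spamassassin -L (local tests only).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a spampd option string turns on
# local-only mode. ASSUMPTION: spampd accepts -L, --L and --local-only
# as spellings of the same switch (Getopt::Long-style parsing).
spampd_local_only() (
  set -f                          # don't glob-expand the option words
  for opt in $1; do               # split the string on whitespace on purpose
    case $opt in
      -L|--L|--local-only) exit 0 ;;
    esac
  done
  exit 1
)

# Example with the options shown above:
SPAMPD_OPTIONS="--a --L --maxsize=500 --host=127.0.0.1:10029 --relayhost=127.0.0.1:10030"
if spampd_local_only "$SPAMPD_OPTIONS"; then
  echo "spampd runs local tests only; compare with: spamassassin -L < email.txt"
fi
```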

smtpd relays incoming mail to spampd on port 10029; spampd relays the scanned message back to smtpd on port 10030, which then delivers it to dovecot over LMTP:

listen on lo   port 10030 tag SPAMPD
accept tagged SPAMPD for domain <domains> virtual <users> deliver to lmtp "/run/dovecot/lmtp" rcpt-to
accept from any for domain <domains> relay via smtp://127.0.0.1:10029

Dovecot has a global sieve that puts spam into the Spam folder:

if header :contains "X-Spam-Flag" "YES" {
  addflag "\\Seen";
  fileinto "Spam";
  stop;
}
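Because local.cf adds X-Spam-Flag only to messages whose score reaches required_score (add_header spam Flag _YESNOCAPS_), this rule can only fire when the score computed on the spampd path reaches 5. For spot-checking saved messages, here is a rough shell approximation of the sieve test (a hypothetical helper): it mimics :contains with sieve's default case-insensitive comparator, but does not unfold folded headers.

```shell
#!/bin/sh
# Rough shell approximation of:  if header :contains "X-Spam-Flag" "YES"
# Reads a message on stdin and examines only the header block
# (everything before the first empty line). Like sieve's default
# i;ascii-casemap comparator, name and value are matched case-insensitively.
# Folded (multi-line) headers are not unfolded.
has_spam_flag() {
  awk 'BEGIN { rc = 1 }                      # 1 = not found (shell false)
       $0 == "" || $0 == "\r" { exit }       # end of the header block
       tolower($0) ~ /^x-spam-flag:/ {
         value = $0
         sub(/^[^:]*:/, "", value)           # drop the header name
         if (toupper(value) ~ /YES/) rc = 0
       }
       END { exit rc }'
}

# Example: has_spam_flag < /path/to/saved/message && echo "would go to Spam"
```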

SpamAssassin local.cf:

required_score 5
report_safe 0
add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
add_header all Level _STARS(*)_
add_header all Checker-Version SpamAssassin _VERSION_ (_SUBVERSION_) on _HOSTNAME_
add_header all Report _REPORT_
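With this template, X-Spam-Status begins with Yes/No, the score and the threshold, which is the quickest thing to compare between a delivered copy and a standalone run. A small sketch (hypothetical helper) that pulls those three fields out of a message on stdin; it assumes the first line of the header has the layout produced by the add_header all Status line above.

```shell
#!/bin/sh
# Print "<Yes|No> <score> <required>" from the X-Spam-Status header of a
# message on stdin. Assumes the header's first line follows the template
# "_YESNO_, score=_SCORE_ required=_REQD_ tests=..." from local.cf;
# continuation lines (the folded tests= list) are ignored, and only the
# first match is printed in case the body quotes another X-Spam-Status.
status_score() {
  sed -n 's/^X-Spam-Status: *\([A-Za-z]*\), *score=\([-0-9.]*\) *required=\([-0-9.]*\).*/\1 \2 \3/p' |
    head -n 1
}

# Example: status_score < /path/to/saved/message
```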

And init.pre plus the other .pre files:

$ grep -hv -e "^#" -e "^\\s*$" /etc/mail/spamassassin/*pre
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
loadplugin Mail::SpamAssassin::Plugin::Hashcash
loadplugin Mail::SpamAssassin::Plugin::SPF
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags
loadplugin Mail::SpamAssassin::Plugin::DKIM
loadplugin Mail::SpamAssassin::Plugin::Check
loadplugin Mail::SpamAssassin::Plugin::HTTPSMismatch
loadplugin Mail::SpamAssassin::Plugin::URIDetail
loadplugin Mail::SpamAssassin::Plugin::Bayes
loadplugin Mail::SpamAssassin::Plugin::BodyEval
loadplugin Mail::SpamAssassin::Plugin::DNSEval
loadplugin Mail::SpamAssassin::Plugin::HTMLEval
loadplugin Mail::SpamAssassin::Plugin::HeaderEval
loadplugin Mail::SpamAssassin::Plugin::MIMEEval
loadplugin Mail::SpamAssassin::Plugin::RelayEval
loadplugin Mail::SpamAssassin::Plugin::URIEval
loadplugin Mail::SpamAssassin::Plugin::WLBLEval
loadplugin Mail::SpamAssassin::Plugin::VBounce
loadplugin Mail::SpamAssassin::Plugin::ImageInfo
loadplugin Mail::SpamAssassin::Plugin::FreeMail
loadplugin Mail::SpamAssassin::Plugin::AskDNS

I periodically train spam and ham:

$ find /var/vmail -type f -not -path '*/.imap/*' -and -name TrainSpam -print0 | xargs -0 sa-learn --mbox --spam --no-sync --dbpath /root/.spamassassin/
$ find /var/vmail -type f -not -path '*/.imap/*' -and \( -name inbox -or -name Archives \) -print0 | xargs -0 sa-learn --mbox --ham --no-sync --dbpath /root/.spamassassin/
$ sa-learn --sync --dbpath /root/.spamassassin/
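These commands write to /root/.spamassassin/, but spampd scans as its own user; the syslog message quoted in the comments shows it opening /var/spool/spampd/.spamassassin/bayes_*. If that is the DB used at delivery time, this training never reaches it. A sketch that trains that directory instead, under stated assumptions: the daemon user name spampd, the variable names and the helper name are all hypothetical. It uses -path '*/.imap/*' because -path .imap only matches a path that is literally ".imap", and NUL-separated names so xargs copes with spaces.

```shell
#!/bin/sh
# Hypothetical helper: feed mbox files to the Bayes DB that spampd reads.
# ASSUMPTIONS: spampd's DB lives under /var/spool/spampd/.spamassassin/
# (path taken from the syslog error) and the daemon runs as user "spampd".
VMAIL_ROOT=${VMAIL_ROOT:-/var/vmail}
SPAMPD_DB=${SPAMPD_DB:-/var/spool/spampd/.spamassassin/}
SA_LEARN=${SA_LEARN:-sa-learn}

# $1 = --spam or --ham; remaining args = find name tests, e.g. -name TrainSpam
train_spampd_db() {
  kind=$1; shift
  # Skip the .imap directories; -r (GNU xargs) avoids running with no files.
  # $SA_LEARN is left unquoted so it may hold a command plus arguments.
  find "$VMAIL_ROOT" -type f -not -path '*/.imap/*' \( "$@" \) -print0 |
    xargs -0 -r $SA_LEARN --mbox "$kind" --no-sync --dbpath "$SPAMPD_DB"
}

# Usage sketch (as root):
#   train_spampd_db --spam -name TrainSpam
#   train_spampd_db --ham -name inbox -o -name Archives
#   sa-learn --sync --dbpath /var/spool/spampd/.spamassassin/
#   chown -R spampd: /var/spool/spampd/.spamassassin
```

The final chown matters because files created by root-run sa-learn would otherwise leave spampd with the same "Permission denied" error.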

This has trained thousands of messages of each kind:

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       6138          0  non-token data: nspam
0.000          0       3219          0  non-token data: nham
0.000          0     131230          0  non-token data: ntokens
0.000          0 1516588946          0  non-token data: oldest atime
0.000          0 1519351200          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1519352084          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0     279622          0  non-token data: last expire reduction count
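These counts were presumably dumped from /root/.spamassassin (sa-learn run as root without --dbpath uses ~/.spamassassin), which is where the training goes. spampd, however, opened /var/spool/spampd/.spamassassin/bayes_* according to the syslog line in the comments, so it is worth dumping both DBs and comparing. A sketch with a hypothetical helper that extracts nspam and nham from sa-learn --dump magic output:

```shell
#!/bin/sh
# Print "<nspam> <nham>" from `sa-learn --dump magic` output on stdin.
# For these "non-token data" lines the value sits in the third column
# and the name is the last word of the line.
bayes_counts() {
  awk '$NF == "nspam" { spam = $3 }
       $NF == "nham"  { ham = $3 }
       END { print spam, ham }'
}

# Usage sketch (run as root; the second path is the one from the syslog error):
#   sa-learn --dump magic --dbpath /root/.spamassassin/ | bayes_counts
#   sa-learn --dump magic --dbpath /var/spool/spampd/.spamassassin/ | bayes_counts
```

On the dump above this prints 6138 3219.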

Now to the problem. As an example, I've received 7 spam messages that are all very similar with the subject, "Congress Gives Homeowners A Once-In-A-Lifetime Mortgage Bailout".

The source of the latest one, as delivered by dovecot, shows the following, quite low, spam score:

X-Spam-Report: 
        *  2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5059]
        *  1.3 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
        *  0.0 T_REMOTE_IMAGE Message contains an external image
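Adding up the listed rule scores gives roughly the message's total (each score is printed rounded to one decimal), which shows why the sieve rule never fired for this copy: 2.0 + 1.3 = 3.3 is below required_score 5, so no X-Spam-Flag: YES header was added. A pair of hypothetical helpers for checking saved reports:

```shell
#!/bin/sh
# Sum the per-rule scores of an X-Spam-Report read on stdin.
# Rule lines look like "*  2.0 BAYES_50 BODY: ..."; continuation lines such as
# "*      [score: 0.5059]" have no number in the second field and are skipped.
report_total() {
  awk '$1 == "*" && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { sum += $2 }
       END { printf "%.1f\n", sum }'
}

# Succeed if score $1 is at or above threshold $2 (SpamAssassin treats a
# message as spam when its score >= required_score).
meets_required() {
  awk -v s="$1" -v r="$2" 'BEGIN { exit !(s + 0 >= r + 0) }'
}
```

Fed the standalone report below, report_total gives 8.2, comfortably over the threshold.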

However, running spamassassin -D < email.txt on the same email shows a high spam score (note that this run used a newer Bayes DB than the one above):

X-Spam-Report: 
        *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
        *      [score: 1.0000]
        *  0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
        *       See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
        *      for more information.
        *      [URIs: mirror24news.com]
        *  0.0 T_SPF_HELO_TEMPERROR SPF: test of HELO record failed (temperror)
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
        *      [score: 1.0000]
        *  1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
        *  1.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
        *      [cf: 100]
        *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
        *      valid
        *  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
        *  0.0 T_DKIM_INVALID DKIM-Signature header exists but is not valid
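Comparing only the rule names of the two reports makes a pattern visible: the rules that fire only in the standalone run include URIBL_BLOCKED, T_SPF_HELO_TEMPERROR and the two RAZOR2 rules, all of which rely on network lookups, which would fit spampd scanning with network tests turned off. The Bayes rules also differ (BAYES_50 vs BAYES_99/BAYES_999), although the standalone run used a newer Bayes DB. A hypothetical helper that produces the diff from two saved reports:

```shell
#!/bin/sh
# List the rule names in an X-Spam-Report (stdin), one per line, sorted.
report_rules() {
  awk '$1 == "*" && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { print $3 }' | LC_ALL=C sort -u
}

# Print rules that fired in report file $2 but not in report file $1.
rules_only_in_second() (
  a=$(mktemp) && b=$(mktemp) || exit 1
  report_rules < "$1" > "$a"
  report_rules < "$2" > "$b"
  LC_ALL=C comm -13 "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  exit "$status"
)

# Example (hypothetical file names):
#   rules_only_in_second spampd-report.txt standalone-report.txt
```

For the two reports in the question this prints BAYES_99, BAYES_999, DKIM_SIGNED, RAZOR2_CF_RANGE_51_100, RAZOR2_CHECK, T_SPF_HELO_TEMPERROR and URIBL_BLOCKED.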

Why are the two different?

Could you show `init.pre` and `local.cf` config files for spamassassin? — Kondybas, Feb 23 '18 at 12:31
@Kondybas Thanks for the help! `local.cf` was in the original post and I've edited the post to add `init.pre`. — Kevin, Feb 23 '18 at 16:51
I just noticed in syslog that spampd couldn't access the Bayes DB: `bayes: cannot open bayes databases /var/spool/spampd/.spamassassin/bayes_* R/W: tie failed: Permission denied`. I've fixed this now, but it was caused by some experimenting I did yesterday, so I've updated my post to show the X-Spam-Report from an older mail that did have Bayes applied. — Kevin, Feb 23 '18 at 17:02
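With the DB permissions fixed, a standalone run is only comparable to what spampd sees if it uses the same Bayes DB and the same set of tests. A sketch, assuming the daemon user is spampd with a home directory whose ~/.spamassassin holds the DB from the syslog message, and assuming --L in SPAMPD_OPTIONS means local-only mode (hence spamassassin -L):

```shell
#!/bin/sh
# Hypothetical helper: re-score a saved message under (roughly) spampd's
# conditions. ASSUMPTIONS: the daemon user is "spampd", its home directory
# holds the Bayes DB in ~/.spamassassin, and spampd runs with local-only
# tests, so SpamAssassin is invoked with -L (no network tests).
SPAMPD_USER=${SPAMPD_USER:-spampd}

rescore_like_spampd() {  # $1 = path to the saved message
  # -H makes sudo set HOME to the target user's home directory, so
  # SpamAssassin picks up that user's ~/.spamassassin.
  sudo -u "$SPAMPD_USER" -H spamassassin -L < "$1"
}

# Usage sketch:
#   rescore_like_spampd email.txt | grep -A 20 '^X-Spam-Report:'
```

Running as the daemon user rather than as root also avoids creating root-owned DB files, which is one way to end up with the "Permission denied" error above.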
