
If I feed sa-learn Maildir messages to train it on spam, it accepts them without problems. But when I try to use an mbox file containing spam from my personal Gmail account (exported with https://takeout.google.com/settings/takeout/custom/gmail), it doesn't accept it:

$ grep -c '^From ' spam.mbox
390

$ sa-learn --progress --no-sync --spam --mbox spam.mbox
Learned tokens from 0 message(s) (0 message(s) examined)

So spam.mbox clearly contains messages (390 of them), but for some reason sa-learn ignores them.

What could be going on here?
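One way to narrow this down is to compare the Takeout file with an mbox that sa-learn is known to accept. The bash helper below is a hypothetical sketch (the name inspect_mbox is made up). It only reports facts about the file. Whether any of them matters, for example CRLF line endings or an unusual From_ line format, is a guess to check, not a confirmed cause.

```shell
# Hypothetical helper (bash): print a few properties of an mbox file that
# are worth comparing against an mbox that sa-learn accepts.
# Whether any of these is the actual cause is an assumption to verify.
inspect_mbox() {
  local mbox=$1
  # Number of From_ separator lines (one per message in a well-formed mbox)
  echo "From_ lines:  $(grep -c '^From ' "$mbox")"
  # Number of lines ending in a carriage return (CRLF line endings)
  echo "CRLF lines:   $(grep -c $'\r$' "$mbox")"
  # The first few separators, to eyeball their format
  echo "First From_ lines:"
  grep -m 3 '^From ' "$mbox"
}
```

Usage:

$ inspect_mbox spam.mbox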

Peregring-lk

1 Answer

Perhaps this isn't ideal, but I was able to get sa-learn to work by exporting my Gmail spam folder with Thunderbird rather than with Google Takeout. There seems to be something unusual about Takeout's mbox format that gives SpamAssassin trouble.

To export your Gmail spam folder with Thunderbird, follow these steps:

  1. Install Thunderbird and connect it to your Gmail account using the default settings.
  2. Install the ImportExportTools add-on for Thunderbird: download the .xpi file from the bottom of the add-on's page, go to Thunderbird -> Tools -> Add-ons, click the settings gear, click "Install add-on from file", and select the .xpi file. (You may need to press Alt to make the Tools menu appear.)
  3. Right-click the spam folder -> ImportExportTools -> Export remote folder.

The exported mbox file should work well with sa-learn.
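If installing Thunderbird is not an option, another workaround to try, given that the question reports sa-learn handling Maildir messages fine, is to split the Takeout mbox into one file per message and pass the directory to sa-learn instead of using --mbox. This is an untested sketch, not a verified fix. The function name split_mbox and the output file names are hypothetical. It also assumes that every line beginning with "From " starts a new message and that stripping trailing carriage returns is harmless.

```shell
# Hypothetical helper (bash): split an mbox into one file per message so the
# resulting directory can be given to sa-learn without --mbox.
# Assumption: every line beginning with "From " starts a new message, so an
# unescaped "From " line inside a message body would cause a wrong split.
split_mbox() {
  local mbox=$1 outdir=$2
  mkdir -p "$outdir" || return 1
  awk -v dir="$outdir" '
    # A From_ separator: close the previous file and open the next one.
    /^From / {
      if (out != "") close(out)
      n++
      out = sprintf("%s/msg%06d", dir, n)
      next                  # drop the separator line itself
    }
    # Copy every other line into the current message file, minus any
    # trailing carriage return (CRLF -> LF). Lines before the first
    # separator are skipped because no file is open yet.
    out != "" {
      sub(/\r$/, "")
      print > out
    }
  ' "$mbox"
}
```

Usage, with the function sourced into a bash session:

$ split_mbox spam.mbox spam-split
$ ls spam-split | wc -l
$ sa-learn --progress --spam spam-split

If the file count matches the 390 From lines counted in the question but sa-learn still learns nothing, that would suggest the problem lies in the message contents rather than in the mbox framing.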

tlng05