Thunderbird: filters don't match links

3

0

I use filters to remove undesirable messages (in addition to the integrated spam filter).

My problem is that for years (with every Thunderbird release I have ever had, including the current, up-to-date one) it has been unable to filter on links.
For example, I want to delete every message containing a link to http://xxxxx.emv3.com/xxxxxx
I never manage to remove those emails. I use a filter on the body that checks whether it contains emv3, but it never matches. Those emails are in HTML format, and the links are displayed as text like "Visit our website".

If I write an HTML email with a link, my filter works.
When the email is spam, it never works.

When I save the email to a text file and open it with Notepad, I see several occurrences of http://xxxxx.emv3.com/xxxx

Any idea why this doesn't work, and what I can do about it?

Gregory MOUSSAT

Posted 2012-03-21T17:51:44.843

Reputation: 1 031

Have you turned on the Spam filter in Thunderbird? – avirk – 2012-06-26T18:14:31.263

This KB article may help you.

– avirk – 2012-06-26T18:21:04.080

Are you using an imap account? – Ahmed Bilfaqih – 2012-06-27T06:48:53.803

Remember that some (most?) spam emails use an image to display their crap, bypassing filters which use text to determine badness. Or they may use %48%45%58 (HEX) codes to obfuscate the contents too. Many many ways to get around filters. – lornix – 2012-06-28T03:55:12.777

Can you save one of these messages and post it for us? – harrymc – 2012-06-29T18:18:36.533

Answers

4

You can use the Thunderbird add-on FiltaQuilla.

After installing the extension you can activate all filter actions and search terms that you need in the add-on options (Thunderbird→Tools→Add-ons→FiltaQuilla→Options). These are the options provided by the current version as of this writing.

[Screenshots of the FiltaQuilla options; images not available]

You will definitely want to have a look at the documentation on the author's website to explore all the possibilities and find out how to use regexp filters and all the other goodies.

Here is a typical example of a (very simple) catch-all email address pattern, that accepts only addresses that contain at least one dot in front of the @. You can also see some exceptions to these rules for senders that are found in the local address book and for legacy addresses that don’t obey the secret pattern.

example

It's an amazing Thunderbird add-on. I hope this helps.

If this does not match your needs, you could use SpamPal with the RegExFilter plugin to detect unwanted messages. For junk-mail detection to be effective, however, you must "train" it.

Junk Mail control

Message Filters

Kalatzis Stefanos

Posted 2012-03-21T17:51:44.843

Reputation: 346

It seems to be a very nice extension, but I don't see anything that can match the message body. So I can't match anything better than with the regular Thunderbird filters. Am I missing something? – Gregory MOUSSAT – 2012-06-28T22:53:25.793

If this does not match your needs, you could use SpamPal with the RegExFilter plugin to detect unwanted messages. For junk-mail detection to be effective, however, you must "train" it. – Kalatzis Stefanos – 2012-06-29T08:10:52.077

I don't want to train it. I want to match links. – Gregory MOUSSAT – 2012-06-29T11:33:55.320

FiltaQuilla will not do what the OP wants, because all of the search options work on the headers or on a text-only representation of the message body; none of them processes the raw message. You can search the raw body using the "Javascript Action With Body", but that's a hack, like putting the cart before the horse (searching as the filter action). – MV. – 2012-07-02T01:19:29.703

Actually it can be done with FiltaQuilla by using the Javascript Search Term, but it is messy and the only way to specify the search terms is by hand with a JS array... and it crashes my TB when testing. I think the only easy way would be to ask the FiltaQuilla author to include a RawBody search option. – MV. – 2012-07-02T02:24:51.027

1

It does not work because the filters (and the current version of FiltaQuilla) see only the text representation of the HTML code when filtering, if they see the body at all: for some folders (offline IMAP folders) they see only the message headers.
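To make the point concrete, here is a minimal sketch of what happens to a link when HTML is reduced to its visible text. The `htmlToVisibleText` function is a crude stand-in written for this illustration; it is an assumption, not Thunderbird's actual converter, which is more elaborate.

```javascript
// Crude stand-in for an HTML-to-text conversion (an assumption for
// illustration only; this is not how Thunderbird implements it).
function htmlToVisibleText(html) {
  return html
    .replace(/<[^>]*>/g, " ") // drop every tag, together with its attributes
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

const html = '<p>Hello, <a href="http://xxxxx.emv3.com/xxxxxx">Visit our website</a></p>';
const visible = htmlToVisibleText(html);

console.log(visible);                  // Hello, Visit our website
console.log(visible.includes("emv3")); // false: the URL lived only in the href attribute
console.log(html.includes("emv3"));    // true: the raw source still contains it
```

A body filter that is handed only the visible text has nothing to match against, even though the saved message clearly contains the URL.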

I really don't know why the default filters in Thunderbird don't allow users to filter on the raw body; I guess it's because nobody has requested it. Also, I don't know why sophisticated plugins like FiltaQuilla don't provide raw body access out of the box. Again, maybe it is a lack of user interest.

So, I can tell you how to do it with FiltaQuilla, but you are not going to like it. It's messy, hackish, slow, fragile and not user-friendly at all. But it's possible. It works on my computer. It should work on yours. Unless, of course, Thunderbird crashes and corrupts your mailbox (as happened once here while I was testing this; it never worked again in that folder). Surprisingly, it worked flawlessly with my IMAP folders. So think of this as an experiment, not a final solution.

If you already have FiltaQuilla, enable Javascript in the Search Term tab in the Preferences window (restart Thunderbird).

Now create a filter as usual, in the what-to-search list look for Javascript. In the next list choose Matches. There will be an edit icon, select it and insert the following code (Note: this code is based on some tests included with the source code of Thunderbird):

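// Strings to look for in the raw message source.
let mylist = ["emv3.com", "_blank", "tumblr.com", "xxxx"];
// Index of the first match in the raw source, or -1 if nothing matched.
let matchfound = -1;
// Only this many bytes from the start of the message are read and searched.
const MAX_MESSAGE_LENGTH = 10240;
let msgFolder = message.folder;
let msgUri = msgFolder.getUriForMsg(message);
let messenger = Cc["@mozilla.org/messenger;1"].createInstance(Ci.nsIMessenger);
// Stream the raw message (headers and undecoded body) synchronously.
let streamListener = Cc["@mozilla.org/network/sync-stream-listener;1"].createInstance(Ci.nsISyncStreamListener);
messenger.messageServiceFromURI(msgUri).streamMessage(msgUri, streamListener, null, null, false, "", false);
let sis = Cc["@mozilla.org/scriptableinputstream;1"].createInstance(Ci.nsIScriptableInputStream);
sis.init(streamListener.inputStream);
let rawbody = sis.read(MAX_MESSAGE_LENGTH);
for (let listidx = 0; listidx < mylist.length; listidx++) {
  //Components.utils.reportError("Checking " + mylist[listidx] + " in " + message.subject);
  // indexOf compares literally; search() would treat each string as a regular expression.
  matchfound = rawbody.indexOf(mylist[listidx]);
  if (matchfound >= 0) {
    // Logged to the error console, not to the normal filter log.
    Components.utils.reportError("Matched " + matchfound + " " + mylist[listidx] + " in " + message.subject);
    break;
  }
}
// The final expression is the result of the search term: true means the message matches.
(matchfound >= 0)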

Do you see the "let mylist =" line? That's a JavaScript array. You can fill it with the text strings to search for. Do you see MAX_MESSAGE_LENGTH = 10240? That's how far from the start of the message this code will search. Usually 10 KB is enough, as spam messages larger than that generally include images or other attachments.
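If you want to try out a list of strings and a length limit without touching Thunderbird, the matching logic can be sketched on its own. The helper name `findFirstMatch` and the sample message are hypothetical, made up for this illustration. The sketch uses `indexOf`, which compares literally, whereas `String.prototype.search` would treat the dot in "emv3.com" as a regular-expression wildcard.

```javascript
// Hypothetical helper, not part of Thunderbird or FiltaQuilla.
// Returns the first string from `needles` that occurs within the first
// `maxLength` characters of `rawBody`, or null if none does.
function findFirstMatch(rawBody, needles, maxLength) {
  const head = rawBody.slice(0, maxLength);
  for (const needle of needles) {
    if (head.indexOf(needle) !== -1) {
      return needle;
    }
  }
  return null;
}

// Made-up sample standing in for a saved raw message.
const sample =
  'Subject: offer\r\n\r\n<a href="http://xxxxx.emv3.com/xxxx">Visit our website</a>';

console.log(findFirstMatch(sample, ["tumblr.com", "emv3.com"], 10240)); // emv3.com
console.log(findFirstMatch(sample, ["emv3.com"], 10));                  // null
```

The second call shows the effect of the length limit: a string that appears after the cut-off is never seen.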

Close the edit window with OK.

Define your actions (move, delete, flag, etc.).

Try to run it.

If you have enabled the debug console in Thunderbird, you can see a list of matches there (this is not the normal filter log).

A final note: this script does not decode Base64 (or any other) encoding. A Base64-encoded message will not match anything.

Another note: while briefly browsing the Thunderbird source code, I got the impression that the Bayesian filter has access to the raw message body; however, I don't know if that helps you.
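For reference, decoding could look roughly like the sketch below. It is an untested-in-Thunderbird illustration under strong assumptions: the helper `decodedBody` is hypothetical, it handles only a single-part message whose headers declare `Content-Transfer-Encoding: base64`, and it ignores multipart messages and character sets entirely.

```javascript
// Use the global atob/btoa where they exist; fall back to Buffer under Node.js.
const decodeBase64 = typeof atob === "function"
  ? atob
  : (s) => Buffer.from(s, "base64").toString("binary");
const encodeBase64 = typeof btoa === "function"
  ? btoa
  : (s) => Buffer.from(s, "binary").toString("base64");

// Hypothetical helper: returns the body of a single-part raw message,
// Base64-decoded when the headers say so. Multipart mail is not handled.
function decodedBody(rawMessage) {
  const split = rawMessage.search(/\r?\n\r?\n/); // first blank line ends the headers
  if (split === -1) return rawMessage;
  const headers = rawMessage.slice(0, split);
  const body = rawMessage.slice(split).trim();
  if (!/^Content-Transfer-Encoding:\s*base64\s*$/im.test(headers)) return body;
  return decodeBase64(body.replace(/\s+/g, "")); // remove line breaks before decoding
}

// Made-up message whose HTML body is Base64-encoded.
const html = '<a href="http://xxxxx.emv3.com/xxxx">Visit our website</a>';
const message = [
  "Content-Type: text/html",
  "Content-Transfer-Encoding: base64",
  "",
  encodeBase64(html),
].join("\r\n");

console.log(message.includes("emv3.com"));              // false: the encoded form hides it
console.log(decodedBody(message).includes("emv3.com")); // true
```

Real spam is often multipart, so a usable version would also have to split the message on its MIME boundaries and decode each part separately.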

So, for a better answer, your options are:

  • Write a plugin yourself.
  • Ask a plugin author to add support for raw body access (the FiltaQuilla author seems like a nice person, you may ask in the FiltaQuilla forums).

MV.

Posted 2012-03-21T17:51:44.843

Reputation: 381

0

Any email you receive in HTML format will contain the Content-Type: text/html specification in the actual body of the message.

So change your filter to the following:

"Content-Type" "contains" "text/html"

This will now allow you to filter the emails sent in HTML.

To filter messages for an IMAP account, the account needs to be set up for offline use.

In order to set up the accounts for offline use:

  • Go to Account settings > Offline & diskspace.
  • Tick Make the messages in my inbox available when I am working offline.
  • Then select the ad hoc folders after clicking on Select folders for offline use.

Hope this helps.

Ahmed Bilfaqih

Posted 2012-03-21T17:51:44.843

Reputation: 1 844

This doesn't work better. And I can't see why this could. – Gregory MOUSSAT – 2012-06-27T16:15:21.603

Give this a try "Body contains Content-type: text/html". – Ahmed Bilfaqih – 2012-06-27T16:37:12.303

I said this doesn't work. So I tested. This just deletes 90% of ham in addition to 90% of spam. – Gregory MOUSSAT – 2012-06-28T22:44:50.383

You may find this useful.

– Ahmed Bilfaqih – 2012-07-01T07:25:50.030