How can I convert HTML emails to plain text with fetchmail?



I recently set up an email gateway for our bug tracker, originally intended as a way to streamline error reporting from our server software. I told my colleagues about it, and they were happy to have the feature, but I was horrified to discover the abuse inflicted on my poor system by Entourage/Outlook emails.

First, the sender's email address appears horribly mangled, like 'Name =?ISO-8859-1?B?TGp1bmdzdHL2bQ==?=" '. The body of the email is an HTML attachment, of course with an unnecessary amount of extra code. The attachments are particularly annoying, since they appear as ticket attachments in the bug tracker and the body of the ticket is empty.
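The mangled sender shown above is an RFC 2047 "encoded word" (charset ISO-8859-1, base64 payload), which here decodes to "Ljungström". As a minimal sketch, assuming the gateway can run Python 3 on the raw header before creating the ticket, the standard library's `email.header` module can decode it. The helper name `decode_sender` is made up for illustration.

```python
from email.header import decode_header, make_header


def decode_sender(raw_header: str) -> str:
    """Decode RFC 2047 encoded words (=?charset?B?...?=) into readable text.

    `decode_sender` is a hypothetical helper name; decode_header and
    make_header are the standard-library functions doing the work.
    """
    return str(make_header(decode_header(raw_header)))
```

Calling `decode_sender` on the display-name portion of the From header should yield the readable name. Headers without encoded words pass through unchanged.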

I have done a bit of googling but only found solutions suggesting huge, ugly awk or Perl scripts, which seem neither maintainable nor robust enough to address all possible edge cases of Outlook's HTML.

What is a better solution here?

Our target platform is Windows Server, and I would prefer something in Python, but we have a Cygwin installation and can therefore use other Unix utilities if need be.

Nik Reiman

Posted 2009-08-06T08:19:55.393

Reputation: 511



If you research your question with fetchmail in mind you won't find good answers. That is because fetchmail is not the tool for your job.

As the Fetchmail FAQ says:

Repeat after me: fetchmail's job is transport, not policy.

What most people do in such cases is use fetchmail together with procmail. The easiest thing to do would be to pipe your messages through html2txt, as explained here.
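Since the question prefers Python, here is a rough stand-in for the html2txt step using only the standard library's `html.parser`. This is not html2txt itself, just a sketch of the same idea: drop the markup, skip `<head>`, `<script>` and `<style>` content, and break lines at block-level tags. The names `_TextExtractor` and `html_to_text` are made up for illustration, and the tag lists are assumptions that will not cover every quirk of Outlook's HTML.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text; skip head/script/style; break on block tags."""

    SKIP = {"head", "script", "style"}                    # content to drop
    BREAK = {"br", "p", "div", "tr", "li", "h1", "h2", "h3"}  # line breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BREAK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    lines = "".join(parser.chunks).splitlines()
    # Collapse runs of whitespace and drop empty lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)
```

For better rendering of tables and links, an external tool such as the ones mentioned in these answers is likely to do a nicer job than this sketch.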

If you have never worked with procmail, don't be afraid. procmail is horrible, but if you keep things simple it's not too bad.

Ludwig Weinzierl

Posted 2009-08-06T08:19:55.393

Reputation: 7 695


fetchmail is only for fetching mail, just like its name says.

On Unix systems, most people use procmail for email processing. You can write a recipe that checks for /<html/i and pipes the message through w3m -dump (or lynx -dump or anything you want). I'm not sure if a similar thing exists for Windows though...
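If procmail is not available, the same "convert only when it is HTML" logic can be sketched in Python. Note the differences from the recipe described above: instead of matching `/<html/i`, this version parses the message with the standard `email` package, prefers an existing text/plain part, and only converts when the chosen body is text/html. It assumes `w3m` is on the PATH (for example from Cygwin); the function names `w3m_dump` and `plain_body` are made up for illustration, and charset handling for w3m is left out.

```python
import subprocess
from email import message_from_bytes, policy


def w3m_dump(html: str) -> str:
    """Render HTML as text with `w3m -dump` (assumes w3m is installed)."""
    result = subprocess.run(
        ["w3m", "-dump", "-T", "text/html"],
        input=html.encode("utf-8"),  # assumption: w3m copes with UTF-8 input
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def plain_body(raw: bytes, convert=w3m_dump) -> str:
    """Return a plain-text body for a raw RFC 822 message.

    `convert` is any callable turning an HTML string into text; it is
    only invoked when no text/plain part is available.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""  # no textual body at all (e.g. attachments only)
    text = part.get_content()
    if part.get_content_type() == "text/html":
        text = convert(text)
    return text
```

A script like this could read the message from standard input and be hooked in wherever the mail is handed over, for instance through fetchmail's `mda` option or a procmail pipe, though the exact wiring depends on the bug tracker's gateway.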

(I'd also make procmail reply with a tutorial on turning off HTML.)


Posted 2009-08-06T08:19:55.393

Reputation: 283 655