Mail character encoding formatting

Every now and then, I receive an email that is not formatted properly: it contains many '=92' and '=' characters, for example:

We are looking for candidates to join our team.    Great qualifications inc=
lude:

*     PhD or Masters specializing in Machine Learning, Statistics, or related fi=
elds.

=B7     Experience dealing with large, real-life data sets. (not just pre-c=
anned problems).

Why would this occur? A buggy sender email client? Wrong MIME encoding?

notnoop

Posted 2009-10-06T13:55:43.520

Reputation: 915

As mentioned by harrymc, the equal signs are in fact "quoted-printable" encoding. So either the sender is not including the correct headers, or the recipient is not interpreting them correctly. We'd need to see more of the email's headers to determine the cause in this specific case. (There is also a small chance that a virus scanner or an intermediate server mangled the source of the message.) – Arjan – 2009-10-06T14:32:25.770

Answers

The problem probably lies partly with the sending email program and partly with the receiving one.
The sender certainly did not see such a mess when sending the email. The problem lies in how the encoding actually used by the sender is declared in the email's headers.
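A minimal sketch with Python's standard `email` package shows why the header declaration matters. The headers below (including `charset=windows-1252`) are assumptions for illustration, not taken from the questioner's message. The same quoted-printable body decodes cleanly when `Content-Transfer-Encoding` is declared, and stays full of `=` signs when it is not:

```python
from email import message_from_string

# Body lines modelled on the question, still quoted-printable encoded
# ("=" at a line end is a soft line break, "=B7" stands for byte 0xB7).
QP_BODY = "Great qualifications inc=\nlude:\n=B7 Experience dealing with large data sets.\n"

# Hypothetical headers: identical except that the second message
# omits the Content-Transfer-Encoding declaration.
WITH_CTE = (
    "Content-Type: text/plain; charset=windows-1252\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n" + QP_BODY
)
WITHOUT_CTE = "Content-Type: text/plain; charset=windows-1252\n\n" + QP_BODY


def body_as_text(raw_message: str) -> str:
    """Decode a single-part message body the way a standards-following client would."""
    msg = message_from_string(raw_message)
    # decode=True undoes the declared Content-Transfer-Encoding;
    # with no declaration the raw bytes come back unchanged.
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset)


print(body_as_text(WITH_CTE))     # escapes resolved: "include:" and a middle dot
print(body_as_text(WITHOUT_CTE))  # "inc=" and "=B7" left visible, as in the question
```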

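The basic problem is that there are too many characters out there for all of them to be expressed using only the plain ASCII character set. The final solution is supposed to be Unicode, whose declared purpose is to encompass all the world's character sets (which is already impossible). There are also intermediate solutions, such as the quoted-printable transfer encoding, which writes each non-ASCII byte as an "=XX" hex escape (for example "=92") and ends over-long lines with a bare "=" so the text survives mail systems that only handle 7-bit ASCII. That is probably what we see in your question.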
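To see where sequences like "=92" and the trailing "=" signs come from, here is a minimal Python sketch using the standard-library `quopri` module. The sample sentence is an assumption modelled on the question, encoded as Windows-1252:

```python
import quopri

# A sentence with a curly apostrophe (byte 0x92 in Windows-1252), repeated so
# the line exceeds the 76-character limit on quoted-printable lines.
text = " ".join(["We\u2019re looking for candidates to join our team."] * 3)
raw = text.encode("cp1252")

encoded = quopri.encodestring(raw)
print(encoded.decode("ascii"))
# Non-ASCII bytes become =XX escapes (0x92 -> "=92"), and the long line is
# split with a trailing "=" (a soft line break), as in the question.

# A client that honours the declared encoding reverses both steps exactly.
assert quopri.decodestring(encoded) == raw
```

If the receiving client skips that decoding step, the reader sees the encoded form verbatim.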

Moreover, every email client has its own independent implementation of each character set (except possibly Unicode), and on top of that comes each client's own implementation of the email headers.

As a result, the email text is only guaranteed to look exactly the same if the sender and the receiver use the same email client software. Outlook is especially to blame: it does not follow the international standards very closely, and is therefore liable to generate emails that other clients may have difficulty displaying the same way.
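The "=92" in the question is a good illustration of why the declared character set matters. Once the quoted-printable escape is decoded it is just byte 0x92, and what that byte means depends on which character set the receiving client assumes. This minimal Python sketch compares two plausible choices; the sample bytes are an assumption based on the question:

```python
# Byte 0x92 is what "=92" in the question decodes to.
raw = b"We\x92re looking for candidates"

as_cp1252 = raw.decode("cp1252")   # Windows-1252: 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK
as_latin1 = raw.decode("latin-1")  # ISO-8859-1: 0x92 is U+0092, an invisible C1 control code

print(repr(as_cp1252))  # 'We’re looking for candidates'
print(repr(as_latin1))  # 'We\x92re looking for candidates'
```

If the charset declared in the headers does not match the one the sender's client actually used, the receiver gets the second result instead of the first.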

To this mess you should add the fact that different operating systems may assign different numerical values to the same characters. For example, the legacy Mac and PC character sets (Mac Roman and Windows-1252) store the same accented characters under different numerical values.
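Python ships codecs for both legacy character sets (`mac_roman` and `cp1252`), so the mismatch is easy to demonstrate. This minimal sketch uses "é" as an example character:

```python
# The same accented character, encoded with two legacy 8-bit character sets.
for codec in ("mac_roman", "cp1252"):
    print(codec, "é".encode(codec).hex())
# mac_roman 8e
# cp1252 e9

# Reading Mac Roman bytes as if they were Windows-1252 turns "é" into "Ž".
print(b"\x8e".decode("cp1252"))  # Ž
```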

This article may also interest you: Character encoding in e-mail: having to deal with GroupWise crap in 2004. It describes similar problems that other people have experienced.

harrymc

Posted 2009-10-06T13:55:43.520

Reputation: 306 093