A user in our Korean office has an issue with the character encoding in their emails. When sending or receiving, everything displays fine the first time. When the email is re-opened later, the Korean characters are garbled, with lots of </span> tags mixed in.

From what I can tell, the encoding becomes corrupted somewhere between the email landing in the inbox and being viewed again later.

Does anyone know a fix for this?

Examples:

First example, with redacted personal information

Second example, with redacted personal information

Esa Jokinen
markus1985
  • Can you provide us with a sample of the _garbled_ text? – Esa Jokinen May 02 '17 at 09:07
  • Hey @EsaJokinen, I added some images to my post. There's a third but I'm not allowed to add more than 2 links. – markus1985 May 02 '17 at 09:19
  • That helps in investigating your issue. If you can access the original email with all headers, `Content-Transfer-Encoding:` and `Content-Type:` would be nice, too. – Esa Jokinen May 02 '17 at 09:25
  • I checked the source for the email (I believe that's what you meant) and found the following that seemed relevant in the header: ` – markus1985 May 02 '17 at 09:54
  • That's a `meta` tag from the HTML version of the e-mail, which is part of the e-mail _contents_. Then there are the _email headers_. In Outlook you can find them by opening the message in a new window: _File_ / _Properties_ has a field called _internet headers_. – Esa Jokinen May 02 '17 at 10:09
  • Here we are: `Content-Type: application/ms-tnef; name="winmail.dat"` `Content-Transfer-Encoding: base64` I've pasted the full thing here: https://pastebin.com/aGyuh6pQ – markus1985 May 02 '17 at 10:38
  • Also, your example pictures show two different emails. While I have some expertise in character encodings, unfortunately I can't (yet) read hangul. – Esa Jokinen May 02 '17 at 10:41
  • Hello Esa, the first has issues in the body while the second only in the subject line. Unfortunately the second (with the broken subject) is sent as an attachment, and doesn't seem to include the _internet headers_. – markus1985 May 02 '17 at 11:03
  • I speculated that the problem could be only in the HTML message body, not affecting the plain-text message body within the same message. Yet I don't have any proof of that, and I cannot reproduce the situation with Outlook [TNEF](https://en.wikipedia.org/wiki/Transport_Neutral_Encapsulation_Format) messages that have a UTF-8 encoded HTML body. This combination seems to be a well-supported way to encode both _hangul_ and _mixed content_ (_hangul_ + _hanja_). – Esa Jokinen May 02 '17 at 11:14
  • Is there anything I can do to help you find that proof? – markus1985 May 02 '17 at 11:55
  • Having pictures of the same message both while it works and while it doesn't might help. Usually these issues are investigated in a hex editor, so with pictures alone there's always some guessing involved. – Esa Jokinen May 02 '17 at 12:07
  • Unfortunately the message doesn't work at all anymore, so I don't think that's an option. :( – markus1985 May 02 '17 at 12:49
  • The Outlook and Exchange Server versions would also help in narrowing down the problem. I've written the first version of my answer based on the details I've got so far. – Esa Jokinen May 02 '17 at 16:58
  • Thank you Esa! Versions are as follows: Outlook Professional Plus 2010 14.0.7180.5002 (32-bit) Exchange Server 2010 14.03.0319.002 – markus1985 May 03 '17 at 06:45


Because the email headers have `Content-Type: application/ms-tnef; name="winmail.dat"`, this is a sender-side issue. Sending Rich Text messages to Internet users using TNEF should be prevented, as TNEF is proprietary: receiving clients may not recognize the winmail.dat file, or the content of the message may be changed during transport. Here, the problem is probably the changed content. The best practice is to check your own settings and ask the sender to check theirs.

It's strange that your example seems to contain HTML, as winmail.dat should rather be in an RTF-like format. Nevertheless, what probably happens is that when the content is changed or misinterpreted during transport, the UTF-8 `<` of an HTML tag gets mixed up with the bytes of a hangul / hanja character. If the result falls in an unknown character range, the replacement character (U+FFFD) is shown instead.
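The following Python snippet is only an illustration of the general effect, not a reproduction of what Outlook or Exchange does to this particular message. The sample string is made up, and cp949 (the Windows Korean code page) is used purely as an example of a wrong charset. It shows that `<` is a single byte (0x3C) in UTF-8 while each hangul syllable takes three bytes, so reading the bytes with the wrong charset, or losing a byte in transit, produces garbage and U+FFFD.

```python
# Hypothetical illustration: the sample text is invented, not taken from the user's email.
text = "<span>안녕하세요</span>"
raw = text.encode("utf-8")  # "<" is the single byte 0x3C; each hangul syllable is 3 bytes

# 1) Misinterpretation: the UTF-8 bytes are read as cp949 (a double-byte code page),
#    so bytes get paired up differently and unrelated characters appear.
misread = raw.decode("cp949", errors="replace")

# 2) Damage in transport: one byte of the first syllable (index 7) is lost.
damaged = (raw[:7] + raw[8:]).decode("utf-8", errors="replace")

print(misread)
print(damaged)  # the broken syllable shows up as U+FFFD, the replacement character
```

The ASCII tag survives both cases unchanged; only the multi-byte hangul around it breaks, which matches the kind of garbling seen in the screenshots.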

I was finally able to (almost) reproduce the situation (but not the outcome) with similar content for further investigation, by moving a Rich Text format message with mixed content (hangul + hanja) from the Exchange Server's Sent Items folder into an IMAP mbox, where I could examine it as is.

The message has `Content-Type: multipart/mixed` and consists of two parts.

  • The first part is in UTF-8 encoded plain text format:

    ------=_NextPart_000_0053_01D2C361.2480F1C0
    Content-Type: text/plain;
        charset="utf-8"
    Content-Transfer-Encoding: 8bit
    
  • The second part is in the troublesome TNEF format:

    ------=_NextPart_000_0053_01D2C361.2480F1C0
    Content-Type: application/ms-tnef;
        name="winmail.dat"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
        filename="winmail.dat"
    

The first part of this multipart MIME message works just fine. I suppose the first preview of your user's message worked because, for some reason, Outlook displayed this text/plain part and only later switched to the (misinterpreted and malformed) Rich Text or HTML format.
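To see which parts a message like this contains, the structure quoted above can be inspected with Python's standard `email` package. This is a sketch under assumptions: the message is a hypothetical stand-in that reuses the boundary and part headers from the answer, but both bodies are dummies (the base64 payload decodes to just the 4-byte TNEF signature 0x223E9F78 plus two zero bytes). `describe_parts` and `TNEF_SIGNATURE` are names made up for this example.

```python
import struct
from email import message_from_bytes, policy

# Hypothetical sample: boundary and part headers copied from the answer,
# bodies are dummies. The base64 payload decodes to the TNEF signature only.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="----=_NextPart_000_0053_01D2C361.2480F1C0"

------=_NextPart_000_0053_01D2C361.2480F1C0
Content-Type: text/plain;
    charset="utf-8"
Content-Transfer-Encoding: 8bit

안녕하세요
------=_NextPart_000_0053_01D2C361.2480F1C0
Content-Type: application/ms-tnef;
    name="winmail.dat"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename="winmail.dat"

eJ8+IgAA
------=_NextPart_000_0053_01D2C361.2480F1C0--
"""

# A TNEF stream (winmail.dat) starts with this 32-bit value, stored little-endian.
TNEF_SIGNATURE = 0x223E9F78


def describe_parts(raw_bytes):
    """Summarise each MIME part: type, charset, transfer encoding, filename."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    summary = []
    for part in msg.iter_parts():
        info = {
            "type": part.get_content_type(),
            "charset": part.get_content_charset(),
            "cte": str(part.get("Content-Transfer-Encoding", "")).lower(),
            "filename": part.get_filename(),
            "is_tnef": False,
        }
        if info["type"] == "application/ms-tnef":
            data = part.get_payload(decode=True) or b""
            # Compare the first 4 decoded bytes with the TNEF signature.
            info["is_tnef"] = (
                len(data) >= 4
                and struct.unpack("<I", data[:4])[0] == TNEF_SIGNATURE
            )
        summary.append(info)
    return summary


if __name__ == "__main__":
    # Parse from bytes so the 8bit UTF-8 part survives intact.
    for info in describe_parts(RAW.encode("utf-8")):
        print(info)
```

Parsing from bytes rather than from a `str` matters here: the 8bit UTF-8 body is then decoded with the charset the part declares.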

As a workaround, it is possible to force Outlook to always stay in plain text mode:

  1. File > Options
  2. Trust Center > Trust Center Settings...
  3. Email Security > [x] Read all standard mail in plain text
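Outside Outlook, for example in a script that archives or forwards such mail, the same idea (prefer the text/plain part and ignore winmail.dat) can be sketched with Python's `email` package. This is an illustration under assumptions, not an Outlook setting: the message is built by hand with a dummy attachment, and `build_sample` and `plain_text_body` are hypothetical helpers.

```python
from email.message import EmailMessage


def build_sample():
    """Build a hypothetical message shaped like the one in the answer:
    a UTF-8 text/plain part plus a winmail.dat attachment (dummy bytes)."""
    msg = EmailMessage()
    msg.set_content("안녕하세요")  # becomes a text/plain; charset="utf-8" part
    # Dummy payload: only the 4-byte TNEF signature plus two zero bytes.
    msg.add_attachment(
        b"\x78\x9f\x3e\x22\x00\x00",
        maintype="application",
        subtype="ms-tnef",
        filename="winmail.dat",
    )  # adding an attachment turns the message into multipart/mixed
    return msg


def plain_text_body(msg):
    """Return the text/plain body; parts marked as attachments are skipped."""
    part = msg.get_body(preferencelist=("plain",))
    return None if part is None else part.get_content()


if __name__ == "__main__":
    sample = build_sample()
    print(sample.get_content_type())
    print(plain_text_body(sample))
```

`get_body` skips any part with `Content-Disposition: attachment`, so the TNEF part is never considered as the body.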
Esa Jokinen