Should I use utf-8 or "utf-8" as a charset value in an email header?

4

When sending an email to an Outlook.com account with forwarding turned on, I find that the forwarded mail is rejected.

On examining the sent mail and the mail sitting in Outlook's inbox, I find that Microsoft have essentially rewritten parts of the mail body.

For example, this:

This is a multi-part message in MIME format.
--=_5226908e44ebc0462f06052400644d2f
Content-Type: multipart/alternative;
 boundary="=_926d2a45bc543e1972443c87118fa61a"

--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8

SGF2aW5nIGFub3RoZXIgZ28gYXQgZm9yd2FyZGluZyBhbiBlbWFpbCB2aWEgT3V0bG9vay4NCg0K
DQo=
--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/html; charset=utf-8

Becomes as follows; note the quotes around the charset value:

--=_5226908e44ebc0462f06052400644d2f
Content-Type: multipart/alternative;
boundary="=_926d2a45bc543e1972443c87118fa61a"

--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset="utf-8"

SGF2aW5nIGFub3RoZXIgZ28gYXQgZm9yd2FyZGluZyBhbiBlbWFpbCB2aWEgT3V0bG9vay4NCg0K
DQo=
--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/html; charset="utf-8"

Now, aside from the fact that the mail RFCs specifically forbid modifying the body anyway (which breaks the DKIM signature), I have to ask: which is the correct way to write charset=utf-8 in an email header?
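To make the DKIM point concrete, here is a rough sketch of why adding the quotes matters to a verifier. The MIME part headers sit inside the message body as far as DKIM is concerned, so they feed into the body hash. The helper names `simple_body_canon` and `body_hash` are made up for this illustration, the part is shortened from the example above, and only the "simple" body canonicalization is approximated; a real verifier does considerably more.

```python
import base64
import hashlib

def simple_body_canon(body: bytes) -> bytes:
    """Approximate DKIM 'simple' body canonicalization:
    drop empty lines at the end and make sure the body ends in one CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body

def body_hash(body: bytes) -> str:
    """Base64 of the SHA-256 digest of the canonicalized body."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")

# One MIME part, shortened from the example in the question.
TEMPLATE = (
    b"--=_926d2a45bc543e1972443c87118fa61a\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Type: text/plain; charset=%s\r\n"
    b"\r\n"
    b"DQo=\r\n"
)

sent = TEMPLATE % b"utf-8"         # as originally sent
rewritten = TEMPLATE % b'"utf-8"'  # as rewritten with quotes

# The two forms mean the same thing to a MIME parser,
# but they are different bytes, so the body hashes differ.
print(body_hash(sent) == body_hash(rewritten))  # False
```

Extra blank lines at the end of the body would not change the hash under this canonicalization, but the two added quote characters do.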

Ravenstar68

Posted 2019-01-29T16:01:47.330

Reputation: 43

Outlook and Exchange do not retain the original E-Mail; this is well-defined behavior. – Daniel B – 2019-01-29T16:25:42.940

Answers

3

RFC2045 provides in section 5.1 the grammar used to construct valid Content-Type headers in MIME messages:

5.1.  Syntax of the Content-Type Header Field

   In the Augmented BNF notation of RFC 822, a Content-Type header field
   value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

Note how value is defined as token / quoted-string.
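As a minimal sketch of that rule, the token production can be checked in a few lines of Python. The function names `is_token` and `format_param` are hypothetical and only illustrate the grammar quoted above; they are not taken from any mail library.

```python
# tspecials as listed in RFC 2045 section 5.1
TSPECIALS = set('()<>@,;:\\"/[]?=')

def is_token(value: str) -> bool:
    """Return True if value matches the RFC 2045 `token` rule."""
    if not value:
        return False  # token requires at least one character
    for ch in value:
        code = ord(ch)
        # Reject CTLs (0-31, 127), SPACE (32), non-ASCII, and tspecials.
        if code <= 32 or code >= 127 or ch in TSPECIALS:
            return False
    return True

def format_param(attribute: str, value: str) -> str:
    """Write a parameter, quoting the value only when it is not a token."""
    if is_token(value):
        return f"{attribute}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'

print(format_param("charset", "utf-8"))
print(format_param("boundary", "=_926d2a45bc543e1972443c87118fa61a"))
```

With these definitions, utf-8 is written bare because it is a valid token, while the boundary from the question must be quoted because it contains "=", which is one of the tspecials.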

Further down in the section is a textual clarification with an example:

   Note that the value of a quoted string parameter does not include the
   quotes.  That is, the quotation marks in a quoted-string are not a
   part of the value of the parameter, but are merely used to delimit
   that parameter value.  In addition, comments are allowed in
   accordance with RFC 822 rules for structured header fields.  Thus the
   following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

   are completely equivalent.

As you can see, quoting is not required when the value already is a token (1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>) but valid nonetheless.
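A quick way to see the equivalence in practice is to feed both forms to a MIME parser. The sketch below uses Python's standard `email` package; the helper name `charset_of` and the one-line test messages are made up for this illustration.

```python
from email import message_from_string

def charset_of(content_type_line: str):
    """Parse a tiny message with the given header and return its charset."""
    msg = message_from_string(content_type_line + "\n\nbody\n")
    return msg.get_content_charset()

unquoted = charset_of("Content-Type: text/plain; charset=utf-8")
quoted = charset_of('Content-Type: text/plain; charset="utf-8"')

# The quotes only delimit the value, so both forms yield the same charset.
print(unquoted, quoted)  # utf-8 utf-8
```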

Daniel B

Posted 2019-01-29T16:01:47.330

Reputation: 40 502

Thanks. I think the updated RFC is 5322, but the comment is still relevant nonetheless. In addition, Microsoft decoded the base64 text of the body and then modified that too before re-encoding it back to base64. This is in direct contravention of RFC 822 and 5322, which state that an SMTP server must not modify the message except to add trace headers. As a result the forwarded mail was rejected by the server. I'll have to have a moan at them. – Ravenstar68 – 2019-01-29T17:04:34.533

When forwarding the message, Outlook.com is not a relaying party. – Daniel B – 2019-01-29T17:19:23.607

Depends how you set up the forwarding. If you use Rules then it forwards in the same way as if you'd manually forwarded the message. If you use the Forwarding settings, then it acts as a relay as far as the receiving server is concerned. The original DKIM header from my server is still there and that's used by the final server to check the mail body has been unaltered. – Ravenstar68 – 2019-01-29T17:31:08.830

1

Good question. In my experience, email headers are not much different from HTTP (web server) headers, so I would go with the non-quoted version like this:

Content-Type: text/html; charset=utf-8

And digging deep into the RFC (RFC 2047) for MIME encoding I found this:

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token    ; see section 3

   encoding = token   ; see section 4

At no point does it mention whether quoted token values are valid or not. So I am going to assume that Microsoft is somehow rewriting headers to have quoted values. I have no clue beyond the evidence that was provided, but I would stick with the unquoted value instead of defaulting to whatever Microsoft is doing.
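For reference, the encoded-word grammar quoted above describes strings that appear in header text such as a Subject line, where the charset is always a bare token. A small sketch with Python's standard `email.header` module shows the shape; the sample string is invented for illustration.

```python
from email.header import decode_header

# An RFC 2047 encoded-word: =? charset ? encoding ? encoded-text ?=
sample = "=?utf-8?q?caf=C3=A9?="

# decode_header returns a list of (decoded_bytes, charset) pairs.
parts = decode_header(sample)
raw, charset = parts[0]

print(charset)              # utf-8
print(raw.decode(charset))  # café
```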

JakeGould

Posted 2019-01-29T16:01:47.330

Reputation: 38 217

1

That’s not an encoded word. The relevant RFC is 2045, which further refers to RFC 822.

– Daniel B – 2019-01-29T16:30:59.157

@DanielB Fair enough. Upvoted your answer which is clearly far more on point. – JakeGould – 2019-01-29T19:42:28.500