
A server can parse HTTP POST and GET data using either a fixed encoding or one determined dynamically from the client's request.

Consider the situation where a client sends data in UTF-7, UTF-32, or any encoding other than UTF-8, while the server is hard-coded to read the data only as UTF-8.

Question

  • What types of security issues can occur when there is a mismatch in encoding (server to client) or decoding (client to server)?

  • If this is a vulnerability, what kind of tools scan for this in Eclipse or Visual Studio?

makerofthings7

3 Answers


Request from client to server

Assuming there are no vulnerabilities in the UTF-8 decoder on the server (e.g. overlong UTF-8 encodings or illegal UTF-8 continuation bytes) that could bypass a filter or other encoding routine, I can't imagine any sequence that could be decoded to anything dangerous in itself. It is not like UTF-7, where a sequence such as +ADw- can bypass a filter or encoder that doesn't understand the encoding.
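For illustration, here's a minimal Python sketch of that UTF-7 trick: the raw bytes contain no literal angle brackets, so a byte-level filter that doesn't understand the encoding sees nothing to escape, while a UTF-7-aware decoder produces real markup.

```python
# The raw bytes of a UTF-7 payload contain no '<' or '>' characters,
# so a naive filter inspecting the bytes finds nothing suspicious.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
assert b"<" not in payload and b">" not in payload

# A component that decodes the same bytes as UTF-7 sees a script tag:
# +ADw- is the UTF-7 encoding of '<' and +AD4- of '>'.
decoded = payload.decode("utf-7")
print(decoded)  # <script>alert(1)</script>
```

This is exactly why a filter and a decoder must agree on the encoding: the filter ran on one representation of the data, and the sink consumed another.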

This, of course, assumes that the UTF-8 is decoded and then re-encoded before being passed to other backend components that may be vulnerable (e.g. a backend process that decodes UTF-7 could be vulnerable if the raw bytes are passed straight through from the front end).

Response from server to client

Again, if the client's UTF-8 decoder is itself free from vulnerabilities, this should be safe. This assumes the client is interpreting the server response as UTF-8; I wasn't sure from your question whether a client sending UTF-7 was also expecting UTF-7 in the response. If so, then yes, it could be vulnerable to XSS via unencoded tags passed from the server. If not, another thing that could throw a spanner in the works is browser plugins (e.g. Flash/Silverlight), but these would normally get their input either from the browser once the text has been decoded, or from the server directly (in which case your pre-condition of only reading UTF-8 across the network would still apply).

SilverlightFox

(SilverlightFox's answer is great. Consider my response below a related note rather than an answer in its own right.)

MSDN warns against running untrusted input through the UTF7Encoding class. Recent versions of ASP.NET have actually dropped support for UTF-7 since no legitimate client will send data to a web server using this encoding. In these scenarios, the request is treated as if it were UTF-8.

For server-to-client communications, one good way to mitigate this is to use UTF-8 and to verify that the response's Content-Type header always contains charset=utf-8. Most web frameworks should do this automatically on your behalf.
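As a sketch of that verification (the helper name here is made up for illustration, not part of any framework), checking that a Content-Type value pins the charset to UTF-8 might look like:

```python
# Hypothetical helper: verify a response's Content-Type declares UTF-8,
# so the browser cannot be tricked into sniffing a different encoding.
def has_utf8_charset(content_type: str) -> bool:
    """Return True if the Content-Type header contains charset=utf-8."""
    # Content-Type parameters are semicolon-separated and case-insensitive.
    parts = [p.strip().lower() for p in content_type.split(";")]
    return "charset=utf-8" in parts

print(has_utf8_charset("text/html; charset=utf-8"))  # True
print(has_utf8_charset("text/html"))                 # False
```

A check like this could run in an integration test or a response middleware; the point is simply to fail loudly whenever a response goes out without an explicit UTF-8 declaration.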

Levi

Output

When Unicode output is translated to an 8-bit character set, this is sometimes done with a "best efforts" conversion: characters that don't have an exact match are converted to something similar, so maybe "a with circumflex" becomes "a". This can be extremely dangerous for security. There is a Unicode character, the fullwidth less-than sign (U+FF1C). Browsers do not recognise it as the start of a tag, so it is not usually escaped. On a best-efforts conversion, however, it may be translated to a regular < and this can cause an XSS flaw. This isn't just a theoretical concern; I have seen it in the wild. Some info here.
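Python's codecs don't perform Windows-style best-fit mapping, but NFKC normalization shows the same hazard (used here purely as a stand-in for a best-efforts conversion): the fullwidth angle brackets compatibility-map to plain ASCII ones.

```python
import unicodedata

# U+FF1C / U+FF1E are the fullwidth less-than and greater-than signs.
# Browsers don't treat them as tag delimiters, so an HTML-escaping
# routine typically leaves them untouched.
s = "\uff1cscript\uff1e"

# An ASCII-oriented escaper sees nothing to encode...
assert "<" not in s and ">" not in s

# ...but a lossy "similar character" conversion (NFKC here, standing in
# for Windows best-fit mapping) turns them into real markup.
print(unicodedata.normalize("NFKC", s))  # <script>
```

If that conversion happens downstream of your escaping step, the escaping was effectively a no-op, which is exactly the ordering rule given below.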

In most cases, the best solution is to use UTF-8 everywhere. If that is not possible, do a strict conversion rather than a best-efforts one. And if even that isn't possible, then you must do the best-efforts conversion BEFORE you do any escaping.

Input

There is a very simple rule to avoid problems: decode before validate. Whatever character set (or URL encoding, etc.) the data arrives in, decode it fully before you validate it or perform any operations on it. If you follow this rule, you should be good, even if there are flaws in your decoding (e.g. accepting overlong UTF-8 sequences).
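A minimal Python illustration of why the order matters, using percent-encoding and a hypothetical path-traversal check (both function names are made up for this sketch):

```python
from urllib.parse import unquote

def is_safe_wrong_order(path: str) -> bool:
    # BUG: validates the still-encoded string; a later decode step
    # will reveal the ".." this check never saw.
    return ".." not in path

def is_safe(path: str) -> bool:
    # Correct order: fully decode first, then validate the result.
    decoded = unquote(path)
    return ".." not in decoded

evil = "%2e%2e/etc/passwd"          # "../etc/passwd" once decoded
print(is_safe_wrong_order(evil))    # True  (bypassed)
print(is_safe(evil))                # False (caught)
```

The same pattern applies to any layered encoding: the validator must see the exact bytes the consuming component will act on.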

paj28