Let's look at this example payload (A), encoded once (B) and twice (C):
A. <script> alert(1) </script>
B. %3Cscript%3E alert(1) %3C%2Fscript%3E
C. %253Cscript%253E alert(1) %253C%252Fscript%253E
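If you want to reproduce these payloads yourself, here is a minimal sketch using PHP's rawurlencode() (note that it also encodes spaces and parentheses, so its output is slightly stricter than the hand-written examples above):
$a = '<script> alert(1) </script>'; // payload A
$b = rawurlencode($a);              // roughly payload B (single encoded)
$c = rawurlencode($b);              // roughly payload C (double encoded)
echo $a, "\n", $b, "\n", $c, "\n";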
Double encoding can be used to bypass XSS filters when different parts of the application make different assumptions about whether a variable is encoded or not. For instance, consider the following vulnerable code:
$input = htmlentities($_GET["query"]); // filter: neutralize < and >
echo urldecode($input);                // extra URL decode after filtering
This code would block the payload if it were only single encoded (as in B). PHP URL decodes your GET variables for you by default (turning B into A), so < and > would be passed to htmlentities, which neutralizes them. However, if you instead send C, it is URL decoded to B, which passes through htmlentities unchanged. Since the string is URL decoded again before it is echoed, it turns into the dangerous payload A.
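To make the bypass concrete, here is a minimal sketch that traces the vulnerable code with payload C. The first urldecode() stands in for the decoding PHP already does when it populates $_GET:
$c = '%253Cscript%253E alert(1) %253C%252Fscript%253E'; // payload C, as sent
$query    = urldecode($c);        // what $_GET["query"] contains: payload B
$filtered = htmlentities($query); // B contains no <, > or &, so it passes unchanged
echo urldecode($filtered);        // second decode: the browser receives payload A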
So the bug here is that there is another layer of URL decoding after the XSS filter. When the two lines are next to each other like this, the problem is quite obvious, but the two operations can live in separate modules, making it hard to detect. Since it is hard to keep track of which strings are URL encoded, it is tempting to throw in an extra decode just to be sure; after all, it usually doesn't affect unencoded data.
The PHP manual actually warns about this:
Warning: The superglobals $_GET and $_REQUEST are already decoded. Using urldecode() on an element in $_GET or $_REQUEST could have unexpected and dangerous results.
In my opinion the manual is not cautious enough here: decoding any untrusted data after filtering for XSS is dangerous, no matter where it comes from. Be extremely careful about modifying your data after you have filtered it!
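If you really do need an extra decoding step for some legacy reason, a safer ordering is to do all decoding first and escape as the very last step before output (a sketch, assuming the extra decode is actually required):
$raw   = $_GET["query"] ?? '';                  // already URL decoded once by PHP
$value = urldecode($raw);                       // any extra decoding happens before filtering
echo htmlentities($value, ENT_QUOTES, 'UTF-8'); // escaping is the final step; don't modify after this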
For more reading, see OWASP.