Let's look at this example payload (A), encoded once (B) and twice (C):
A. <script> alert(1) </script>
B. %3Cscript%3E alert(1) %3C%2Fscript%3E
C. %253Cscript%253E alert(1) %253C%252Fscript%253E
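If you want to reproduce these payloads yourself, here is a minimal sketch using PHP's rawurlencode() (note that it also encodes spaces and parentheses, so its output is slightly stricter than the hand-written examples above):
$a = '<script> alert(1) </script>'; // payload A
$b = rawurlencode($a);              // roughly payload B (single encoded)
$c = rawurlencode($b);              // roughly payload C (double encoded)
echo $a, "\n", $b, "\n", $c, "\n";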
Double encoding can be used to bypass XSS filters when different parts of the application make different assumptions about whether a variable is encoded or not. For instance, consider the following vulnerable code:
$input = htmlentities($_GET["query"]); // filter: neutralize < and >
echo urldecode($input);                // extra URL decode after filtering
This code would block the payload if it were only single encoded (as in B). PHP URL decodes your GET variables for you by default (turning B into A), so < and > would be passed to htmlentities, which neutralizes them. However, if you instead send C, it is URL decoded to B, which passes through htmlentities unchanged. Since the string is URL decoded again before it is echoed, it turns into the dangerous payload A.
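To make the bypass concrete, here is a minimal sketch that traces the vulnerable code with payload C. The first urldecode() stands in for the decoding PHP already does when it populates $_GET:
$c = '%253Cscript%253E alert(1) %253C%252Fscript%253E'; // payload C, as sent
$query    = urldecode($c);        // what $_GET["query"] contains: payload B
$filtered = htmlentities($query); // B contains no <, > or &, so it passes unchanged
echo urldecode($filtered);        // second decode: the browser receives payload A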
So the bug here is that there is another layer of URL decoding after the XSS filter. When the two lines are next to each other like this, the problem is quite obvious, but the two operations can live in separate modules, making it hard to detect. Since it is hard to keep track of which strings are URL encoded, it is tempting to throw in an extra decode just to be sure; after all, it usually doesn't affect unencoded data.
The PHP manual actually warns about this:
Warning: The superglobals $_GET and $_REQUEST are already decoded. Using urldecode() on an element in $_GET or $_REQUEST could have unexpected and dangerous results.
In my opinion the manual is not cautious enough here: decoding any untrusted data after filtering for XSS is dangerous, no matter where it comes from. Be extremely careful about modifying your data after you have filtered it!
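If you really do need an extra decoding step for some legacy reason, a safer ordering is to do all decoding first and escape as the very last step before output (a sketch, assuming the extra decode is actually required):
$raw   = $_GET["query"] ?? '';                  // already URL decoded once by PHP
$value = urldecode($raw);                       // any extra decoding happens before filtering
echo htmlentities($value, ENT_QUOTES, 'UTF-8'); // escaping is the final step; don't modify after this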
For more reading, see OWASP.