I'm reading OWASP's Secure Coding Practices Checklist and under their "Input Validation" section they have an item that reads:
If any potentially hazardous characters (<>"'%()&+\\'\") must be allowed as input, be sure you implement additional controls like output encoding. Utilize canonicalization to address double encoding or other forms of obfuscation attacks.
- What is "output encoding", and can someone provide a concrete example of how a validation routine could make use of it?
- What is "double encoding", and why is it an "obfuscation attack"?
- What is "canonicalization", and why does it protect against double encoding?
For the third one, I found a rather vague definition of canonicalization provided by OWASP: "The reduction of various data encodings to a single, simple form." But that definition doesn't really help me make sense of what they're talking about.
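For what it's worth, here is my rough guess at what these terms could look like in code, so that answers can correct it. It is only a sketch under my own assumptions: `canonicalize` and `max_rounds` are names I made up, nothing here comes from OWASP, and I am assuming the "encoding" in question is URL percent-encoding. The idea is that a value encoded twice survives a single decode pass still looking harmless, so the input is decoded repeatedly until it stops changing, and only then validated or HTML-escaped on output.

```python
import html
from urllib.parse import unquote


def canonicalize(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing.

    Hypothetical helper: gives up if the value has not been confirmed
    stable within max_rounds decode passes.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value  # no encoding layers left
        value = decoded
    raise ValueError("input is encoded too many times")


double_encoded = "%253Cscript%253E"

# One decode pass still hides the angle brackets: %3Cscript%3E
print(unquote(double_encoded))

# Decoding to a single form reveals the real content: <script>
print(canonicalize(double_encoded))

# My guess at "output encoding": escape when writing into HTML
print(html.escape(canonicalize(double_encoded)))  # &lt;script&gt;
```

If this guess is right, a filter that decodes once and then looks for `<` would miss the double-encoded value, which is presumably why the checklist calls it obfuscation.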
I'm strong with Java and Python but could follow an example in any language. I'm just trying to visualize what they're talking about here and am having a tough time seeing the forest for the trees. Thanks in advance!