Any data coming from outside your system is crossing a "trust boundary", and needs to be validated inside your system. That means server-side input validation is required.
Input validation means checking your input to be sure you can process it safely. The tricky part is that validating it already involves some minor processing, and that can create a hidden vulnerability. So the checks need to happen in a specific order.
The first step is to validate the length of the input. Most input will temporarily land in a finite buffer. Ensure that the amount of input you copy into the buffer doesn't exceed the size of the buffer. If there's too much input and not enough buffer, the result is a classic "buffer overflow" problem, and exploiting these is a staple of attackers.
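As a minimal sketch of the length check, assume the input arrives as raw bytes and that the field has a fixed upper limit. The names `MAX_INPUT_BYTES` and `check_length` are hypothetical. Python manages its own memory, so this particular code won't overflow a buffer, but the same check bounds memory use and protects any native code the data is later handed to.

```python
MAX_INPUT_BYTES = 256  # hypothetical limit; pick one that fits the field


def check_length(raw: bytes, limit: int = MAX_INPUT_BYTES) -> bytes:
    """Reject oversized input before doing any other processing on it."""
    if len(raw) > limit:
        raise ValueError(f"input is {len(raw)} bytes; the limit is {limit}")
    return raw
```

The check looks only at the size of the input and never at its contents, which is why it is safe to run first.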
The next step is to ensure the data is in the format you expect. If you're expecting a number, ensure the input contains only digits and the number symbols you permit, such as plus, minus, separator, decimal point, or currency symbol. Note that these are locale-specific: in the US, a million dollars could be entered as $1,000,000.00, while in Germany a million euros could be entered as 1.000.000,00€. If you're expecting letters and numbers, use an "approved-list" to accept only the characters you expect.
It's safer to rely on an approved-list of good characters than a deny-list of bad characters, because attackers will learn new attacks in the future. It's possible an unexpected character will permit an injection attack tomorrow that we didn't know about today.
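One way to express an approved-list is a regular expression that must match the whole input. The sketch below assumes a US-style amount format and an ASCII-only name field; both patterns and both function names are illustrative, not a recommendation for any particular application.

```python
import re

# Hypothetical approved-list for a US-style amount: optional "$", digits with
# optional comma grouping, optional two-digit cents. [0-9] is used instead of
# \d because \d also matches non-ASCII digits in Python string patterns.
US_AMOUNT = re.compile(r"\$?([0-9]{1,3}(,[0-9]{3})+|[0-9]+)(\.[0-9]{2})?")

# Hypothetical approved-list for a name: ASCII letters, space, apostrophe, hyphen.
NAME = re.compile(r"[A-Za-z' -]+")


def is_valid_us_amount(text: str) -> bool:
    # fullmatch requires the entire string to fit the pattern
    return US_AMOUNT.fullmatch(text) is not None


def is_valid_name(text: str) -> bool:
    return NAME.fullmatch(text) is not None
```

With these patterns, `$1,000,000.00` is accepted while the German form `1.000.000,00€` is rejected, so a real application would need one pattern per locale it supports.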
Note that if you reverse these checks and test for special characters before checking the input length, your validation code might be vulnerable to a buffer overflow. That's why it's important to do them in the proper order.
It seems intuitive that input checking should be used to protect against injection attacks (a SQL injection is an attacker entering something malicious like ' OR 1=1;DROP TABLE STUDENTS--), but that's not always possible. Someone might try to prevent this injection by putting the apostrophe in the deny-list, but an apostrophe is often valid data, such as in the name O'Brian. Plus, an attacker can often work around approved-lists with another strategy like URL encoding. So we add another line of defense in the code that interfaces with SQL. That code is responsible for executing queries as safely as possible, for example by using parameterized SQL queries, an ORM, or another defensive strategy. That way, if attackers figure out a way around the approved-list, the parameterized SQL should still stop them.
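A parameterized query can be sketched with Python's built-in `sqlite3` module. The `students` table and the `find_student` function are made up for illustration; other database drivers use the same idea but may use a different placeholder syntax.

```python
import sqlite3

# In-memory database with a hypothetical table, just for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany(
    "INSERT INTO students (name) VALUES (?)",
    [("O'Brian",), ("Smith",)],
)


def find_student(conn: sqlite3.Connection, name: str) -> list[str]:
    # The ? placeholder passes the value separately from the SQL text,
    # so an apostrophe in the name is treated as data, not as SQL syntax.
    cur = conn.execute("SELECT name FROM students WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]
```

Here a lookup for O'Brian works without any special handling of the apostrophe, and the injection string from above is simply compared against the `name` column as an ordinary value.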
Injection attacks aren't limited to SQL, either. Attackers will try injecting path separator characters into file names, shell delimiters like pipes (|), XML delimiters, URLs, etc.; anything that you accept can be subjected to abuse. Any code that interprets user input needs to be written to avoid such problems.
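For the file name case, one possible defense is to resolve the final path and confirm it still sits inside the directory you intended. This is only a sketch: `UPLOAD_DIR` and `safe_upload_path` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

UPLOAD_DIR = Path("/srv/uploads")  # hypothetical base directory


def safe_upload_path(filename: str, base: Path = UPLOAD_DIR) -> Path:
    """Return the resolved path for filename, or raise if it leaves base."""
    base = base.resolve()
    # resolve() collapses ".." segments and follows symlinks
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("file name escapes the upload directory")
    return candidate
```

A name such as `../../etc/passwd`, or an absolute path like `/etc/passwd`, resolves to a location outside the base directory and is rejected, while a plain name like `report.txt` is accepted.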
The step after validation is to encode the input in order to protect the output. For example, if you're going to accept < and > and later output the results on a web page, you'll want to make sure you're HTML encoding the symbols so you don't inadvertently create a hole where an attacker can plant a <script>attack!</script> on the output page.
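In Python, the standard library's `html.escape` does this kind of encoding. The `render_comment` function below is a hypothetical example of where the call might sit; in practice a templating engine with auto-escaping often does the same job for you.

```python
import html


def render_comment(comment: str) -> str:
    # Encode at output time, so <, >, &, and quotes are displayed as text
    # instead of being interpreted as markup by the browser.
    return f"<p>{html.escape(comment)}</p>"


print(render_comment("<script>attack!</script>"))
# prints: <p>&lt;script&gt;attack!&lt;/script&gt;</p>
```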