No, it's not necessary. But please, read on.
Input sanitization is a horrible term that pretends you can wave a magic wand at data and make it "safe data". The problem is that the definition of "safe" changes when the data is interpreted by different pieces of software.
Data that may be safe to be embedded in an SQL query may not be safe for embedding in HTML. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe for embedding in all those contexts (and many others) is too restrictive.
So what should we do? Make sure the data is never in a position to do harm.
The best way to achieve this is to avoid interpretation of the data in the first place. Parametrized SQL queries is an excellent example of this; the parameters are never interpreted as SQL, they're simply put in the database as, well, data.
For many other situations, the data still needs to be embedded in other formats, say, HTML. In that case, the data should be escaped for that particular language at the moment it's embedded. So, to prevent XSS, data is HTML-escaped at view time. Not at input time. The same applies to other embedding situations.
So, should we just pass anything we get straight to the database?
Maybe. It depends.
There are definitely things you can check about user input, but this is highly context-dependent. Because sanitization is ill-defined and mis-used, I prefer to call this validation.
- For example, if some field is an supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
- You can certainly do some validation on email fields (although some people argue there's not much you can do besides checking for the presence of a
@
, and they have a good point).
- You can require comments to have a minimum and maximum length.
- You should probably verify that any string contains only valid characters for its encoding (e.g., no invalid UTF-8 sequences).
- You could restrict a username to certain characters, if that makes sense for your userbase.
- A minimum length for passwords is, of course, incredibly common.
As you can see, these checks are very context-dependent. And all of them are to help increase the odds you end up with data that makes sense. They are not to protect your application from malicious input (SQL injection, XSS, command injection, etc), because this is not the place to do that.
Users should be free to type '; DROP TABLE users; --
without their post being rejected or mangled to \'; DROP TABLE users; --
. Note that I'm able to include such "malicious" content on sec.SE!
So, to answer your original question:
... is it still necessary and/or security-relevant to somehow sanitize input?
No, it is not. But please do properly escape the data where needed before outputting it. And consider validations, where applicable.
I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.
Then escape or filter the data before outputting to those tools, but don't mangle data in your database.
But really, those scripts should be fixed, or rewritten with security in mind.
(MySQL seems to have problems with Emojis)
Little off-topic, but have a look at the utfmb4
charset for MySQL ;)