22

If I use fully parameterized queries everywhere, is it still necessary and/or security-relevant to somehow sanitize input? E.g. check that mail addresses are valid before sending a parameterized query against the database, or filtering out certain special characters from text?

I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.

Right now, we have full unicode support against SQL Server (MySQL seems to have problems with Emojis), I'm not sure how to filter out security risks without losing that property.

Alexander
  • 2,143
  • 2
  • 16
  • 22

3 Answers3

25

No, it's not necessary. But please, read on.

Input sanitization is a horrible term that pretends you can wave a magic wand at data and make it "safe data". The problem is that the definition of "safe" changes when the data is interpreted by different pieces of software.

Data that may be safe to be embedded in an SQL query may not be safe for embedding in HTML. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe for embedding in all those contexts (and many others) is too restrictive.

So what should we do? Make sure the data is never in a position to do harm.

The best way to achieve this is to avoid interpretation of the data in the first place. Parametrized SQL queries is an excellent example of this; the parameters are never interpreted as SQL, they're simply put in the database as, well, data.

For many other situations, the data still needs to be embedded in other formats, say, HTML. In that case, the data should be escaped for that particular language at the moment it's embedded. So, to prevent XSS, data is HTML-escaped at view time. Not at input time. The same applies to other embedding situations.

So, should we just pass anything we get straight to the database?

Maybe. It depends.

There are definitely things you can check about user input, but this is highly context-dependent. Because sanitization is ill-defined and mis-used, I prefer to call this validation.

  • For example, if some field is an supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
  • You can certainly do some validation on email fields (although some people argue there's not much you can do besides checking for the presence of a @, and they have a good point).
  • You can require comments to have a minimum and maximum length.
  • You should probably verify that any string contains only valid characters for its encoding (e.g., no invalid UTF-8 sequences).
  • You could restrict a username to certain characters, if that makes sense for your userbase.
  • A minimum length for passwords is, of course, incredibly common.

As you can see, these checks are very context-dependent. And all of them are to help increase the odds you end up with data that makes sense. They are not to protect your application from malicious input (SQL injection, XSS, command injection, etc), because this is not the place to do that.

Users should be free to type '; DROP TABLE users; -- without their post being rejected or mangled to \'; DROP TABLE users; --. Note that I'm able to include such "malicious" content on sec.SE!

So, to answer your original question:

... is it still necessary and/or security-relevant to somehow sanitize input?

No, it is not. But please do properly escape the data where needed before outputting it. And consider validations, where applicable.

I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.

Then escape or filter the data before outputting to those tools, but don't mangle data in your database.

But really, those scripts should be fixed, or rewritten with security in mind.

(MySQL seems to have problems with Emojis)

Little off-topic, but have a look at the utfmb4 charset for MySQL ;)

marcelm
  • 888
  • 7
  • 12
  • To add to the problem with sanitizing, if you write a sanitizer that strips out e..g HTML script tags, did you remember to check if any new script tags appeared in the "cleaned" version? IIRC that led to a real web vulnerability; they should have rejected the problematic input instead of trying to clean it. – Carl Walsh Oct 27 '17 at 21:17
  • 1
    "A minimum length for passwords is, of course, incredibly common." Your hashing algorithm should take care of this for you, in terms of validating the length of the string being inserted into the database ;-) – Tim Oct 27 '17 at 23:28
  • 1
    Problem with writing everything back to the database is that another system that isn't very secure may consume that data. For example your bank website allows you to enter a nice redirect script tag when you change your address, okay next time you log into the website it will either just redirect you or show as plain text, so no harm done as it won't affect other customers. What about the system that the call centre uses when you phone up? Would that redirect them to a hostile site when they view your account? – Jay Apr 22 '20 at 07:30
9

If I use fully parameterized queries everywhere, is it still necessary and/or security-relevant to somehow sanitize input?

Yes. It's always a good idea to sanitize the input before sending it to the database.

Parameterized queries might save you from SQL injection attacks, but might not prove beneficial in case of stored XSS attacks. If a user sends a malicious javascript code into your form, and you store it successfully in your database, and you display the same field elsewhere, the malicious script might run on the victim's browser.

It's always a good idea to sanitize the input before sending it ahead, and sanitizing the output before sending it to client's browser.

EDIT: As pointed out in comments, sanitization is different from encoding/escaping values. Sanitization, here, meant that the inputs can be validated for some basic checks before passing it on to the database(or any further layer). For example, a field which expects an e-mail address can be validated for a valid e-mail address, a field for age can be validated for integer only, etc. Regular expressions prove to be of immense help here. It should be noted that these checks should be placed on front-end(client-side) as well as back-end.

In case of fields which expect rich text, you can implement Content Security Policy on pages wherever the user input will be displayed. You can also use an HTML sanitizer to sanitize the user input.

pri
  • 4,438
  • 24
  • 31
  • 14
    **NO** you don't store the data escaped in the database. What happens if you store ` – RobIII Oct 27 '17 at 15:50
  • 2
    @RobIII, all good points. One understanding of sanitization is that it is different from escaping: it might mean rejecting inputs that aren't valid (e.g., contain characters that aren't on a whitelist) rather than escaping the inputs. It typically makes sense to do input validation near the input, and output escaping near the output. Hard to tell which of those two this answer was referring to. – D.W. Oct 27 '17 at 15:58
  • Also, what you're talking about isn't so much 'sanitization' but 'encoding'. Sanitization is ensuring that value X, which is expected to be a number (e.g. integer) isn't somehow a string. Or that an e-mailadress contains an `@`. Or that a phonenumber contains only valid chars (whatever you define to be 'valid'). Or that the 'volume' value is between `0` and `10` for example that, when someone submits `11` or `-176` an error is displayed instead of saving the value. **Edit:** So basically what D.W. just said and what was answered [here](https://security.stackexchange.com/a/172307/3992). – RobIII Oct 27 '17 at 15:59
  • Thanks for your constructive inputs. I didn't mean encoding in my answer. As D.W. correctly notes, sanitization might not always mean escaping. What I meant was that the input can be passed through some validations so as to ensure that it is not different in nature than expected. I've edited my answer :) – pri Oct 27 '17 at 17:03
  • For XSS, you'll want to sanitize *OUTPUT*. – Mark Buffalo Oct 27 '17 at 17:23
  • @MarkBuffalo ...and sanitizing the output before sending it to client's browser :) – pri Oct 27 '17 at 17:25
  • 1
    @PriyankGupta I don't think Input Sanitation and Input Validation are the same thing. Input sanitation is the process of replacing/removing certain characters, which is almost always universally bad. – Mark Buffalo Oct 27 '17 at 17:27
3

Yes, you should always sanitize input data. Sanitation isn't just about protecting you from injection, but also to validate types, restricted value (enums), ranges, etc.. While an attacker might not be able to manipulate your sql, they can still cause undesired behavior in the rest of your application.

For example, if an attacker changed an enum value, could they manipulate the system? Did they just change a role, user type, etc... ? Alternatively they could input a value larger than can be accepted in your schema or the data type (byte v int32, int64), could this cause the application to crash that exposes information or leaves orphaned data changed (not transacted).