12

I got into a (somewhat heated) discussion with my colleague today about what characters our application should accept. This was prompted by the discovery that you can enter anything in the search box and the application will dutifully perform a search for that string. However, this applies equally to all the textboxes in the application, not just the search box.

My colleague is of the opinion that the best practice (from a security viewpoint) is to limit the allowed characters to some letters, digits, and a subset of symbols. This prevents the user from entering all kinds of unprintable Unicode control characters and whatnot.

I, on the other hand, am of the opinion that this will only annoy the users and not offer any additional security. I think that the best practice is to make your application accept anything, and then use the proper encoding functions (and parametrized queries where they are available) to make sure that the entered string passes through unmodified and is displayed/used exactly as entered. If the user enters garbage, they will see garbage, but the system will work correctly.
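
To illustrate what I mean, here is a minimal sketch of the approach (Python with sqlite3 and html.escape purely for illustration; the table and column names are made up): the parameter keeps the input out of the SQL statement, and the output encoding keeps it out of the markup, so the string survives the round trip exactly as entered.

```python
import html
import sqlite3

def search_products(conn: sqlite3.Connection, term: str) -> str:
    # Parameterised query: the driver treats the user's string purely as data,
    # so quotes, semicolons, etc. cannot change the SQL statement.
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ?",
        (f"%{term}%",),
    ).fetchall()

    # Encode for the output context (HTML here), so whatever was entered is
    # displayed exactly as entered instead of being interpreted as markup.
    items = "".join(f"<li>{html.escape(name)}</li>" for (name,) in rows)
    return f"<ul>{items}</ul>"
```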

What is the industry best practice here?

Added: It seems that I've not been very clear. The question is about the server side, and the assumption is that all the proper encodings/escapings are in place when using the string (e.g. using parameters for SQL, HtmlEncode for outputting to HTML, etc.). Given all that, does it still make sense to limit the allowed characters that arrive from the client?

Vilx-
    You do not mention the specifics of the situation, but here are my two cents: As a native of a country with a non-Latin alphabet, I'd be very annoyed if your application prevented me from e.g. typing my name in my own language. And from a business POV, it would be extremely costly in the future if you find yourselves needing to refactor everything just to provide full Unicode support for that new client from China or Saudi Arabia. Better do it right from the start... – thkala Apr 08 '15 at 23:07
  • @thkala - Well, yes, this validation **does** need to be centralized, so changes can be made easily. – Vilx- Apr 09 '15 at 06:37
  • 1
    A company I used to work for suggested putting a notice on every form saying "Please type in English only". Nothing like advertising your vulnerabilities! The next suggestion was to go through the entire code base and manually put a different filter on each field (first name, phone number, email, etc.). You don't want to know what the next suggestion was! – CJ Dennis Apr 09 '15 at 12:38
  • @CJDennis - Ahh, fun times! :) – Vilx- Apr 09 '15 at 17:26

5 Answers

17

You shouldn't trust the client. Writing Javascript to stop characters being entered does not stop anyone from submitting them to your search.

Your search routine should remove characters it doesn't support, and when printing that back out, it should show what it actually accepted, not what was submitted.

For general purpose fields in a form, consider adding client-side validation for a more pleasant user experience; I'd rather know before I hit submit that you're not going to accept non-digits in the Phone Number field. But if I go around your Javascript checks and submit the form anyway, the back-end server should outright reject my invalid data; don't leave it to just the Javascript to sanitise input.

Also, while it's OK for most fields to accept anything the user can enter and repeat it back elsewhere, it must be escaped properly to prevent cross-site scripting (e.g. so the user can't set their name as <script src="...">). Some fields have additional concerns; for example, if you allow users to pick a unique username, Unicode equivalence could allow them to choose a unique encoding for their name that nonetheless appears identical to another user's name, allowing impersonation. Normalisation is not enough, because some characters look like other characters, still allowing impersonation. Read Unicode security considerations for more on this.
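
For illustration, a minimal sketch of the username case (Python's standard unicodedata module; the example strings are invented): normalisation folds equivalent encodings together, but it does not catch look-alike characters from other scripts.

```python
import unicodedata

def canonical_username(raw: str) -> str:
    # NFKC folds equivalent encodings (composed vs. decomposed accents,
    # compatibility forms such as "ﬁ" -> "fi") into one canonical form,
    # which is then case-folded before any uniqueness check.
    return unicodedata.normalize("NFKC", raw).casefold()

# Normalisation does catch two encodings of the same visible name:
assert canonical_username("Ame\u0301lie") == canonical_username("Am\u00e9lie")

# ...but it does not catch homoglyphs: a Cyrillic "а" (U+0430) is still a
# different character from Latin "a" (U+0061), so an additional confusables
# check (see Unicode TR #39) is needed to prevent look-alike impersonation.
assert canonical_username("p\u0430ypal") != canonical_username("paypal")
```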

Edit: Your colleague still has a point. The server side does not live in isolation. If it accepts input from a user and then shows some of that input to other users at a later time, it's not enough that the server side is bulletproof against dodgy inputs. There's a whole category of security issues that happen client-side, because the server side stored data exactly as-is and blindly handed it out to other clients (who are then attacked or fooled by it). Yes, the server side needs to restrict input characters if they would exploit or fool your users when handed back out again.


Stuart Caie
  • Yeah, you should never trust what a client can do. Even if you set a maxlength on the input textboxes, the client can do HTML injection and submit a lot of characters. So the best approach is to validate on the server :) Excellent answer, Stuart! – NathanWay Apr 08 '15 at 16:52
  • 1
    Sorry about the downvote, but you managed to miss the question almost entirely. :( The question was about the server side, and I explicitly mentioned that the search routine **does** support everything, and all escaping is proper. The question was about best practices - even if everything else is fine, is there still a reason to limit characters? – Vilx- Apr 08 '15 at 17:50
  • The link to the Unicode security considerations, however, is interesting. I'll read it later. – Vilx- Apr 08 '15 at 17:53
  • Also, generally your advice is sound. It's just not what I asked. :) – Vilx- Apr 08 '15 at 18:12
14

Your approach - if used correctly - would protect you against two very common attacks: SQL injection and XSS. And escaping/encoding/prepared statements are definitely a must-have and your main line of defense.

But as you specifically mention search boxes, your approach might, for example, not catch SQL wildcard DoS attacks (see here and here), which could be caught by input validation (server-side; you obviously shouldn't do this with client-side JavaScript).
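
For illustration, a minimal sketch of the escaping fix for the wildcard case (Python with sqlite3; the table and column names are invented): the user-supplied % and _ are neutralised so the term is matched literally, and capping the term's length server-side is a complementary validation step.

```python
import sqlite3

def escape_like(term: str, escape_char: str = "\\") -> str:
    # Escape the LIKE metacharacters so a search term such as "%_%_%_%_%"
    # is matched literally instead of forcing an expensive wildcard scan.
    return (term
            .replace(escape_char, escape_char * 2)
            .replace("%", escape_char + "%")
            .replace("_", escape_char + "_"))

def search(conn: sqlite3.Connection, term: str):
    pattern = f"%{escape_like(term)}%"
    return conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\'",
        (pattern,),
    ).fetchall()
```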

Your approach will also not catch security bugs in your code. When your code base gets large enough, the probability that you forget proper encoding in just one place increases, so having additional protection against that - in the form of server-side input validation - is not a bad idea.

It's also not bad from a user experience point of view (if a user enters invalid data, it's good to report this back to them, so that they know what went wrong, and can now enter valid data).

The downside is that it's a lot more work. You have to actually think about what input should be allowed and what shouldn't, because as you have said, if you filter out too much, it might limit users. If you don't want to perform this work for each possible input field, a web application firewall might be an alternative.
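
If you do go down the per-field route, here is a minimal sketch of what centralised server-side rules might look like (Python; the field names, patterns and length caps are purely hypothetical):

```python
import re

# Hypothetical per-field rules: each field gets its own pattern and length
# cap, kept in one place so they are easy to review and change.
FIELD_RULES = {
    "phone":    (re.compile(r"[0-9 +()-]{3,20}"), 20),
    "username": (re.compile(r"[\w.-]{3,32}"), 32),
    "search":   (re.compile(r"[^\x00-\x1f\x7f]{1,200}"), 200),  # no control chars
}

def validate(field: str, value: str) -> list[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    pattern, max_len = FIELD_RULES[field]
    problems = []
    if len(value) > max_len:
        problems.append(f"{field} must be at most {max_len} characters")
    if not pattern.fullmatch(value):
        problems.append(f"{field} contains characters that are not allowed")
    return problems
```

Note that \w and the "anything but control characters" class are Unicode-aware in Python 3, so rules like these can reject control characters and absurd lengths without locking out users whose names are not written in the Latin alphabet.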

tim
  • Thank you for some actual reasons! :) Btw - yes, the wildcard vulnerability is there. I had actually thought it was a feature, but the example is pretty astonishing. :) Although that can be fixed again by proper escaping. The argument about forgetting proper encoding in just one place however is solid. – Vilx- Apr 08 '15 at 17:59
  • That seems like a reason to use an ORM or micro-ORM instead of concatenating raw SQL, not a reason to start banning text from certain fields (which is just as easy to forget). – Casey Apr 08 '15 at 22:48
  • @emodendroket - No, that's not the point. Even if you do use an ORM (and we **do**), you can still forget another kind of encoding - HtmlEncode when concatenating HTML, JsonEncode when encoding JSON, etc. Also, you might need to send the string to third-party libraries (like creating a PDF) whose handling of such characters you do not know. – Vilx- Apr 09 '15 at 06:36
  • 1
    @Vilx- Well, again, you can solve this problem by using well-thought-out libraries to handle it. For example, Razor and Angular both escape everything they display unless you specifically ask them not to. Any competent JSON library should be doing the same thing. Trying to solve this problem by banning characters is just not a viable solution. Doubly so if we don't know what environment we're escaping for. – Casey Apr 09 '15 at 14:00
  • @Vilx- To my point, consider this question: http://stackoverflow.com/questions/129677/ – Casey Apr 09 '15 at 15:01
  • 1
    @emodendroket - I totally agree. This is meant to be a **safety net**, not the primary line of defense. – Vilx- Apr 09 '15 at 17:28
1

You should implement both measures, and here's why.

If you limit the character space the user can enter (a-zA-Z!@#$%^&*, for example), there is less chance an unwitting user will screw up and enter some jank data. However, this only really stops the end users who are trying to use your application from entering malformed data. The second step, performing proper sanitization and encoding of data, will make sure you're free from XSS/SQLi and various other potential attacks.

So use both. Limit user entry to stop end users who don't know better, and sanitize the data to stop those who are a little more clever.

Ajaxasaur
  • I disagree; even if you limit the characters a user can enter, you will not prevent them from entering bad data. For example, mis-spelling "junk" as "jank". And limiting the characters in the way you suggest will prevent lots and lots of perfectly valid data inputs. – Vince Bowdren Apr 09 '15 at 16:24
  • Use case is extremely important: if you don't need certain characters, you should limit their use; if you do need them, then don't check for them. Also, 'jank' wasn't a misspelling :) – Ajaxasaur Apr 13 '15 at 18:58
  • True; but the problem is knowing which characters really aren't needed. It's something which we get wrong, time after time. – Vince Bowdren Apr 13 '15 at 20:08
  • If and only if you have (bad) development practices with minimal effort on design and documentation. Spend more time designing what you're building and you won't run into this problem. – Ajaxasaur Apr 13 '15 at 20:54
1

There is a lot of good advice in the answers above, but I'm not sure they have addressed the main part of your question, i.e. limiting the amount/size of input data.

To recap what has been stated already:

  • Use client-side input validation and feedback to improve the client user experience, but DO NOT rely on anything client-side for security purposes. All client-side measures are easily defeated. The client side is for the client and should be client-focused.

  • Do all security checks, validation of data, sanitising of data, etc. on the server side. Assume the data supplied is hostile and cannot be trusted. Use well-tested, known solutions where possible rather than re-inventing the wheel, and try to give feedback to the user if you do not allow something, so that they know what is happening and can possibly re-structure their input.

With respect to the question about restricting the amount of data and what data is allowed - and what I feel is an inaccurate interpretation of the concept of being liberal in what you accept and conservative in what you do - I would suggest:

  • There is no point in accepting data you cannot use. What you can use will depend on the limitations of the components which make up your application (database, supported character encodings, maximum buffer limits etc)

  • There is no point accepting data too long to fit into whatever use you have for it i.e. accepting data fields longer than the field length of your database is pointless

  • Consider the performance hit associated with extremely long input data. For example, if it is a search string, is there a limit beyond which the performance hit or resources consumed by extremely long queries will adversely impact your system or end up returning unusable results?

  • Is there a risk that unlimited input lengths could trigger buffer overflow vulnerabilities? Are ALL the components (libraries, external systems, databases, etc.) able to handle arbitrary lengths of input data, or will they crash, truncate unexpectedly, etc.?

Being liberal in what you accept does not mean you have to use everything you accept. It really means don't just fail or crash. It means providing feedback to the user about why you cannot handle the input, and catching failures so that they are managed gracefully. There is no point in arguing that you should accept everything for a good user experience if you cannot use what is provided or handle it in a reliable manner. However, you should not silently drop characters or truncate input without informing the user of the reasons or limits. Users only get frustrated when it isn't clear what is and what is not acceptable - provide clear information so that their expectations match your capability, and there will be far fewer frustrated users.
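
As a small sketch of that last point (Python; the limits and field names are invented for illustration), reject over-long input with an explicit message rather than silently truncating it:

```python
# Hypothetical limits, derived from the database schema and the search backend.
MAX_LENGTHS = {"last_name": 100, "search": 200}

def check_length(field: str, value: str) -> str | None:
    """Return an error message for the user, or None if the value fits."""
    limit = MAX_LENGTHS[field]
    if len(value) <= limit:
        return None
    # Tell the user exactly what the limit is instead of silently truncating,
    # so their expectations match the system's capability.
    return f"'{field}' is limited to {limit} characters; you entered {len(value)}."
```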

Tim X
0

From a pure security perspective, your approach is the correct one. The web interface (HTML + JavaScript + whatever...) is just a client to your HTTP server. In other words, anything/anyone can make requests, bypassing your client-side security. Therefore, security at that level is basically no security.

From a GUI perspective it might actually be a good idea to limit characters with JavaScript or whatever. E.g. if your field is supposed to contain an age, there is no reason to accept alphabetic characters. You at least prevent syntactical problems from the users. If well designed, it can be a big advantage for your GUI (quick feedback to the user) and for your server (it avoids requests that just return validation errors).

nsn