I got into a (somewhat heated) discussion with my colleague today about which characters our application should accept. This was prompted by the discovery that you can enter anything in the search box and the application will dutifully search for that string. However, this applies equally to every textbox in the application, not just the search box.
My colleague is of the opinion that the best practice (from a security viewpoint) is to limit the allowed characters to some letters, digits, and a subset of symbols. This prevents the user from entering all kinds of unprintable Unicode control characters and whatnot.
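To make that proposal concrete, here is a minimal sketch of such an allow-list check. It is written in Python purely for illustration; the exact character set and the name `is_allowed` are hypothetical, not something we have agreed on.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, space, and a few symbols.
ALLOWED = re.compile(r"[A-Za-z0-9 .,_\-@]*")

def is_allowed(text: str) -> bool:
    """Return True only if every character of text is in the allow-list."""
    return ALLOWED.fullmatch(text) is not None
```

With this particular set, `is_allowed("hello world")` is true, while a control character such as `"\u0007"` is rejected, and so are inputs like `"O'Brien"` and `"Zoë"`.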
I, on the other hand, am of the opinion that this will only annoy the users and not offer any additional security. I think that the best practice is to make your application accept anything, and then use the proper encoding functions (and parameterized queries if they are available) to make sure that the entered string passes through unmodified and is displayed/used as entered. If the user enters garbage, they will see garbage, but the system will work correctly.
What is the industry best practice here?
Added: It seems that I've not been very clear. The question is about the server side, and the assumption is that all the proper encoding/escaping is in place wherever the string is used (e.g. using parameters for SQL, HtmlEncode for outputting to HTML, etc.). Given all that, does it still make sense to limit the allowed characters that arrive from the client?
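For reference, this is roughly what I mean by "proper encodings in place". It is a minimal sketch in Python, using `sqlite3` and `html.escape` as stand-ins for whatever parameter binding and HtmlEncode equivalent the real stack provides. The `items` table and the function names are made up for the example.

```python
import html
import sqlite3

def search_titles(conn, term):
    """Search with a bound parameter; the term is never spliced into the SQL text."""
    cur = conn.execute("SELECT title FROM items WHERE instr(title, ?) > 0", (term,))
    return [row[0] for row in cur.fetchall()]

def render_results(term, titles):
    """Encode every user-supplied string at the point where it is written into HTML."""
    safe_term = html.escape(term)
    items = "".join(f"<li>{html.escape(t)}</li>" for t in titles)
    return f"<p>Results for {safe_term}</p><ul>{items}</ul>"

# Hypothetical in-memory data, stored exactly as entered.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("O'Brien's <b>notes</b>",), ("plain",)],
)
```

Here a search for `O'Brien` finds the first row, a search for `'; DROP TABLE items; --` simply matches nothing, and the stored `<b>` comes out as `&lt;b&gt;` in the rendered page. The string is stored and searched unmodified; it is only encoded for the context in which it is used.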