12

While recently working on a Rails-based web application for a company, I had to look into an XSS vulnerability. It turns out that the application, in some places, could take an HTML tag (e.g., <script>jscodehere</script>) directly as a parameter in GET or POST requests.

This parameter is then accessible via the application's params hash (the hash where all incoming key/value data from the request is made available).

Now, the XSS vulnerability stemmed from the fact that the site also makes the data of many of its pages available as JSON (e.g., /cart/1.json).

Through some mechanism I don't fully understand (I suppose this is technically "Reflected XSS"?), the unsanitized <script> code that makes its way into the JSON can be used to compromise personal machines and other sites through unintended execution.

My question is: why is this not an opt-in system? Rails is now at version 4, so I'm surprised that a solution must still be built manually, but this would apply to any web application framework. It's one thing to allow the params to come in unsanitized by default (perhaps the HTML tags will be used on a user's profile page and the formatting is required) – and I think Rails does some scrubbing in the actual rendering step as well, limiting output, by default, to only "safe" HTML tags.

However, when rendering JSON, it does no such sanitizing/scrubbing, perhaps because the JSON response is too customized by the time it is built up and parsed. Still, I don't fully understand why:

1) Some built-in mechanism isn't in place

2) This isn't being talked about more – I could find no discussion on the web of unsafe passing of HTML/script tags in JSON rendered from Rails or Sinatra apps. (I could only find a tiny amount of information about sanitizing the params hash, i.e. sanitizing values on the way IN, which is arguably better but may not work for everything, as it is a one-size-fits-all solution; you may want to retain some HTML tags but strip out only <script> tags, for example.)

3) Why, in the Ruby world at least, there is at present only one library for sanitizing (the Sanitize gem), and it really only works on String values – you have to write your own recursive code to sanitize a Hash, like the params hash, and there seems to be nothing written on this either! Rails does have a built-in sanitizer, but it is considered inferior to the third-party Sanitize gem and is not as flexible when it comes to offering a few levels of strictness in how aggressively to sanitize a string.

Am I misunderstanding the validity of injection into JSON as a vulnerability, or has this vulnerability largely been overlooked because JSON is not a core feature of all web applications?

End result: I used a before-filter in the main application controller on the back end to sanitize the params hash every time it comes in from a request, using the Sanitize library.

However, I believe this significantly slowed down the application because it has to happen on every request, and the sanitizer is essentially running a series of regex calls recursively over the hash. This way no tags ever enter the database and therefore can never make it out via JSON, but it is a costly performance hit. Is there a better way?
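For reference, here is a minimal sketch of the approach described above, assuming the Sanitize gem is in the Gemfile and a before_action in the application controller walks the params hash; the deep_sanitize! helper and its name are my own illustration, not a Rails or Sanitize API:

    # app/controllers/application_controller.rb
    class ApplicationController < ActionController::Base
      # Runs on every request, which is exactly the per-request cost in question.
      before_action :sanitize_incoming_params

      private

      def sanitize_incoming_params
        deep_sanitize!(params)
      end

      # Recursively sanitize every String value found in nested Hashes/Arrays.
      # With its default configuration, Sanitize.fragment strips all tags,
      # including <script>.
      def deep_sanitize!(node)
        case node
        when Hash
          node.each { |key, value| node[key] = deep_sanitize!(value) }
          node
        when Array
          node.map! { |value| deep_sanitize!(value) }
        when String
          Sanitize.fragment(node)
        else
          node
        end
      end
    end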

rcd

2 Answers

11

This comes up everywhere, partly because everyone keeps repeating the mantra that input sanitation is the answer. It isn't. It's dangerous, bug-prone, and it needs to go away. (Of course, you should still check that your input lengths correctly correspond to their appropriate columns.)

Sanitizing Input vs. Sanitizing Output


Sanitizing input gives people a false sense of security, since there are so many ways to get around it and because it's difficult to implement properly, forcing developers to search Google for an outdated implementation that may or may not be secure. For this reason, it's best to sanitize output.

Part of the reason this is done later is that you want to preserve the correct data, but also to protect against SQL injection attacks. SQL injection attacks are largely defeated by prepared statements, not really by input sanitation. You should replace script tags and other dangerous output with HTML entities.

Here's an example of replacing potential script characters with HTML entities on output, not input (a short Ruby sketch follows the list):

  1. < becomes &lt;, which still displays as < on the page, but without messing with the layout or the database.

  2. > becomes &gt;, which still displays as > on the page, but without messing with the layout or the database.
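In Ruby, for instance, the standard library already does this replacement at output time; a minimal sketch (the payload string is just an example):

    require "erb"   # ERB::Util.html_escape
    require "cgi"   # CGI.escapeHTML

    payload = %q{<script>alert("xss")</script>}

    ERB::Util.html_escape(payload)
    # => "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"

    CGI.escapeHTML(payload)
    # => "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"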


So Why Not Sanitize Input? What If We Correctly Implement It?

Most developers aren't IT security experts. Most developers wouldn't have a clue what to do in this area. By teaching them these two common methods of protecting data (prepared statements and output encoding), you save development time and significantly increase the overall security of your web application. Better yet, you help your developers understand why this is necessary, instead of why 2340939403424 different types of input sanitation are needed, and you prevent a lot of implementation issues that will invariably pop up later on.

Like I said before, searching Google for outdated input sanitation functions is not security. That's a false sense of security. You need to understand what is acceptable and what is not, and the process that your data goes through.

With output sanitation, you don't have to worry about some weird bug happening later down the road that you forgot about. You don't have to mess with tons of different complicated functions that could be implemented incorrectly, and give a false sense of security. You don't have to worry about scripts being injected either.

But what if we replace everything with HTML entities before it's inserted? If you try that on the client side, anyone can modify the request. If you do it on the back end before putting the data in the database, that could work too... but then there's a problem:

What if there is a field that legitimately needs to contain one of those characters? Let's say a packaging label, an address, or something silly like that. Maybe it's a file name, maybe it's something you have implemented in some other way.


Example of Why Input Sanitation Sucks

Maybe someone has a funny keyboard that uses a different apostrophe character for their name or address? This could leave some databases open to Unicode-based smuggling.

Maybe you need a record that has < or > inside of it? What are you going to do? Search by the HTML entity? That's pretty inefficient, and it requires a lot of hacks to get working right in many databases.

Maybe someone's name has an apostrophe in it - for example, Rory O'Cune. With input sanitation, you are destroying his name and requiring more code to deal with it. What if one of your employees is searching by last names and can't find him because it's been shortened to OCune? This is awful.

That's another reason why you use parameterized queries, and not input sanitation. With prepared statements and output sanitation, you can do this:

SELECT * FROM [table] WHERE [last_name] = @Lastname -- (or ?)

The @Lastname parameter/bound variable (it would be a ? in Java) will be bound correctly as O'Cune. No funny business required. No bugs to hunt down. It's infinitely more secure, and you can just output the HTML entity if it's a page-breaking character.
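For the Rails app in the question, the ActiveRecord equivalents of that bound parameter look roughly like this (the User model name is illustrative):

    # The ? placeholder is bound by the database adapter, so O'Cune arrives
    # intact and no quoting is ever done by hand.
    User.where("last_name = ?", params[:last_name])

    # The hash form is parameterized as well and is usually preferred.
    User.where(last_name: params[:last_name])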


So Why Isn't Ruby Fixing This Automatically?

Why should they? I've yet to come across any implementations of JSON in any language that will automatically remove these script tags. What if you want to use JSON to display script tags for some weird reason on your website?

By stripping tags automatically, you would be preventing anyone from outputting HTML through JSON, which I suspect could break a lot of things for a lot of people.

Thus, the answer is to do it yourself. Replace those script tags with HTML entities before they get serialized and returned.
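One hedged way to do that in a Rails controller, escaping string values only at the moment the JSON response is built (the Cart model comes from the question's /cart/1.json example; the action shown is illustrative):

    def show
      cart = Cart.find(params[:id])

      # Escape HTML-significant characters in top-level string attributes
      # just before serialization, leaving the stored data untouched.
      safe = cart.as_json.each_with_object({}) do |(key, value), out|
        out[key] = value.is_a?(String) ? ERB::Util.html_escape(value) : value
      end

      render json: safe
    end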

Mark Buffalo
  • Good points. I'm just surprised that there is so little discussion about replacing script tags in outputted JSON, when it is by design (of an app) not supposed to be there. – rcd Dec 20 '15 at 20:41
  • @rcd Part of this reason is that most developers aren't aware of these attacks. – Mark Buffalo Dec 20 '15 at 21:02
9

Making input safe is better done as late as possible – that is, when data is output to the page. Globally "sanitizing" input data is bad because it is impossible to distinguish between good data and bad data at that stage (as you say, if you're allowing HTML to be entered by your users, the framework can't tell the difference between HTML that's meant to be there and HTML that's not). Even script in the input can sometimes be valid (think of StackOverflow/StackExchange text boxes containing code).

As late as possible, then, your application should be encoding for whatever format the output is in – for example, HTML or JSON. With the former, an & sign becomes &amp; so that it is properly rendered for display; for the latter, a JSON encoder should be used, which will convert & to \u0026 inside a JSON string.
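As a hedged Ruby illustration of both output contexts (the escape_html_entities_in_json flag is an ActiveSupport setting; whether it is already enabled depends on the app's configuration):

    require "erb"
    require "active_support/json"

    data = { "name" => %q{<script>alert(1)</script> & co} }

    # HTML context: encode for the page.
    ERB::Util.html_escape(data["name"])
    # => "&lt;script&gt;alert(1)&lt;/script&gt; &amp; co"

    # JSON context: with HTML-entity escaping for JSON enabled, the encoder
    # emits \u003c, \u003e and \u0026 in place of <, > and &.
    ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true
    ActiveSupport::JSON.encode(data)
    # emits: {"name":"\u003cscript\u003ealert(1)\u003c/script\u003e \u0026 co"}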

The JSON itself isn't an XSS risk as script does not execute in a browser from a JSON request (JSONP is another matter as these are included with script src references rather than loaded as data). The XSS risk with JSON is when JavaScript on the page attempts to create or populate HTML elements with the retrieved JSON data. The JavaScript itself should either HTML encode the data or use safe object members to populate the DOM (e.g. textContent).

Validation or sanitization can be done as an extra layer; however, proper encoding should be the focus. For example, you may wish to validate server-side that a postcode or zip code contains only alphanumeric characters and the space character. For more complex fields, this isn't an option without severely limiting your input capabilities.
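A minimal sketch of that kind of whitelist validation in a Rails model (the Address model and postcode attribute are illustrative):

    class Address < ActiveRecord::Base
      # Reject anything other than letters, digits and spaces outright; this
      # only works for fields with a naturally restricted character set.
      validates :postcode, format: { with: /\A[a-z0-9 ]+\z/i }
    end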

Some frameworks attempt to globally sanitize data, such as .NET with its request validation. However, bypasses are found all the time, including this very recent one. In short, it doesn't work, and it causes functional issues for the frameworks that attempt it. It also offers no protection from XSS in data that is retrieved from sources other than the application itself.

SilverlightFox
  • Thank you for the additional information regarding the XSS risk when JavaScript renders the data on the page. In this particular app, Rails is the back end, but Angular is the front end, so there is a lot of JavaScript going on. Still, I don't understand why you would want to allow script tags to enter the database at all if the application, by design, is never intended to output those tags (or any HTML tags, for that matter). – rcd Dec 19 '15 at 09:43
  • A script tag is just text, completely benign unless it is output incorrectly to HTML. Why shouldn't the framework allow it as input? Why should it discriminate just because the text has special meaning elsewhere? What if you need to store script tags (the StackOverflow scenario)? Besides, `` allows script to be executed without a script tag. Frameworks that "help" developers in this way are burying the problem. They should be encouraging developers to take the output context into consideration rather than putting up a wall against inputs. What if the script data is already there? – SilverlightFox Dec 20 '15 at 10:49