12

Is there a proven library with functions for preventing XSS attacks? Many people don't realise that htmlspecialchars is not enough to prevent XSS attacks. There are various contexts that need their own escaping (html properties, Javascript, more?). Is there a proven library that will provide functions for me to escape in all of these contexts?

Casebash
  • How is `htmlspecialchars` not enough? Can you show an example? – Billy ONeal Nov 10 '11 at 02:26
  • @Billy http://blog.kotowicz.net/2011/10/piwik-151-multiple-xss-vulnerabilities.html htmlspecialchars is not context-aware; it's only good for escaping in the simplest "HTML element content" context – Krzysztof Kotowicz Nov 10 '11 at 06:04
  • Yes, many people write bad code. And the reason why there are multiple different functions within PHP for changing the representation of character data is that there are many places PHP can put that data: into SQL statements, XML documents, CSV files, HTML, URLs, Javascript.... TNEF comes close to being a magic bullet, but at the cost of making your data totally unintelligible to inspection/processing without changing it back to a different representation. – symcbean Nov 10 '11 at 10:11
  • @Krzysztof Kotowicz: I don't see anything about htmlspecialchars in that blog post. Obviously if you're embedding a string in a script or something then you'll have to use an appropriate escaping function. But I can't think of an input to htmlspecialchars that would result in broken behavior. Of course if there's embedded javascript in a link you're subject to that. But I can't really think of a case where that results in a security problem... – Billy ONeal Nov 10 '11 at 16:42
  • @BillyONeal .. and that is the whole point. You asked for an example where htmlspecialchars is not enough (for XSS protection). It can't protect you in all contexts, just as you noticed, and that is why Casebash asks for additional functions (because PHP leaves you with only an incomplete solution like htmlspecialchars). – Krzysztof Kotowicz Nov 10 '11 at 17:01
  • @KrzysztofKotowicz: How do you anticipate the user creating a link when there's no way for them to create an `<a>` tag with the associated javascript to run? – Billy ONeal Nov 10 '11 at 17:24
  • @BillyONeal see http://stackoverflow.com/questions/110575/do-htmlspecialchars-and-mysql-real-escape-string-keep-my-php-code-safe-from-inje for an example. Then read this to better understand the issue https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet – Cheekysoft Nov 11 '11 at 14:13
  • @Cheekysoft: Ah. Well in that case htmlspecialchars never claimed to provide that kind of protection. It's designed for cases where you've got a user-entered text field and you want to display that as text on a page. htmlspecialchars does prevent XSS attacks -- but only for the specific use cases it says on the tin. Saying that it's unsafe to use though is a bit broad. – Billy ONeal Nov 11 '11 at 16:43
  • @BillyONeal Unfortunately, even in a plain open HTML content context --e.g. `<?= htmlspecialchars($someVar) ?>`-- there are attacks that can execute XSS through htmlspecialchars() in a number of common situations, especially those that utilise multi-byte charset attack vectors, unless your page is using a more limited charset such as ISO-8859-1. – Cheekysoft Nov 11 '11 at 16:54
  • @Cheekysoft: I really don't believe that. You're the second person to reference "multibyte charset attack vectors" but no reputable (e.g. owasp) source discusses that in the slightest, and nobody has been able to produce an example... – Billy ONeal Nov 12 '11 at 18:31
  • @billyoneal Try Google, maybe? I can assure you multibyte charset attacks against any non-multibyte-aware escaping functions work in practice. I get the feeling the comments have diverged too far from the main question. The question author and a few others understand why htmlspecialchars is not enough, but you still keep on questioning that. Create a separate question; this is off-topic here, in my opinion. – Krzysztof Kotowicz Nov 13 '11 at 14:51
  • @KrzysztofKotowicz: `htmlspecialchars` *is* multibyte aware. The reason I'm making a big deal over this is that whenever you make a system more complicated, you're much more likely to induce security related bugs. The OP is saying, "I don't want to use htmlspecialchars because I have heard it is insecure" -- yet, nobody has been able to show an attack that breaks it unless you use it incorrectly -- inside an HTML entity. I'm not motivated to replace a simple and easy to verify escaping function with a monster like HtmlPurifier or similar unless someone can show an attack against it. – Billy ONeal Nov 13 '11 at 17:10

4 Answers

9

Yes, there are several such libraries. The choice of library depends upon what language/web programming framework you are using.

First, you need to start by reading about XSS. I recommend the following document:

Next, once you have familiarized yourself with that, I can point you to a few libraries that provide well-tested escaping functions:

However, let me warn you that using these functions properly requires developer knowledge, and is potentially error-prone. There are many different parse contexts where dynamic data might be injected into HTML, CSS, or Javascript content; each parse context may require a different escaping function (or sequence of escaping functions) be applied. The developer needs to make sure to apply the correct escaping function(s) each time he/she injects dynamic data into such content. This requires knowledge of the XSS risks and how to defend against them.
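To make the "one escaping function per parse context" point concrete for PHP, here is a minimal sketch. The helper names (`esc_html`, `esc_js`, `esc_url_component`) are hypothetical and only wrap built-in PHP functions; it assumes PHP 7.3 or later, UTF-8 pages, and attribute values that are always quoted. It does not cover CSS, unquoted attributes, or whole-URL validation.

```php
<?php
// Hypothetical helper names; only the built-in functions they wrap are real.

/** Escape for HTML element content and for quoted attribute values. */
function esc_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/** Encode a value as a JavaScript literal for use inside a <script> block. */
function esc_js($value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes,
    // so the literal cannot close the surrounding <script> element.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

/** Encode a single URL path segment or query-string value. */
function esc_url_component(string $s): string
{
    return rawurlencode($s);
}

$name = '</script><script>alert(1)</script>';

// The same untrusted value needs a different encoding in each context.
echo '<p>', esc_html($name), "</p>\n";
echo '<input value="', esc_html($name), "\">\n";
echo '<script>var name = ', esc_js($name), ";</script>\n";
echo '<a href="/search?q=', esc_url_component($name), "\">search</a>\n";
```

Note that the developer still has to pick the right helper at every output point, which is exactly the burden described above.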

This approach (manually escaping data everywhere it is used) is also error-prone. It is easy to forget to apply an escaping function. If you have 100 places where you include dynamic data in the document, it would be easy to remember in 99 places and inadvertently forget to escape in one of them. We're only human, and those kinds of mistakes are easy to make. Libraries of escaping functions don't help developers avoid this problem.

A better solution is to use a web programming framework that helps developers avoid this problem. The state of the art is context-sensitive auto-sanitization. This works particularly well with frameworks that provide an HTML templating system. In such a system, the template engine is responsible for automatically escaping all non-static data that is interpolated into the template. Because the template engine can identify the parse context where the data is being dynamically inserted, it can automatically select and apply the proper escaping function.

The primary issue with context-sensitive auto-sanitization is that only a few web programming frameworks currently support it. Some web frameworks that do support context-sensitive auto-sanitization: Google Ctemplate (Closure templates), GWT, Google Clearsilver.

For more on this topic, I can highly recommend the following research paper:

D.W.
  • Thanks for your answer. This question is asking specifically about PHP and unfortunately ESAPI isn't production ready yet. The project I am working on unfortunately utilises a custom templating system and it is infeasible to switch everything over to a newer and better system. That research paper looks useful – Casebash Nov 12 '11 at 00:24
4

Instead of using context-aware escaping (there are probably various PHP libraries that try to go this way), I'd recommend using a full XSS-aware HTML parser that creates a document tree and allows only chosen, whitelisted elements, attributes, etc.

Projects that go with this approach are HTMLPurifier and Wibble. For example, Wibble:

  • Converts all HTML input to HTML safe UTF-8
  • Loads the HTML into a DOMDocument object
  • Applies one or more filters (DOM manipulators) to the HTML DOM
  • Extracts the filtered HTML from DOM and applies HTML Tidy (ext/tidy)
  • Converts the final HTML to the user's selected character encoding (if not UTF-8)

There is no way for malicious code hidden in almost-valid HTML syntax to pass through this. If anything does get through, the cause is most likely an incomplete whitelist (or some weird html/tidy attack vector, but I doubt it).
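A rough sketch of the parse-then-whitelist idea, using PHP's bundled DOM extension. This is not the API of HTMLPurifier or Wibble; `filter_fragment` and its `$allowed` map are hypothetical names made up for illustration, and the sketch skips steps a real sanitizer needs (URL scheme checks, CSS filtering, HTML Tidy, output charset conversion).

```php
<?php
// Illustrative only; a real sanitizer does far more than this.

/**
 * Keep only whitelisted elements and attributes in an HTML fragment.
 *
 * @param string $html    Untrusted HTML fragment, assumed to be UTF-8.
 * @param array  $allowed Map of tag name => list of allowed attribute names.
 */
function filter_fragment(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The XML declaration prefix is a common trick to make loadHTML read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list first, because nodes are removed while looping.
    foreach (iterator_to_array($root->getElementsByTagName('*'), false) as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($allowed[$tag])) {
            // Drop the disallowed element together with everything inside it.
            $el->parentNode->removeChild($el);
            continue;
        }
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $allowed[$tag], true)) {
                $el->removeAttributeNode($attr);
            }
        }
    }

    // Serialize only the children of the wrapper.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty = '<p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>';
echo filter_fragment($dirty, ['p' => [], 'b' => [], 'em' => []]), "\n";
```

Because everything is rebuilt from the parsed tree, input that merely looks like markup to a regex never reaches the output unless the parser and the whitelist both accept it.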

Update: If you don't need a full-blown HTML parser, another option is to use a secure-by-default templating engine. The only PHP project with context-aware escaping that I know of is the Nette Latte templating engine, though I haven't used it. Other engines, such as Twig or Smarty, also offer different escaping techniques, but you need to specify the context yourself (and if you omit the context once, you're open to XSS attacks). You can also take the escaping code out of those engines and use it outside the templating engine.

Krzysztof Kotowicz
  • "HTML Purifier 4.3.0 is a **major security release addressing various security vulnerabilities** related to user-submitted code and legitimate client-side scripts." When did `htmlspecialchars` last have a security release? – curiousguy Nov 09 '11 at 23:25
  • There are situations where HTML parsers are useful - for example, these are the only reliable solution when allowing the user to enter their own HTML, but I don't want to use this everywhere I interpolate a variable. It is just too heavy-weight. – Casebash Nov 10 '11 at 06:07
  • Not in the last few years, I suppose, because it's just a simple str_replace-equivalent function that does not offer XSS protection in all cases, because it's not context-aware. – Krzysztof Kotowicz Nov 10 '11 at 06:08
  • +1 to Krzysztof. @curiousguy You do understand that `htmlspecialchars` doesn't protect against XSS in many situations, right? You should definitely read this https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet This is what everyone is talking about when they say "context" – Cheekysoft Nov 11 '11 at 14:09
  • The problem with using HTMLPurifier or Wibble is that they will block all use of Javascript and other stuff, not only from untrusted users, but also from the trusted owner of the website -- these libraries make no distinction between the two. Many website authors legitimately need Javascript and CSS on their web page. Thus, they can't simply apply HTMLPurifier/Wibble to every dynamically generated page. HTMLPurifier is more useful when you want to allow untrusted users to provide HTML markup: e.g., for a Wiki, or to let the user leave a comment, etc. – D.W. Nov 13 '11 at 03:19
  • @Cheekysoft 1) When commenting on others' comments, you may want to use the Stack Exchange @syntax: with JavaScript on, you get auto-completion. If you don't have auto-completion, you are probably not really using SE @syntax. 2) I am not a specialist of `htmlspecialchars`, but from its name I can tell it's for HTML (stuff with `` I guess). For a specific parsing context you obviously need a specific encoding function. – curiousguy Nov 13 '11 at 06:15
  • (continued) 3) How does this make my previous comment about the fact that HTML Purifier has a history of vulnerabilities **in any way** less relevant? For proof that HTML Purifier tries to solve a badly specified problem, and thus one too difficult to solve, see [HTML Purifier: CSS Quoting Full Disclosure](http://htmlpurifier.org/security/2010/css-quoting) "_There are several quirks in Internet Explorer's parsing of string-like expressions in CSS that caused this security vulnerability._" – curiousguy Nov 13 '11 at 06:19
  • (continued) How many other browsers have such parsing quirks? How do you know, except by systematic browser testing? From a security point of view, I know I don't want to go there. HTML (SGML actually) syntax is quite ugly. With various other relevant syntaxes (URL, CSS...) it only gets worse. I am 100% confident that `htmlspecialchars` will properly escape <>&" (so user-provided text with `` inside will be safe). I don't think I would ever be able to convince myself that HTML Purifier is bug-free. – curiousguy Nov 13 '11 at 06:25
  • @D.W. HTML Purifier is designed to be used only on user-provided fragments, not on the whole HTML page. – curiousguy Nov 13 '11 at 06:30
  • @curiousguy: exactly, that's part of why it is not really the right solution to Casebash's problem. HTMLPurifier is designed for a very specific use case -- which is different from the one that Casebash faces. – D.W. Nov 13 '11 at 07:54
3

There are two things to keep in mind:

  1. Always escape on output, not on input.
  2. You can cache your escaped output if you're worried about performance.

HTML Purifier is good, but it's blind to the context it's used in. The following code IS vulnerable:

<input type="text" name="test" value="<?php echo $hp->purify($_GET['foo']); ?>" />

Proof of Concept: /?foo=test%22%20onload%3D%22alert(%27XSS%27)%3B
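For comparison, a minimal sketch of what escaping for the attribute context looks like with the same payload. This assumes the field is meant to hold plain text (so no markup needs to survive), the attribute is double-quoted, and the page is UTF-8; the variable names are mine.

```php
<?php
// The proof-of-concept payload, URL-decoded the way PHP would deliver it in $_GET['foo'].
$foo = urldecode('test%22%20onload%3D%22alert(%27XSS%27)%3B');

// Escape for a quoted attribute value at the point of output.
// ENT_QUOTES encodes both " and ', so the value cannot break out of the attribute.
$safe = htmlspecialchars($foo, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<input type="text" name="test" value="' . $safe . '" />', "\n";
```

The quote characters arrive in the browser as entities, so the whole payload stays inside `value` instead of starting a new attribute.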

Context-aware escaping is usually something you only acquire through your templating engine. The most general case:

If you're concerned about the performance of running your user-provided data through HTMLPurifier on each request, give the Stauros XSS library a whirl. (The readme says it's still experimental, but I've yet to find a way to bypass it.)

Scott Arciszewski
3

The best solution against XSS is having programmers understand how data can be dangerous, and use context-aware escaping when they output data. Always. Makes the programmers think more about what they're doing, and gives you more robust programs. I see no problem with that approach :-)

chris
  • But you may as well create functions for each context. It would seem, though, that someone else would have already created such functions, and reinventing the wheel would be kind of pointless – Casebash Nov 10 '11 at 12:02
  • I'm not saying programmers should create their own functions. On the contrary, they should use those available. But to understand where to use which function, you need to think about where data comes from and where it is going. – chris Nov 10 '11 at 15:25
  • @Casebash I would advocate the reinvent-the-wheel-so-that-you-know-how-to-do-so approach. (And then you can just use the standard component.) – curiousguy Nov 13 '11 at 06:33