21

I recently learned that SVG (Scalable Vector Graphics) images introduce a number of opportunities for subtle attacks on the web. (See paper below.) While an SVG file may look like an ordinary image, the format can actually contain JavaScript, and it can trigger loading or execution of HTML, Flash, or other content. The SVG format therefore introduces new ways to sneak malicious content onto a web page or to bypass HTML filters.

I'm writing an HTML filter to sanitize user-provided HTML. What do I need to do in my HTML filter to make sure that SVG images cannot be used to bypass it? What HTML tags and attributes do I need to block? Do I need to do anything when filtering CSS? If I want to simply block all SVG images, what are all the ways that SVG can be embedded into an HTML document?

References:

See also Exploits or other security risks with SVG upload? (a different, but related, question) and Mike Samuel's answer elsewhere.

D.W.

3 Answers

19

I am one of the authors of the paper you linked, and I noticed that some of the advice given in this thread is well-meant and well thought out but not 100% correct.

For example, Opera does not provide reliable safety when dealing with SVGs embedded via <img> or CSS backgrounds. To demonstrate this, just for fun, we created an SVG embedded via <img> that contains a PDF that opens a skype: URL, which then calls you.

We created SVGPurifier, a set of rules that extend HTMLPurifier so that it can clean SVG. Back when we wrote those rules (you can have them if you want; let me know and I'll put 'em on GitHub), every browser we tested treated SVG differently. The behavior also depended strongly on the way the SVG was embedded: inline, with <embed>/<object>, <applet>, <img>, SVG in SVG, CSS background, list-style and content...

It turned out that it was possible to find a harmless subset of SVG if your threat model mainly involves XSS and beyond. If your threat model also includes, for instance, mitigation of UI overlaps, side channels, history-stealing attacks and what not, it gets a bit harder. Here's, for example, a funny snippet showing how we can cause XSS with heavily obfuscated JavaScript URI handlers: http://jsbin.com/uxadon

Then we have inline SVG. In my personal opinion, this was one of the worst ideas W3C/WHATWG ever had. Allowing XML documents inside HTML5 documents, forcing them to comply with HTML5 parsing rules and what not... security nightmare. Here's one gripping example of inline SVG with contained JavaScript that shows what you'd be dealing with: http://pastebin.com/rmbiqZgd

To keep this whole thing from ending up as a long lament on how terrible SVG might be in a security/XSS context, here's some advice. If you really still want to work on this HTML filter, consider doing the following:

  • Give us a public smoke test where we can hammer that thing.

  • Be flexible with your rule-set, expect new bypasses every day.

  • Make sure you know what the implications of filtering inline SVG are.

  • Try to see if the HTMLPurifier approach might be the best. White-list, don't black-list.

  • Avoid reg-ex at all costs. This is not a place for regular expressions to be used.

  • Make sure that your subset only allows those elements that have been tested for security problems in all relevant browsers. Remember the SVG key-logger? http://html5sec.org/#132

  • Study the SVG-based attacks that were already published and be prepared to find more on a regular basis: http://html5sec.org/?svg
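The white-list advice in the list above can be sketched roughly as follows. This is a minimal illustration only, not the SVGPurifier or HTMLPurifier rule set: the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `DROP_SUBTREE` and `sanitize` are hypothetical, the allow-list is an untested placeholder, and the sketch does not balance tags. It uses Python's standard `html.parser` rather than regular expressions, keeps only explicitly allowed tags and attributes, and discards `<svg>` (and a few similar elements) together with everything inside them.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list for illustration; a real one needs per-browser testing.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = ("http://", "https://")
VOID_TAGS = {"br"}
# Elements that are dropped together with everything nested inside them.
DROP_SUBTREE = {"svg", "math", "script", "style"}


class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if tag in DROP_SUBTREE:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are removed; their text is kept, escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue
            value = value.strip()
            # Only plain http(s) URLs survive; javascript:, data:, etc. are dropped.
            if value.lower().startswith(ALLOWED_SCHEMES):
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_SUBTREE:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth:
            return
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because everything not on the list is removed, `<img>`, `style` attributes and inline `<svg>` never reach the output unless someone deliberately adds them, which is the point of white-listing. Whether the parser sees the same tree as every target browser is a separate question that only testing can answer.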

I like the idea of someone attempting to build a properly maintained and maybe even working HTML+SVG filter, and I'd be more than happy to test it, as would many others, I assume. But be aware: HTML filtering is damn hard already, and SVG just adds a whole new layer of difficulty to it.

schroeder
x00mario
8

As far as I know, the following ways can be used to refer to an SVG:

  1. <img src="http://example.com/some-svg.svg">
  2. Any tag with CSS styles, e.g. style="background-image:url(http://example.com/some-svg.svg)"
  3. Filtering on extensions is not enough. HTTP headers determine the content type, not the extension. A .jpg file may be read as an SVG. Therefore, any remote image is dangerous.
  4. You can inline any XML format, including SVG, in a web page.

Even if you check for all the items above, you cannot be sure that no SVG injection is possible. You may want to go for white-listing instead of blacklisting.
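Point 3 can be illustrated with a small sketch. It assumes a setup the answer does not describe: your own code fetches the remote image and re-serves those exact bytes, so it sees both the `Content-Type` header and the body. The function name `looks_like_svg` and the 1024-byte sniffing window are hypothetical choices, and the byte search is only a heuristic (it would miss compressed or UTF-16 encoded SVG, for instance).

```python
def looks_like_svg(content_type: str, body: bytes, sniff_bytes: int = 1024) -> bool:
    """Heuristically decide whether a fetched resource is SVG.

    The file name is deliberately not an input: the extension says nothing
    about how the resource will be interpreted.
    """
    # Strip parameters such as "; charset=utf-8" from the header value.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "image/svg+xml":
        return True
    # A mislabelled response can still carry SVG markup, so peek at the body.
    head = body[:sniff_bytes].lower()
    return b"<svg" in head
```

A resource named `photo.jpg` that is served as `image/svg+xml` is flagged by the header check alone, and one labelled `image/jpeg` that starts with `<svg` markup is flagged by the body check.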

Gilles 'SO- stop being evil'
jocewyn
  • 2
    Yup, blacklisting is problematic. They can be loaded via ``, `` or `
    `. Go for white-listing.
    – Polynomial Jan 01 '13 at 16:51
  • 1
    Thank you, very helpful! I agree 100% with the statement about whitelisting... but I still need to know what *not* to include on the whitelist. For instance, until I learned about SVG attacks, I had `` on the whitelist, and `
    ` too. So, I think it's still important to know all the ways that SVG can be included in a HTML document. Do you know what kind of filtering needs to be done with CSS? Is it just CSS constructs that can introduce images, or anything else?
    – D.W. Jan 01 '13 at 21:11
  • @D.W. - I'm not sure if it applies to SVG (or to what extent), but it's possible to send images (server -> browser, at least) as a solely binary representation (ie no file name/location, although I think you still had to indicate the type). This might take more work, but potentially you could create some sort of 'grammar' (like the tags used here or in phpBB forums). What are the bounds here? – Clockwork-Muse Jan 02 '13 at 23:29
-2

A simple approach is to not allow user-generated HTML onto your website. By using pseudo-markup such as [b]bold[/b], you can filter out anything that uses tags and make sure that only your code can produce HTML tags. There is still a lot of work to do to prevent HTML tags from being injected if users need to be able to type the < and > symbols, but it is a simpler problem to address.
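A minimal sketch of that idea, under assumptions: the tag set in `SIMPLE_TAGS` and the function name `render_markup` are hypothetical, and only paired tags without attributes are handled. All user input is HTML-escaped first, so < and > become harmless entities, and only then are the known bracket tags turned into HTML that the code itself generates.

```python
import re
from html import escape

# Hypothetical mapping from bracket tags to the HTML elements we emit.
SIMPLE_TAGS = {"b": "b", "i": "i", "u": "u"}


def render_markup(text: str) -> str:
    # Escape everything first: no user-supplied < or > survives as markup.
    out = escape(text, quote=True)
    for bb_tag, html_tag in SIMPLE_TAGS.items():
        # Only complete [x]...[/x] pairs are converted; stray tags stay as text.
        pattern = re.compile(rf"\[{bb_tag}\](.*?)\[/{bb_tag}\]", re.DOTALL | re.IGNORECASE)
        out = pattern.sub(rf"<{html_tag}>\1</{html_tag}>", out)
    return out
```

The regular expressions here run only on text that has already been escaped, so they never have to parse HTML. Interleaved input such as `[b][i]x[/b][/i]` still yields mis-nested output, which is one of the details a real implementation would need to handle.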

AJ Henderson
Frank E
  • 2
    This isn't a very useful answer. There are plenty of problems to be had with BBCode parsing and ensuring that HTML entities are properly removed. – Jeff Ferland Jan 08 '13 at 15:25