
I know that a blacklist approach to URL filtering isn't the most secure, but let's say that in addition to this filtering we're also rewriting all untrusted links to go through a redirect page that warns the user about the risks, and that we're implementing the blacklisting carefully to avoid XSS Filter Evasion.

Which URL schemes would we then put in the blacklist? The two obvious ones are javascript: and data:, what others am I missing?

Changaco
  • Are you sure this helps with XSS at all? The point is that XSS can happen when you either include someone else's data in your page, or your page runs in someone else's context. So this is very little about links, and very much about things like loading … – Marcus Müller Jan 15 '17 at 11:05
  • @MarcusMüller There are lots of legitimate use cases for limiting URI schemes to protect against XSS. User-generated external links, etc. – Arminius Jan 15 '17 at 11:35
  • Can you elaborate on why you want a blacklist instead of a whitelist? – Arminius Jan 15 '17 at 11:36
  • @Arminius Because a whitelist requires maintenance. For example, we used to allow only HTTP(S) links, and a user asked us for `xmpp:` support. Of course there aren't that many URL schemes, and few new ones are created over time, but it still seems like maintaining a blacklist would be easier. – Changaco Jan 15 '17 at 11:51

1 Answer


A blacklist of URI schemes is not reliable.

There is no common list of bad URI schemes because it's just not possible to build a complete one. There are lots of non-standard schemes across browsers, and more can be added in the future. (One example of a dangerous non-standard scheme is livescript: for old Netscape versions.)
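To illustrate the failure mode, here is a minimal sketch (the `is_allowed` helper and the blacklist contents are my own assumptions, not anyone's real filter): a blacklist check passes every scheme its author never heard of, including `vbscript:` (which ran script in old IE) and the `livescript:` example above.

```python
from urllib.parse import urlparse

# Hypothetical blacklist; the set is an assumption for illustration.
BLACKLIST = {"javascript", "data"}

def is_allowed(url: str) -> bool:
    # Compare only the scheme, case-insensitively.
    scheme = urlparse(url.strip()).scheme.lower()
    return scheme not in BLACKLIST

print(is_allowed("javascript:alert(1)"))   # False - caught
print(is_allowed("vbscript:msgbox(1)"))    # True  - slips through (old IE)
print(is_allowed("livescript:alert(1)"))   # True  - slips through (old Netscape)
```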

This is another good point:

I do not allow bitcoin: on my servers, nor anything except for http, https, or ftp.

The reason is because I have no way of knowing what third party applications might have vulnerabilities (including social engineering) that could be exploited by a specially crafted URI string being fed to them.

(Source)

Similarly, there is no blacklist of dangerous HTML tags because, although HTML is standardized, browsers still go their own way with custom tags. Building a comprehensive list would be hopeless.

Another problem is nested schemes: what about view-source:data:..., rss:jar:..., etc.?
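Continuing the hedged sketch from above: a URL parser only reports the outermost scheme, so a blacklisted scheme nested inside sails past the same naive check.

```python
from urllib.parse import urlparse

BLACKLIST = {"javascript", "data"}

def is_allowed(url: str) -> bool:
    return urlparse(url.strip()).scheme.lower() not in BLACKLIST

# The parser only sees the outermost scheme ("view-source"), so the
# blacklisted data: payload nested inside passes the check.
print(is_allowed("view-source:data:text/html,<script>alert(1)</script>"))  # True
```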

So it's safer to agree on a whitelist of acceptable schemes and add more on demand. For example, this is the default in WordPress:

$protocols (array) (optional) An array of acceptable protocols. Defaults to 'http', 'https', 'ftp', 'ftps', 'mailto', 'news', 'irc', 'gopher', 'nntp', 'feed', 'telnet', 'mms', 'rtsp', 'svn', 'tel', 'fax', 'xmpp' if not set.

(Source)
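For contrast, here is a minimal whitelist sketch. The set below is my own trimmed-down assumption based on the WordPress defaults quoted above, not WordPress's actual code; unknown schemes fail closed instead of open.

```python
from urllib.parse import urlparse

# Assumed whitelist, trimmed from the WordPress defaults quoted above;
# extend it on demand, as the answer suggests.
WHITELIST = {"http", "https", "ftp", "ftps", "mailto", "xmpp"}

def is_allowed(url: str) -> bool:
    scheme = urlparse(url.strip()).scheme.lower()
    return scheme in WHITELIST

print(is_allowed("https://example.com/"))    # True
print(is_allowed("xmpp:user@example.com"))   # True
print(is_allowed("view-source:data:..."))    # False - unknown schemes fail closed
```

Note that this sketch also rejects scheme-relative (`//example.com`) and relative URLs, since their scheme parses as empty; a real filter would need an explicit decision about those.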

Arminius
  • Even assuming that building a complete blacklist is impossible (I'm not convinced that it is), we can still compile one that is good enough. – Changaco Jan 16 '17 at 10:01