How do I securely embed an <img> tag using a user-given URL?

Question

If I hypothetically wanted to allow users to use an avatar from an arbitrary URL like this:

<img class="avatar" src="{user input}" />

There are many problems that I can think of:

There are data URIs, so it may be possible to embed a self-contained PDF file; would this actually work inside an <img /> tag?
There's the javascript: scheme, which works for <a /> elements
A server may respond with 401 Unauthorized and request basic auth; a user might be tempted to enter their own username/password combination
A server may return a humungous image

And probably many more.

Is there any way to make it secure without actually downloading and serving that image?

For information there is already an interesting discussion [about these threats](https://security.stackexchange.com/questions/36447/img-tag-vulnerability), but it covered only the threat aspect and not (as far as I remember) the mitigation aspect requested here. — WhiteWinterWolf, May 11 '15 at 14:10
You have to download it. You can keep yourself safe, but your users' identities can be enumerated and their browsers can be attacked. 401 shouldn't happen on an img tag, though. — Steve Dodier-Lazaro, May 11 '15 at 14:45

score 6 · Accepted Answer · edited May 23 '17 at 12:40

Avoiding attacks on your site

You need to validate the string you received. Remember this principle: white-list acceptable strings rather than black-list unacceptable ones.

Ensure that the string is a syntactically correct, escaped URL. Escaping the whole URL prevents it from containing " or >, which could break out of your page's markup.
Ensure the URL is absolute, to help avoid directory traversal attacks. You'll also need to figure out how your web server handles symlinks accessible from within your /var/www folder, and how your server resolves absolute URLs with references to parent directories e.g. www.server.com/../../../etc/passwd.
Ensure the URL uses a known, supported protocol whose implications you understand: http or https. ¹
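
The three checks above could be sketched roughly as follows in Python, using only the standard library. The function name `validate_avatar_url` and the `ALLOWED_SCHEMES` constant are illustrative, not part of the original answer. The sketch assumes the result is placed inside a double-quoted HTML attribute.

```python
import html
from urllib.parse import urlsplit

# White-list of schemes (hypothetical constant name).
ALLOWED_SCHEMES = {"http", "https"}


def validate_avatar_url(raw: str) -> str:
    """Return the URL escaped for use in an HTML attribute, or raise ValueError."""
    url = raw.strip()
    # Reject whitespace and control characters anywhere in the URL.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        raise ValueError("URL contains whitespace or control characters")
    try:
        parts = urlsplit(url)
    except ValueError as exc:
        raise ValueError("malformed URL") from exc
    # Only white-listed schemes pass; javascript:, data:, file: are rejected.
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("scheme not allowed")
    # An absolute URL must name a host; relative paths have none.
    if not parts.hostname:
        raise ValueError("URL must be absolute")
    # Escape ", <, > and & so the value cannot break out of src="...".
    return html.escape(url, quote=True)
```

This only covers the string itself; it says nothing about what the remote server will actually send back.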

Protecting users' privacy

A priori, there's no point in checking what the file being pointed to actually is, since the user who provided the link can change it at any time without you knowing. If that file triggers a bug in the browser's rendering routine, then the (sandboxed) browser process that renders your site will be compromised.

You should be aware that images can be used to collect the IP addresses of your own users. Attackers just need to host the image themselves, wait, and log which IP addresses request it.

The 401 attack should be fixed by now, but some browsers might still not handle it properly. It's been discussed on this StackExchange thread. As per this Chrome bug tracker thread, Chrome no longer displays an auth prompt when an image triggers a 401 error.

There's a common solution to these two threats: download and cache the image. If you can afford to do that, you can be more confident that your users are safe. I know that's not what you're asking for, but it's your best course of action.
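
When downloading, it may help to cap how much you read so that a huge response is never buffered in full. Here is a minimal sketch, assuming the response is available as a file-like object (for example from `urllib.request.urlopen`). The names `read_capped` and `MAX_BYTES` and the 2 MiB figure are illustrative assumptions.

```python
from typing import BinaryIO

MAX_BYTES = 2 * 1024 * 1024  # hypothetical cap: 2 MiB


def read_capped(stream: BinaryIO, limit: int = MAX_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read the stream in chunks and abort as soon as it exceeds `limit` bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buf.extend(chunk)
        if len(buf) > limit:
            raise ValueError("image exceeds size limit")
    return bytes(buf)
```

A call might look like `with urllib.request.urlopen(url, timeout=10) as resp: data = read_capped(resp)`, with `url` already validated.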

Avoiding attacks on your server!

Since you're now caching the image, you need to apply white-listing once again, this time to the file itself.

Make sure the image corresponds to one of your supported file formats (e.g. PNG/SVG/JPG/GIF)
Make sure the image is syntactically correct
Reconstitute the images by extracting whatever information is relevant to displaying them and by dumping whatever information you don't recognise or need -- for instance SVG files can embed JavaScript, which you might not want as it could be executed within your domain by the browser. ²
Impose reasonable limits on the resolution and file size of the image
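
Part of this list can be sketched with the Python standard library alone: detect the format from the file's magic bytes and enforce size and resolution limits. The names and limits below are illustrative assumptions. The sketch deliberately leaves SVG out of the white-list and does not check JPEG dimensions. It also stops short of the reconstitution step, which in practice would mean decoding and re-encoding with an image library.

```python
import struct

MAX_BYTES = 2 * 1024 * 1024  # hypothetical file-size limit: 2 MiB
MAX_SIDE = 1024              # hypothetical resolution limit: 1024 px per side

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_format(data: bytes) -> str:
    """Identify the format from magic bytes, not from the extension or Content-Type."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    raise ValueError("unsupported image format")


def check_image(data: bytes) -> str:
    """Return the detected format, or raise ValueError if a check fails."""
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    fmt = sniff_format(data)
    if fmt == "png":
        # 8-byte signature, then IHDR: length(4) type(4) width(4) height(4).
        if len(data) < 24 or data[12:16] != b"IHDR":
            raise ValueError("malformed PNG header")
        width, height = struct.unpack(">II", data[16:24])
    elif fmt == "gif":
        # 6-byte signature, then little-endian 16-bit width and height.
        if len(data) < 10:
            raise ValueError("malformed GIF header")
        width, height = struct.unpack("<HH", data[6:10])
    else:
        # JPEG dimensions require walking the segments; not done in this sketch.
        return fmt
    if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
        raise ValueError("resolution out of bounds")
    return fmt
```

Header checks like these only reject obviously wrong files; they do not prove the rest of the file is well-formed.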

¹ It's been pointed out that an attacker could submit a URL such as file:///some/user/secret/on/their/local/machine. If you did not retrieve the file but just left the URL as is, this could cause the file to be loaded by the rendering process that renders your site. In conjunction with a second URL to an image that successfully exploits a bug in the renderer, an attacker could retrieve the content of this file.

² Note that SVG files are particularly dangerous, and that there are no available tools to filter them currently.

I know it's not directly related to the question, but are there any legal implications for caching copyrighted images/data? — domen, May 11 '15 at 15:44
@domen It depends. Which country are we speaking about, what is the relationship between the image providers and the server, who owns the images? There are plenty of applicable laws. In France (only law I know), laws related to the protection of authors' rights, intellectual property, unauthorised access to computer systems, responsibilities of content hosters (a specific, regulated status in France) in avoiding IP violations all apply. Best to ask a specialised lawyer. — Steve Dodier-Lazaro, May 11 '15 at 15:47
