Avoiding attacks on your site
You need to check that the string you received is valid. Remember this principle: you must white-list acceptable strings rather than black-list unacceptable ones.
- Ensure that the string is a syntactically correct, escaped URL. Escaping the whole URL prevents it from containing " or >, which could break out of your site's markup.
- Ensure the URL is absolute, to help avoid directory traversal attacks. You'll also need to figure out how your web server handles symlinks accessible from within your /var/www folder, and how your server resolves absolute URLs with references to parent directories, e.g. www.server.com/../../../etc/passwd.
- Ensure the URL uses a known, supported protocol of which you understand the implications: http or https. ¹
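The three checks above can be sketched roughly as follows. This is only a minimal illustration using Python's standard library: the function names validate_image_url and as_html_attribute are made up for this example, and the character allow-list is my reading of which characters a URL may contain unescaped, so adapt it to your own stack.

```python
import html
import re
from urllib.parse import unquote, urlsplit

# White-list of protocols whose implications we understand.
ALLOWED_SCHEMES = {"http", "https"}

# Characters a URL may contain unescaped; anything else (", <, >, spaces...)
# must already be percent-encoded, otherwise the string is rejected.
_URL_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def validate_image_url(raw):
    """Return the URL unchanged if it passes the white-list, else raise ValueError."""
    if not _URL_CHARS.fullmatch(raw):
        raise ValueError("URL contains characters that should be percent-encoded")
    parts = urlsplit(raw)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError("unsupported or missing protocol")
    if not parts.hostname:
        raise ValueError("URL is not absolute")
    # Reject parent-directory references, including percent-encoded ones.
    if ".." in unquote(parts.path).split("/"):
        raise ValueError("parent-directory reference in path")
    return raw


def as_html_attribute(url):
    """Escape a validated URL for use inside a quoted HTML attribute."""
    return html.escape(url, quote=True)
```

Validation and HTML escaping are kept separate on purpose: the first decides whether you accept the string at all, the second is applied every time the URL is written into a page.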
Protecting users' privacy
There's a priori no point in checking what the file being pointed to actually is, since the user who provided the link can change it at any time without you knowing. If that file contains code that triggers a bug in the browser's rendering routine, then the (sandboxed) browser process that renders your site will be compromised.
You should be aware that images can be used to collect the IP addresses of your own users. An attacker just needs to host the image they submitted to your site and log which IP addresses request it.
The 401 attack (an image URL that answers with an HTTP 401 error so that the browser shows an authentication prompt on your page) should be fixed by now, but some browsers might still not handle it properly. It's been discussed on this StackExchange thread. As per this Chrome bug tracker thread, Chrome no longer displays an auth prompt when an image triggers a 401 error.
There's a common solution to these two threats: download and cache the image. If you can afford to do that, you'll have a higher certainty of users being safe. I know that's not what you're asking for but it's your best course of action.
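As a rough idea of what "download and cache" could look like, here is a small sketch. The helper names read_capped and cache_image, the 5 MiB limit and the idea of naming cached files after their SHA-256 digest are all assumptions of mine, not something your framework necessarily provides.

```python
import hashlib
from pathlib import Path

MAX_BYTES = 5 * 1024 * 1024  # assumed upper bound, pick your own


def read_capped(stream, max_bytes=MAX_BYTES, chunk_size=64 * 1024):
    """Read a binary stream in chunks, raising ValueError once it exceeds max_bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("image exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)


def cache_image(data, cache_dir):
    """Store the bytes under their SHA-256 digest and return the resulting path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The file name is derived from the content, never from the user's URL.
    path = cache_dir / hashlib.sha256(data).hexdigest()
    path.write_bytes(data)
    return path
```

The stream would typically be the response object of whatever HTTP client you use, for example the object returned by urllib.request.urlopen(url, timeout=10). Your pages then point at your cached copy, so the remote host never sees your users' requests.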
Avoiding attacks on your server
Since you're now caching the image, you need to white-list it once again.
- Make sure the image corresponds to one of your supported file formats (e.g. PNG/SVG/JPG/GIF)
- Make sure the image is syntactically correct
- Reconstitute the images by extracting whatever information is relevant to displaying them and by dumping whatever information you don't recognise or need -- for instance SVG files can embed JavaScript, which you probably don't want as it could be executed within your domain by the browser. ²
- Impose reasonable limits on the resolution and file size of the image
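To give an idea of the cheapest of these checks, here is a sketch that only looks at the file header. It uses the well-known magic bytes of PNG, JPEG and GIF and the layout of the PNG IHDR chunk. The limits and the function names are assumptions for the example, and SVG is deliberately left out. Note that this is not a full syntax check and does not reconstitute anything: for that you would decode and re-encode the image with an image library you trust.

```python
import struct

MAX_BYTES = 5 * 1024 * 1024  # assumed file size limit
MAX_SIDE = 4096              # assumed limit on width and height, in pixels

# White-list of supported formats, keyed by their leading magic bytes.
_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(data):
    """Return the format name based on magic bytes, or raise ValueError."""
    for magic, name in _MAGIC.items():
        if data.startswith(magic):
            return name
    raise ValueError("unsupported image format")


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk that follows the PNG signature."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then two
    # big-endian 32-bit integers holding width and height.
    if len(data) < 24 or data[12:16] != b"IHDR":
        raise ValueError("malformed PNG header")
    return struct.unpack(">II", data[16:24])


def check_image(data):
    """Apply the size, format and (for PNG only) resolution limits."""
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    fmt = sniff_format(data)
    if fmt == "png":
        width, height = png_dimensions(data)
        if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
            raise ValueError("resolution out of bounds")
    return fmt
```

Resolution checks for JPEG and GIF are omitted here to keep the sketch short; they need format-specific parsing of their own.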
¹ It's been pointed out that an attacker could do something such as file:///some/user/secret/on/their/local/machine. If you did not retrieve the file but just left the URL as is, this could cause the file to be loaded by the rendering process that renders your site. In conjunction with a second URL to an image that successfully exploits a bug in the renderer, an attacker could retrieve the content of this file.
² Note that SVG files are particularly dangerous, and that there are no available tools to filter them currently.