Why is referer checking needed for Django to prevent CSRF

Question

Today I learned that Django's CSRF protection uses refer(r)er header checking in addition to checking a hidden form field against a cookie. It seems to be important, judging from the docs and the ticket quoted below.

It only checks this over HTTPS though. I've also noticed that almost no other website checks referer [since I turned off sending of said header and most forms still work].

So I have two questions:

How would the attack work that would be possible without this check? Doesn't HTTPS protect against man-in-the-middle attacks?
How do other websites protect against it? And does Django not protect against it for HTTP?

The info that I found:

https://docs.djangoproject.com/en/1.8/ref/csrf/#how-it-works

In addition, for HTTPS requests, strict referer checking is done by CsrfViewMiddleware. This is necessary to address a Man-In-The-Middle attack that is possible under HTTPS when using a session independent nonce, due to the fact that HTTP ‘Set-Cookie’ headers are (unfortunately) accepted by clients that are talking to a site under HTTPS. (Referer checking is not done for HTTP requests because the presence of the Referer header is not reliable enough under HTTP.)

https://code.djangoproject.com/ticket/16870

Unfortunately, this check is absolutely necessary for the security of Django's CSRF protection. Without it, we can't prevent man-in-the-middle attacks on SSL sites. We made the decision that preventing MITM was a more valuable tradeoff than breaking sites for the small minority of users who block the header in a fashion which does not improve privacy.

(It occurs to me that I posted this just before going offline for the weekend. Sorry about that.) — Mark, Aug 06 '15 at 19:56
Basically it protects against this vulnerability in double submit cookies CSRF mechanism: http://security.stackexchange.com/a/59512/8340 — SilverlightFox, Aug 10 '15 at 10:17

score 14 · Accepted Answer · answered Aug 06 '15 at 23:55

First of all, thanks for the interesting question. I did not know about the details of CSRF before and had to look up the answer to your question myself, but I think I know the correct explanation for Django's behavior now.

The Django developers are treating HTTP and HTTPS referers differently because users expect different things from insecure and secure web services. More specifically, if a web page is using transport layer security, users expect to be protected against man-in-the-middle attacks, meaning they trust in the principle that even if someone sat directly between them and the remote server and intercepted every single message, they couldn't make any use of that information. Note that this is not expected of plain HTTP connections.

Now consider the following scenario, quoted from a Django dev's post here:

user browses to http://example.com/

a MITM modifies the page that is returned, so that it has a POST form which targets https://example.com/detonate-bomb/. The MITM has to include a CSRF token, but that's not a problem because he can invent one and send a CSRF cookie to match.

the POST form is submitted by javascript from the user's browser and so includes the CSRF cookie, a matching CSRF token and the user's session cookie, and so will be accepted.

I did not instantly understand this attack myself, so I'm gonna try to explain the details. Note first that we are looking at a page that displays forms over plain connections but submits data via SSL/TLS. Part of the problem, as I understand it, is that the cookie and hidden form value (aka "the CSRF token") are only compared against each other, not against any value that is stored server-side. This makes it easy for the attacker to supply their victim with a cookie-token-combination that will be accepted by the server - remember, the page displaying the form is not secured, so Set-Cookie headers and the contents of the form itself can be spoofed. Once the manipulated form is submitted (via injected JS, for example), the server sees a perfectly valid request.
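To make that concrete, here is a minimal sketch of a naive double-submit check in plain Python. It is not Django's actual middleware code (which, among other things, masks tokens); the function `double_submit_ok` and the sample values are made up for illustration, and only the names `csrftoken` and `csrfmiddlewaretoken` are borrowed from Django's defaults.

```python
import hmac


def double_submit_ok(cookies: dict, form: dict) -> bool:
    """Naive double-submit check (hypothetical helper, not Django's code)."""
    cookie_token = cookies.get("csrftoken", "")
    form_token = form.get("csrfmiddlewaretoken", "")
    # Nothing is looked up server-side: the two client-supplied values
    # are only compared against each other.
    return bool(cookie_token) and hmac.compare_digest(cookie_token, form_token)


# The MITM invents a token, plants it with a spoofed Set-Cookie header on the
# plain-HTTP page, and embeds the same value in the injected form.
forged = "attacker-chosen-value"
victim_cookies = {"csrftoken": forged, "sessionid": "victims-real-session"}
injected_form = {"csrfmiddlewaretoken": forged}

print(double_submit_ok(victim_cookies, injected_form))  # True
```

The check passes even though the server never issued that token, and the browser attaches the victim's real session cookie on its own.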

Adding strict Referer checking is the answer to this exact problem. With this check in place, only requests originating from https://example.com are accepted at another endpoint of https://example.com. Insecure pages from the same domain are treated as completely untrusted, and rightly so.
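A rough sketch of what such a strict check amounts to, again in plain Python rather than Django's real implementation (the helper `referer_ok` is a hypothetical name, and I'm assuming an exact host match is what is wanted):

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_ok(referer: Optional[str], host: str) -> bool:
    """Accept only requests whose Referer is an HTTPS page on the same host."""
    if not referer:
        # Strict: a missing header is rejected, not waved through.
        return False
    parts = urlsplit(referer)
    return parts.scheme == "https" and parts.netloc == host


print(referer_ok("https://example.com/form/", "example.com"))  # True
print(referer_ok("http://example.com/", "example.com"))        # False
print(referer_ok(None, "example.com"))                         # False
```

The second call is the scenario above: the injected form lives on the plain-HTTP page, so the POST is refused even though cookie and token match.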

Now to come back to the question why plain HTTP requests are treated differently, we just have to imagine a site that doesn't use encryption at all. In that case, a man in the middle could also spoof the Referer headers sent with the actual form data, so checking those does not provide any additional security. In other words: There is no protection against CSRF attacks by a man in the middle - but, as I mentioned earlier, users do not expect this kind of security from plain HTTP sites.

Regarding your question about how other web frameworks handle this attack vector, I honestly have to say I don't know.

Essentially correct - as a cookie set on HTTP by the domain can also be read by HTTPS on the same domain (and the server cannot tell which is which), the referer check affords extra protection. Another mitigation would be [HSTS](https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security). See [here for my answer to another question](http://security.stackexchange.com/a/44976/8340) which describes the situation that this is protecting exactly (in relation to XSS, however the same applies to CSRF with double submit cookies as here). — SilverlightFox, Aug 10 '15 at 10:13
Thanks for the comment. Your first sentence expresses the core problem way better than the relevant sentence in the Django docs, in my opinion. — zinfandel, Aug 10 '15 at 11:56
Sorry, I'm still confused. If you have http pages that set session cookies, couldn't an attacker already steal them and do whatever on behalf of the user? What does CSRF protection accomplish at that point? — Mark, Aug 11 '15 at 13:13
No, the attacker doesn't steal the session cookie in this scenario, and they don't have to. The attack described above still works if session cookies are only set (by the server) and submitted (by the user) *confidentially*, i.e. via HTTPS. The fundamental security flaw in the specification of cookies is that a MITM can still *set* (and thus override) values from the outside (e.g. from an insecure page on example.com). In our scenario, they use this flaw to set an arbitrary CSRF token on the victim's machine. — zinfandel, Aug 12 '15 at 11:37
Thanks, I think I can see the situation where this is useful now. The site has http and https pages, but sends both session and csrf cookies only over https (so http pages are non-POST and session-independent). Then https is forced for some sensitive pages. If https is forced everywhere, this protection is redundant. Without https it does nothing. If cookies are not https-only, attackers would just steal sessions or csrf tokens instead. So apply for http+https with secure cookies. — Mark, Aug 13 '15 at 00:21
Yes, your summary sounds good to me. The only part I don't like is where you say "http pages are non-POST". In our situation, the POST itself has a secure *target*, but the page containing the form is insecure. So I'd argue that "http pages are POST-enabled", actually. I think you understood the basic principle though. — zinfandel, Aug 13 '15 at 12:32
I should add that the first request a user makes to the site could be http, there's nothing the website can do about that. The CSRF cookie could already be compromised at this point, even if all later traffic is https. — Mark, Jan 02 '17 at 20:24

Mark · Answer 2 · 2017-01-05T16:39:11.027

I'll briefly summarize what I've found since I asked this question.

A user's first request to the website cannot be guaranteed to be HTTPS (from the server side). An attacker might use this request to set a specific CSRF cookie. He can then use this to make a cross-site request on behalf of the user from an HTTP page outside your control. This is what the referer checking prevents.

A solution that might come to mind is to reset the CSRF token when starting an authenticated session. The attacker could then only fake requests to anonymous services, which has little benefit (he could just do them directly by himself, it's anonymous after all).

The problem with that is that the login form itself is also vulnerable to CSRF, despite needing a password. The attacker, in that case, doesn't take over the user's session, but logs the user into the attacker's session. The attacker might then be able to see things the user did or entered.

A few rare cases where I think it could be turned off, all assuming HSTS is on and you change the CSRF token on login:

- You have a login process for which one of the following holds:
  - It involves two pages, with the CSRF token changed after the first one, so that it can't be automated (also set X-Frame-Options). E.g. required two-factor authentication, or just a second page with a manual okay button.
  - The login page has a CAPTCHA that protects against fake requests.
  - The user won't be tricked into using the wrong account for our particular service (e.g. it's highly personalized in a way that's obvious to the user but the attacker can't detect).
- You know the first request is going to be through HTTPS, e.g. the page is only supposed to be accessed through your app or software, rather than a normal browser.
- Browsers implement a way to see whether a cookie was HTTPS-only / was set on an HTTPS connection (then attackers couldn't set a usable cookie over HTTP). But that's impossible at this time, and outside our control.

(Making CSRF tokens be session-dependent in a secret way, and be stored server-side, doesn't fix this. The attacker can't overwrite or generate the CSRF token, but he can just ask your server for it by opening a form using the session he chose).
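Here is a toy illustration of that last point. The `Server` class and the session id are entirely hypothetical; the only assumption is that the server keeps one secret token per session and checks submissions against it.

```python
import secrets


class Server:
    """Toy server that stores one CSRF token per session (hypothetical)."""

    def __init__(self):
        self._tokens = {}  # session id -> CSRF token, kept server-side

    def open_form(self, session_id):
        # Issue (or reuse) the secret token bound to this session.
        return self._tokens.setdefault(session_id, secrets.token_hex(16))

    def accept_post(self, session_id, form_token):
        expected = self._tokens.get(session_id)
        return expected is not None and secrets.compare_digest(expected, form_token)


server = Server()

# The attacker opens a form in a session of his own choosing and simply
# reads the token the server hands out for it.
attacker_session = "session-chosen-by-attacker"
token = server.open_form(attacker_session)

# He then plants that session cookie in the victim's browser over HTTP and
# injects a form carrying the matching token. The server sees a valid pair.
print(server.accept_post(attacker_session, token))  # True
```

The token is never guessed or overwritten here; the attacker just obtains a legitimate one for the session he forces onto the victim.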
