
Suppose we have a site that has public and private areas. The private areas require login.

For example "www.site.com/about" is publicly accessible. But "www.site.com/message_inbox" requires authorization (valid login).

So what happens when someone who is not logged in tries to access a private area like "www.site.com/message_inbox"?

It would be terribly confusing for legitimate users to receive a 404 error (e.g., imagine refreshing the page after your session expires and seeing a 404). Therefore, it is convenient for legitimate users if we redirect to a login page.

However, an attacker could then determine whether "www.site.com/some_page" is a legitimate private URL by checking whether it returns a 404 error or a login page. Maybe we don't want outsiders to be able to compile a list of valid URLs.

We could attempt to mask this by redirecting ALL requests to the login page, except for the public pages. But this becomes silly as all junk requests will happily return HTML.

What is the correct solution to this?

CaptainCodeman
  • 1
    "Maybe we don't want outsiders to be able to compile a list of valid URLs." Why? If the pages are open to attack simply by knowing their URLs, trusting their URLs to stay secret is nothing but security through obscurity, which as we know is no security at all. If they're secured by a login, what is the gain in hiding them? I see a lot of UX benefits in redirecting to a login page and no security benefits in redirecting to a 404. – Kryomaani May 18 '22 at 02:13
  • 5
    @Kryomaani it could be a defense against corporate espionage. For a Software as a Service product where you're hosting projects on behalf of other organisations this allows enumeration of things like unreleased product names. Github [does this with their API](https://docs.github.com/en/rest/overview/other-authentication-methods#basic-authentication) "*In many places, this would disclose the existence of user data. Instead, the GitHub API responds with `404 Not Found.`*" – Wes Toleman May 18 '22 at 02:46
  • 8
    What's wrong with someone "compiling a list" of valid URLs? – Tvde1 May 18 '22 at 08:17
  • 1
    To give one concrete example of why you might want to keep certain URLs secret, if I'm an attacker with a list of credentials from another breach, I might want to try a credential stuffing attack. I don't want to waste time trying credentials for users that don't exist on your site. If your website has user profiles with the URL scheme `website.com/users/${username}`, and allows an unauthenticated user to check whether such a URL is valid, then as the attacker I could enumerate which of the users on my cred list exist on your website, making my cred stuffing attack much more effective. – ymbirtt May 18 '22 at 08:53
  • 5
@ymbirtt then don't do that, use a non-PII bit of info for the URL like a profile id rather than the username for private profiles. – Moo May 18 '22 at 09:35
  • 6
    @ymbirtt Then require authentication to access the page regardless of whether the user exists or not. – user253751 May 18 '22 at 09:42
  • 2
    Basic example, perhaps John does not want everyone in the world to know that photogallery.com/users/john/gallery/secret_gay_party/ exists... For a photo-gallery site to reveal this would be a breach of security in my opinion. This is not about "security via obscurity". – CaptainCodeman May 18 '22 at 12:14
  • @Moo, sure, that'd be a good idea if you were designing the application from scratch, but if you shipped 5 years ago and your entire userbase is already reliant on `gallery.com/users/john/gallery/fun_public_appearance` being a shareable URL, you'll cause a pretty serious shock to your users by suddenly changing your URL structure. I'm not trying to say that every website looks like this, I'm just trying to respond to the comments suggesting that it's not a problem. We want John to be able to share his fun public appearance, but keep his personal life completely personal. – ymbirtt May 18 '22 at 13:02
  • 2
    "imagine refreshing the page after your session expires and seeing a 404" <--- why does your session expire and re-prompt for a password? This is training users to be phished. Users should never see a password prompt unless they intentionally login *on a new device*. But even if you want to make them do it anyway, you can always distinguish users with an expired session from users who never had a session (or cleared cookies, etc.) – R.. GitHub STOP HELPING ICE May 18 '22 at 15:53
  • @r-github are you suggesting that a login session should last forever? – CaptainCodeman May 18 '22 at 16:50
@ymbirtt then you've answered your own question - it's a shareable URL, in other words it's already public information, and thus you have to live with an enumeration attack as a possibility, and defeat it in other ways. – Moo May 18 '22 at 18:11
  • I don't understand this comment "But this becomes silly as all junk requests will happily return HTML." Can you explain? – John Wu May 18 '22 at 20:40
  • 1
    @JohnWu for example, if you don't have robots.txt or favicon.ico and they are requested, it should return 404, not a login page – CaptainCodeman May 19 '22 at 06:15
  • 1
    This *was* listed as security vulnerability for some software I maintained a derivative of. Basically, it was project hosting, so things that require a login (like account settings) would just redirect, but the question was how to handle things not normally visible (like `/project/$projectname/`). The applied fix was apparently to handle them as 404 (with the option to make 404 redirect to the login page if not already logged in). • 「as all junk requests will happily return HTML」 as long as you still give a 404 HTTP status code, it’ll work out, search engines filter those out. – mirabilos May 19 '22 at 20:45
  • 1
    "It would be terribly confusing for legitimate users to receive a 404 error" - they're not supposed to get a 404 anyway; they're supposed to get `HTTP 403 Forbidden` or `HTTP 401 Unauthorized` – user1067003 May 20 '22 at 14:51

9 Answers

57

What is your threat model?

With a blanket approach you won't solve your use case. Correct: if you do as you describe, you theoretically allow an attacker to enumerate your valid pages. But does he gain an advantage by doing so? Do you have a possible attack vector that requires him to have knowledge of valid pages? Would your app leak information through such an enumeration?

These are the questions to ask. Once you have the answers, you can calculate the trade-off between user-friendliness and security.

Maybe we don't want outsiders to be able to compile a list of valid URLs.

The question "why?" is asked not often enough in InfoSec. We have a bunch of "best practices", most of which are really based on "everyone I asked thinks that's a good idea". Take the password complexity disaster where we've told users for decades something that's simply wrong. And it'll take us at least another decade to get all those silly complexity rules encoded into software and security policies out of the system.

Never stop with "maybe we don't want". Ask what the actual threat behind it is that you are trying to prevent.

Tom
  • 2
    Spot on! I think the answer to "Maybe we don't want" is simply "Well then, maybe don't do". – MonkeyZeus May 18 '22 at 13:44
  • I think you are focusing too much on the word "maybe". In this case we are looking at an application where enumeration of the possible URLs is indeed a problem. (This is not always the case in other applications. Hence the word "maybe".) – CaptainCodeman May 18 '22 at 17:12
  • 1
    While threat models are a really good tool, my experience tells me that you should not assume that just because *you* can't imagine a way something could be abused it doesn't mean that no one else can. If asking 'why' becomes a way to dismiss valid concerns, it can be problematic. I've had valid security concerns dismissed by people simply because they lack an understanding of the risk and can't understand the explanations around them. – JimmyJames May 18 '22 at 17:30
  • 9
    @JimmyJames yes, you should always assume the attacker is smarter than you are. That is why you don't discard a threat because you can't imagine how to do the attack, but you do discard it if you see that even a successful attack doesn't lead anywhere. In this case: If the attacker doesn't gain any useful information by enumerating your site, it doesn't matter how clever he is in doing it. – Tom May 18 '22 at 19:02
  • 2
    "If the attacker doesn't gain any useful information by enumerating your site" but this assumes you are correct in that assessment. What if there's something useful about this information that you haven't considered. I would tend to go the other way, if there's no benefit to distinguishing between these errors then you shouldn't do it. – JimmyJames May 18 '22 at 19:17
  • @JimmyJames the assessment depends on your site and business case. In case of doubt, err on the side of caution. But it's an assessment that can be made. – Tom May 18 '22 at 21:16
  • I agree at a high-level. Get a bunch of your smartest people and brainstorm. Get a consultant. If the Log4J debacle taught us anything, it's surely that you can't let one (or just a few) people decide that something isn't a risk. It occurs to me that this post and these kinds of discussions are exactly what we need more of. – JimmyJames May 18 '22 at 21:20
  • @JimmyJames risk identification is one of the unsolved problems of InfoSec. I teach a couple methods in my risk management courses, such as SWIFT or more formal threat modeling as well as some classics (data flow diagrams, etc.) - but nobody has found the "magic bullet" yet, and I've been trying to find better ways for years myself. – Tom May 19 '22 at 07:36
  • 4
    @Tom People are routinely fooled by "proofs" of impossible statements like "1 = 0" because they don't naturally recognize that dividing by an unknown `x` only works when `x` is non-zero. If someone on the math S.E. asked "how can this be?", I would expect the answer to go beyond "what's wrong with the proof?" and instead help them identify those errors. Similarly, while "what's the threat model?" is a crucial question to ask, the purpose of questions like these is to get help in identifying the model. Restating the question isn't helpful to people who want help with the answer. – TheRubberDuck May 19 '22 at 13:38
  • 1
    @TheRubberDuck To give a specific answer requires more specific knowledge than the question gives. These are some of the questions I would ask if I were hired to help out with this problem. Of course that's only the beginning of the solution. – Tom May 19 '22 at 15:27
  • Assuming you will always make mistakes and never understand the threat is fine, as long as you accept the corollary that your preventative work against such non-understood threats is almost certainly incomplete/invalid/not valuable. Rather than lock things down at random, you could probably better spend your time on auditing, alerting, and backups in order to catch, respond to, and recover from the unknowns. – Matthew Read May 19 '22 at 23:42
  • Risk management (of which threat modeling is a part) is all about dealing with uncertainty. Several comments here got that right: You'll never know it all. Anyway we need to make decisions. – Tom May 20 '22 at 06:01
  • It's possible that "is this secure?" is a bad *kind* of question for a Q&A site. There's a [meta post](https://security.meta.stackexchange.com/questions/2180) on the subject. After all, your answer is correct and even insightful, but it doesn't necessarily help the querent walk away with a meaningful resolution to their question. Still, many people need to make baseline, this-or-that security decisions for their app or server without the luxury of expertise or targeted analysis and thus a best practice could help them move forward. – TheRubberDuck May 20 '22 at 13:18
  • @TheRubberDuck I agree that we could need more "don't look into laser with remaining eye" type practical advises. Unfortunately, cybersecurity constantly evolves, and is full of "best practices" that are totally pulled out of a hat with zero foundation beyond "author thinks this is a good idea". So at least for the moment, I don't think there's such a thing as a generally applicable best practice. – Tom May 20 '22 at 13:33
25

There is no single correct solution, as every site has its own things going on, but I'll give my two cents on how you can tackle this.

Usually sensitive pages sit behind a directory or on a separate subdomain, which allows you to mask all of them by simply returning a 301 redirect to the login page. So, for example, /members/home will redirect to /members/login, and so will /members/asadasd, so the attacker can't tell which sensitive pages actually exist. If you're able to move everything to this type of structure, it's probably preferable.

As for your case, the best solution is probably to return a 404 if the user is not logged in and is trying to access a sensitive location. This way the attacker won't be able to distinguish a valid page (e.g., /message_inbox) from an invalid one (e.g., /asdasasd), as both will return a 404.

As pointed out in the comments, this approach is explicitly permitted by RFC 7231 (Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content):

An origin server that wishes to "hide" the current existence of a forbidden target resource MAY instead respond with a status code of 404 (Not Found).
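
To illustrate, here is a minimal sketch of the uniform-404 approach, assuming a Flask app (the route names and the `is_authenticated()` helper are hypothetical):

```python
from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder so sessions work in this sketch

def is_authenticated() -> bool:
    # Hypothetical check; a real app would validate a session or token.
    return session.get("user_id") is not None

@app.route("/about")
def about():
    return "Public page"  # public pages respond normally

@app.route("/message_inbox")
def message_inbox():
    if not is_authenticated():
        abort(404)  # same response as a URL that does not exist at all
    return "Your inbox"
```

An unauthenticated request to /message_inbox and one to /asdasasd now both yield the same default 404, so the two cases cannot be told apart.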

Bubble Hacker
  • 25
    Returning a 404 for unauthorized requests to hide URLs from attackers does not sound like a user-friendly nor safe approach. – CodeCaster May 17 '22 at 23:26
  • @CodeCaster of course. But given how the question asks about not giving away a page existence and at the same time the OP asked not to redirect to a login or any page returning HTML, I found this is probably the most feasible solution. As for safety - what is the problem with the 404 errors? – Bubble Hacker May 18 '22 at 07:18
  • 6
    @CodeCaster GitHub gives 404 for private repositories if you don't have access, so an attacker can't enumerate "existence" of private repositories. – iBug May 18 '22 at 08:44
  • 1
I agree with the subdomain, `/members/home`, etc. suggestion but vehemently disagree with the 404 suggestion. Within a company there are and will always be documents/instructions which point to a URL directly. It is de facto standard to redirect someone to the login page when they need to log in; preferably after login they will be forwarded to their intended destination. Good luck convincing your user-base that this 404 nonsense is for their own good. 404 *is* a solution but I'd be hard-pressed to call it the best one. – MonkeyZeus May 18 '22 at 11:53
  • 7
    @iBug The private repository makes sense because it's hiding user-created content but OP's issue is akin to hiding the fact that https://github.com/new exists. – MonkeyZeus May 18 '22 at 11:54
  • 9
    @CodeCaster Not sure if it matters to you but this approach is specifically stated to be a valid way to use [404](https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.3): "An origin server that wishes to "hide" the current existence of a forbidden target resource MAY instead respond with a status code of 404 (Not Found)." – JimmyJames May 18 '22 at 14:50
  • 1
    @JimmyJames thanks for that! Added it into the answer as well. – Bubble Hacker May 18 '22 at 15:16
  • @BubbleHacker No problem. I've seen the opposite approach as well (e.g. within AWS) where 403 is always returned (i.e. instead of 404.) Personally, I think that's more confusing. – JimmyJames May 18 '22 at 15:21
  • 4
    IMHO, if URLs that include "identifiers" always used secure random ids there would be 0 issues since you cannot enumerate them in any practical way. The problem only arises when the URL uses simple integer ids or usernames that can be discovered "outside" so the attacker can make some good guesses. In that case IMHO your server should keep track of suspicious behaviour like an IP trying to access the same URL with many different ids in a short time frame and ban the IP to slow down the attack, though the best solution would be to simply use non-enumerable identifiers. – Bakuriu May 18 '22 at 20:08
  • @Bakuriu I agree and made a similar point on another answer here. Perhaps this is a reason to stop using meaningful URI constructs, at least in the case that you have these kinds of concerns. – JimmyJames May 18 '22 at 21:23
  • 2
    @JimmyJames `Personally, I think [403 is] more confusing` I disagree. I look at it like this example: if I ask someone "does this exist?" and they say "no", only to later be granted access and find out it does exist, then I've been 'lied' to. If I ask someone "does this exist?" and they say "are you allowed to know if it exists?" I can use credentials to say "yes, I am" (or just say "no" and slink away), and their response can remain 'truthful'. IMO, web responses should be 'truthful'. – Daevin May 19 '22 at 16:04
  • @Daevin The problem, at least in the case I dealt with is that there is no way to provide valid credentials for the non-existent resource. Credentials provided or not, you always get 403. E.g.: the path for `foo/barr` gives 403, `foo/bar` returns 200. No amount of effort around fixing access will ever resolve the 403, only correcting the path does. – JimmyJames May 19 '22 at 16:11
@JimmyJames sorry, I'm not sure I follow that. Why would `foo/barr` always return 403? **1)** server checks if it's a public resource: yes yields 200, no yields server to authenticate. **2)** server determines if the request is authenticated: no yields 403/302 and the end of the current path, yes yields step 3. **3)** server checks for URI existence: no yields 404, yes yields 200. – Daevin May 19 '22 at 16:38
  • @Daevin I can't speak to the motivations of the AWS team managing these servers but I assume it's related to the same goal as using 404 in lieu of 403. For the resources in question, they never return 404 AFAICT. An incorrect path always returns 403. Actually, no authentication was required for any valid path in that context which made it even more confusing. – JimmyJames May 19 '22 at 16:54
20

I don't think this is a serious flaw (see Tom's answer).

However, if you think it is, the problem can be avoided.

You have a list of "publicly available URLs", such as /about.

For all other URLs, you should give a 302 to a login page whether the requested page exists or not. Only after the user logs in should you give a 404 if relevant.

This way, the redirect does not give intruders any information at all.
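
As a minimal sketch of that scheme (Flask assumed; the allowlist and the login route are illustrative):

```python
from flask import Flask, redirect, request, session, url_for

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder so sessions work in this sketch

PUBLIC_PATHS = {"/", "/about", "/login"}  # the list of publicly available URLs

@app.before_request
def require_login():
    # Redirect every request outside the allowlist to the login page,
    # whether or not the requested page actually exists.
    if request.path not in PUBLIC_PATHS and "user_id" not in session:
        return redirect(url_for("login", next=request.path), code=302)

@app.route("/login")
def login():
    return "Login form"
```

Only once `user_id` is in the session does routing run normally, and a 404 is returned if the page really doesn't exist.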

Stig Hemmer
8

The correct solution is to issue the redirect regardless of the status of the target URL if the user is not authenticated. This is easily doable for any normal web server (you set up a redirect rule to match on the common prefix for all the sensitive pages that also checks for the existence of the session credentials), provides good UX for legitimate users, and avoids the issue of potentially disclosing the existence of specific URLs.

Note that when I say ‘redirect’ here I mean sending a 302 status code (not a 301 like some of the other answers suggest, a permanent redirect is not correct here) with a Location header pointing at the login page, and ideally set things up to return the user to the desired page after login. This method avoids sending the login page if the client doesn’t actually follow the redirect, and also allows the login page to be cached (unlike doing silly things like URL rewrites or having the web app throw up different HTML depending on the authentication status), which should mitigate any usage issues from people trying to do URL harvesting.

If you really do not want to redirect to a login page, then you should return a 403 status code for all unauthenticated requests instead (and possibly use a custom error page with a link to the login page). This is the HTTP equivalent of a ‘Permission denied’ message, so unlike a 404 it accurately describes the actual error.

The important thing here is that regardless of which status code you choose, you return it uniformly for all secure URLs when an unauthenticated user attempts to access them. By making the response uniform, you avoid the risk of information disclosure, and it just comes down to how you want to respond.


What I describe above is the standard approach in most modern web apps when the default assumption is that the resource the user is asking for actually exists. If, instead, the default assumption is that the resource does not exist (this is the case for example with GitHub’s handling of private repositories), then the more correct behavior is to just return a 404 for all private URLs for unauthenticated users.
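
A sketch of the uniform-403 variant (Flask assumed; the `/members/` prefix and the session check are illustrative):

```python
from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder so sessions work in this sketch

@app.before_request
def deny_unauthenticated():
    # Uniform response: every unauthenticated request under the private
    # prefix gets a 403, whether or not the resource exists.
    if request.path.startswith("/members/") and "user_id" not in session:
        abort(403)
```

Swapping `abort(403)` for a 302 redirect to the login page gives the redirect variant described above; either way, existing and nonexistent private URLs are indistinguishable to an unauthenticated client.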

Austin Hemmelgarn
  • 2
    Regarding *uniform* returning of unauthorized queries: it might be also needed to remember the timing component here, especially with access control. It might take 50ms to check if page exists and return that as 302 redirect, but checking existence *and* access permission may take 100ms if you do it against database on two queries. Even when the attacker got 302 redirect again, they can see the page actually exists from the timing difference. It really depends on your threat model if that is acceptable. You can also go the "no login cookie exists, return 302" route in some cases. – hegez May 20 '22 at 02:35
3

I agree with Tom's answer that this seems like a bizarre threat model.

Worrying about attackers enumerating static URLs implies that:

  • The web app is on the internet (or at least accessible to attackers, i.e. not on a private network)
  • It is difficult to get an account; i.e. there are no free trials or demo instances for attackers to play with.
  • Knowing the static URLs somehow leads to the attacker being able to do bad things (this is the core of Tom's answer).
  • And finally: there is no easier way for an attacker to learn the URLs, for example by analyzing your javascript code or links in the public HTML pages. I suppose it's possible that the public pages only have links to other public pages, and that you have separate javascript files for the public and private parts of your app, but I've personally never seen an app built that way.

TL;DR this seems like an odd thing to want to do. I would suggest instead following Kerckhoffs's Principle and designing your web app so that it is secure even if an attacker knows everything about its design (HTML, javascript, static URLs, etc).


UPDATE addressing JimmyJames' comment.

If you have dynamic URLs like /users/<username> or /device/<deviceId>, then it makes sense to return a 404 so that the following are indistinguishable:

  • URL does not exist.
  • URL exists but you don't have permission to see it.

However, since your example was www.site.com/message_inbox, I assume you're not talking about dynamic URLs.
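
For completeness, a sketch of that dynamic-URL case (Flask assumed; `get_user()` and the privacy flag are hypothetical):

```python
from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # placeholder so sessions work in this sketch

def get_user(username):
    # Hypothetical lookup; returns None when no such user exists.
    return {"alice": {"private": True}}.get(username)

@app.route("/users/<username>")
def user_profile(username):
    user = get_user(username)
    viewer = session.get("user_id")
    # Return 404 both when the user is missing and when the viewer lacks
    # permission, so the two cases cannot be distinguished.
    if user is None or (user["private"] and viewer != username):
        abort(404)
    return f"Profile of {username}"
```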

Mike Ounsworth
  • I'm having a hard time squaring your assertion that this is bizarre with the fact that using 404 instead of 403 is described as a valid response in the RFC and that major players like AWS use a similar approach. You might disagree with this idea but it's not unusual to protect against this. – JimmyJames May 18 '22 at 17:15
  • Examples of websites that do this are reportedly Github and Facebook. – JimmyJames May 18 '22 at 17:25
  • 1
    @JimmyJames I added an update. Does that address your comment, or do you mean that github / facebook protect static URLs also? And if so, why? What are they possibly gaining by doing that? – Mike Ounsworth May 18 '22 at 19:43
  • I'm not aware that they do. I can't say definitively that they don't, though. I tend to agree with you that hiding static paths that are likely discoverable in other ways is probably not very useful but once you start doing this for some paths, inconsistency could be problematic. I also think the concept of static versus dynamic paths is somewhat arbitrary. – JimmyJames May 18 '22 at 19:48
  • Fair. Balance that against user friction though; if you're sitting on `www.site.com/message_inbox` and your session times out and suddenly you're on a 404 page, that's gonna be super confusing. – Mike Ounsworth May 18 '22 at 19:57
  • You are correct that it is a potential pain. With the AWS thing I ran into where they return 403 for non-existent paths, getting a 403 when I really just had the wrong (non-trivial) path was definitely a red herring that cost me some time. It's definitely an interesting conundrum. The only obvious way to sidestep the issue that comes to mind is to use non-meaningful randomly generated paths that can't be easily guessed such as v4 UUIDs. – JimmyJames May 18 '22 at 20:21
  • 1
    BTW: Ironically enough it seems that SE uses this approach: https://meta.stackexchange.com/questions/258756/what-is-the-reason-behind-marking-forbidden-pages-as-404 – JimmyJames May 18 '22 at 20:38
  • 1
    Thanks for the interesting discussion @JimmyJames! – Mike Ounsworth May 18 '22 at 20:49
2

Use status 404.5 - Not Found: Denied by request filtering (an IIS substatus)

You can return status 404.5, "Denied by request filtering" (an IIS substatus of 404; the client still receives a plain 404). This is accurate, since your site denies any requests to non-public URLs based on a business rule (the user must be authenticated). Since it's a 404.x substatus, it also makes sense to serve it for URLs that do not exist.

For the convenience of your legitimate users, you can configure your server to serve a custom page for status 404.5, and include a link to the login page from there. That way the browser is not loading the login page (which could have side effects) arbitrarily for garbage URLs. Only when the user clicks the link would the login page be served. The custom 404.5 page can be static HTML and can be set to cache so it is only loaded by the browser once.
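
The 404.5 substatus is IIS-specific, but the same effect (a 404 whose body is a small, cacheable page that links to the login form) can be sketched in any framework. Flask assumed here; the markup is illustrative:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.errorhandler(404)
def not_found(_error):
    # Static body with a link to the login form: no redirect is issued,
    # and the browser may cache the page.
    body = '<p>Not found. <a href="/login">Log in?</a></p>'
    resp = make_response(body, 404)
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp
```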

John Wu
  • Where are you getting these 404.x statuses from? I've seen some other people talking about them, but I can't find a list of their meanings and I doubt this is valid HTTP. – wizzwizz4 May 20 '22 at 08:55
That's a sub-classification used by the IIS server on Windows. The status code sent is a plain 404 for all 404.x; the distinction is for choosing which error page to send to the client. – masterX244 May 20 '22 at 11:57
2

We could attempt to mask this by redirecting ALL requests to the login page, except for the public pages. But this becomes silly as all junk requests will happily return HTML.

You are trying to keep the attacker from knowing what private pages are available without authenticating. This means existing and non-existing pages must return identical results. Therefore, you authenticate first, and handle the 404 after.

There's nothing silly about returning HTML for non-existing pages - that's implied by your desired behavior. Some websites do things like put everything that requires auth below a path like example.com/private/... so that the client does not expect to get a 404 right away for things under private/. Moreover, this "silly" problem goes away as soon as you authenticate.

This is a standard pattern in access control. Before you can see what resources are available, you must first authenticate. If you're not authed, you can't distinguish "does not exist" from "not allowed to access".
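
The ordering can be sketched framework-free; all names here are stand-ins:

```python
PUBLIC_PATHS = {"/", "/about", "/login"}
PRIVATE_PAGES = {"/message_inbox", "/settings"}

def handle(path: str, authenticated: bool) -> str:
    if path in PUBLIC_PATHS:
        return "200 serve page"
    if not authenticated:
        # Identical answer for every non-public path, real or not.
        return "302 redirect to /login"
    # Existence is only revealed once the caller is authenticated.
    return "200 serve page" if path in PRIVATE_PAGES else "404 not found"
```

Unauthenticated, `handle("/message_inbox", False)` and `handle("/junk", False)` return the same redirect; only an authenticated caller can learn which of the two exists.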

Jessica
-1

The usual thing I have seen is to reload the login page once the session expires, like how your bank says "you've been logged out due to inactivity". This prevents your issue of a 404 on refresh.

  • Don't return 404 for invalid credentials; it's confusing and may cause the browser to remove a bookmark or something. Return a 301 and redirect them to the login page, or a 401 Unauthorized.
Sufferer
  • "401 Unauthorized" is supposed to trigger HTTP authentication. Using it for any other purpose is likely to cause more security issues than it solves. – Mark May 17 '22 at 23:48
  • 1
    Mark is correct about 401. You probably mean 403, "Forbidden", which has a similar name but a different purpose. – Radvylf Programs May 18 '22 at 02:10
-3

The solution to this is to redirect anything that is not a public resource to the login URL, including nonexistent pages.

Simon Richter