80

I would like to move from sequential to random user IDs, so I can host profile photos publicly, i.e. example.com/profilepics/asdf-1234-zxcv-7890.jpg.

How long must user IDs be to keep anyone from finding any user photos for which they have not been given the link?

Do 16 characters drawn from lowercase letters and the digits zero through nine provide reasonable complexity? I'm basing this on 36^16 ≈ 8×10^24; conservatively estimating 10 billion user accounts reduces the space to about 8×10^14 guesses per account. At 1000 guesses/second, it would take about 25,000 years to find an account, unless I'm overlooking something.
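
For anyone who wants to check the arithmetic, here is a rough sketch in Python. Every constant is one of the assumptions stated above (36-character alphabet, 16 characters, 10 billion accounts, 1000 guesses per second); the variable names are only illustrative.

```python
# Sanity check of the estimate in the question; all figures are the
# question's own assumptions, not measurements.
ALPHABET_SIZE = 36                      # lowercase letters plus digits
ID_LENGTH = 16                          # characters per ID
ACCOUNTS = 10 ** 10                     # pessimistic: 10 billion accounts
GUESSES_PER_SECOND = 1000               # assumed attacker speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Total number of possible IDs.
search_space = ALPHABET_SIZE ** ID_LENGTH

# Average number of guesses before hitting any one valid account.
expected_guesses_per_hit = search_space / ACCOUNTS

# Time an attacker needs, on average, to find a single account.
years_per_hit = expected_guesses_per_hit / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"search space:    {search_space:.2e}")
print(f"guesses per hit: {expected_guesses_per_hit:.2e}")
print(f"years per hit:   {years_per_hit:,.0f}")
```

This only models blind guessing against uniformly random IDs; it says nothing about IDs that are predictable or links that leak some other way.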

Trang Oul
  • 124
  • 1
  • 8
owenfi
  • 903
  • 1
  • 6
  • 8
  • Well AWS does that kind of thing with user files (128bit random name if I recall correctly), so you can assume it is "safe enough", as shown in practice. – Damon May 19 '14 at 10:24
  • @owenfi Generate 128bit+ values and you're alright. For your specific case and application, that's more than good enough. – Adi May 19 '14 at 10:27
  • 22
    It is meaningless anyway. Passwords, tokens, certificates, what security exactly isn't by obscurity? The saying when used correctly applies to the method not the secret. – JamesRyan May 19 '14 at 11:28
  • 3
    This question is equivalent to "How should I choose a secure password?". I don't see any difference at all. – usr May 19 '14 at 13:03
  • This is the same thing as the Dropbox vulnerability from last week (https://security.stackexchange.com/questions/57436/what-was-the-security-vulnerability-behind-box-and-dropbox-and-whats-different/57437#57437) - these are public URLs, they can be crawled and will be indexed, users may share/post their own links, etc. – Eric G May 20 '14 at 02:55
  • You're misinterpreting the phrase "security through obscurity". The phrase is used to describe when the _algorithms_ used to secure a resource are not publicly disclosed, giving the false impression that they are secure because no one outside, e.g., the company knows how said algorithms work. Having a secret key with known origin and application (say, a random URL) in the algorithm is not security through obscurity. All security relies on shared secrets; though, URL's might be a bad idea because of the number of ways they can leak. – Cookyt May 21 '14 at 02:37
  • @damon I'm looking for a reference on Amazon's website for the 128bit random name URLs, could you please provide a link? Yours is the only reference I've been able to find so far after some Googling and searching – cutrightjm Sep 06 '16 at 01:38

7 Answers

78

It depends entirely on what you mean by "safe".

If your only concern is an attacker guessing URLs, then 16 alphanumerics gives roughly 8,000,000,000,000,000,000,000,000 possible addresses, which is plenty to stop random guessing -- in order for an attacker to have a 50% chance of finding even one picture on a site with a thousand users in a year, they'd need to make 100 trillion tries per second, enough traffic to bring down even something like Amazon or Google.
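
As a rough check of that figure, here is a small Python sketch. It assumes uniformly random 16-character IDs over a 36-character alphabet, 1000 valid targets, and independent guesses; the function name and parameters are made up for illustration.

```python
import math

def required_guess_rate(search_space, targets, success_probability, seconds):
    """Guesses per second needed to hit at least one target with the given
    probability, assuming each guess is an independent uniform draw."""
    p_hit = targets / search_space       # chance that one guess lands on a valid ID
    # Solve 1 - (1 - p_hit)^n = success_probability for n.
    # log1p keeps precision when p_hit is extremely small.
    guesses = math.log(1 - success_probability) / math.log1p(-p_hit)
    return guesses / seconds

SECONDS_PER_YEAR = 365.25 * 24 * 3600
rate = required_guess_rate(36 ** 16, 1000, 0.5, SECONDS_PER_YEAR)
print(f"{rate:.2e} guesses per second")
```

Under these assumptions the result comes out on the order of 10^14 guesses per second, the same order of magnitude as the estimate above.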

But there are other ways for URLs to leak: people putting them in emails or blog posts, web crawlers finding pages you didn't secure adequately, and so on. If you really need to protect something, you need to put it behind the same sort of security as the rest of your website.

Personally, for making hard-to-guess URLs, I'd use GUIDs/UUIDs. The search space is absurdly huge, you don't need to coordinate generation between multiple servers, and most languages have standard routines for handling them.
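
A minimal sketch of generating such an identifier in Python. Since the comments below point out that a GUID generator is not guaranteed to be unpredictable, this version draws 128 bits directly from the operating system's CSPRNG via the standard `secrets` module instead. The function names and the example.com URL layout (taken from the question) are only illustrative.

```python
import secrets

def new_photo_id() -> str:
    """Return a 128-bit identifier as 32 hex characters.

    secrets.token_hex reads from the OS cryptographic random source,
    so the output is meant to be unpredictable, not merely unique.
    """
    return secrets.token_hex(16)

def photo_url(photo_id: str) -> str:
    # Hypothetical URL layout borrowed from the question.
    return f"https://example.com/profilepics/{photo_id}.jpg"

if __name__ == "__main__":
    print(photo_url(new_photo_id()))
```

Like a random UUID, this needs no coordination between servers; a collision between two 128-bit random values is possible in principle but vanishingly unlikely.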

Mark
  • 34,390
  • 9
  • 85
  • 134
  • 25
    Most GUID generators do not guarantee unpredictable outputs. So you should generate 16 random bytes from a CSPRNG instead of using standard GUID generators. – CodesInChaos May 19 '14 at 10:10
  • Then that's a buggy GUID/UUID generator... – R.. GitHub STOP HELPING ICE May 19 '14 at 12:40
  • 12
    @R.. not necessarily, that depends on your [GUID](https://en.wikipedia.org/wiki/Globally_unique_identifier#Sequential_algorithms) or [UUID implementation](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address.29). However, I think *most* these days are probably random, but maybe not *secure* random (so possibly predictable). The thing to remember is that the goal is "unique", not "unpredictable", so good implementations satisfy "unique", and only maybe are also "unpredictable". – Tim S. May 19 '14 at 13:43
  • I would argue that it's impossible to satisfy the "unique" criterion without also satisfying "unpredictable" since another "UUID" generator, outside your control, on another system could end up generating an identical UUID (thus making it non-unique) with non-negligible probability, if it knows your generating algorithm and chooses to attempt to collide with it maliciously. A UUID is only "UU" (and even still, only statistically so) if it is unpredictable. – R.. GitHub STOP HELPING ICE May 19 '14 at 17:08
  • 4
    @R.. UUIDs are not cryptographic entities; they are only unique as far as everyone agrees to make them so. The good thing is that for any application that uses UUIDs as intended, it doesn't matter if someone else copies or otherwise reuses the UUIDs; it would only be a problem if the two systems then had to share data. If you have to share data with someone who intentionally tries to break the UUID system, you have probably got bigger problems. – aaaaaaaaaaaa May 19 '14 at 17:33
  • 27
    @R.. You are incorrect; GUIDs are documented as a source of *uniqueness*, not *randomness*. A type 4 GUID is random, but that randomness is not guaranteed to be crypto strength and in practice it is not. More generally: **GUIDs were not designed to be a part of any security system; use of them in a security system is an "off label" use and is therefore a very bad idea**. – Eric Lippert May 19 '14 at 17:43
  • 4
    @R..: The GUID system is deliberately **not** designed to be resistant to malicious usage. It's like a stop sign; a stop sign does not *compel* you to stop. If someone wants to ignore it and blow through an intersection because they're malicious and enjoy causing accidents, the stop sign isn't going to stop them. GUIDs are a *cooperative* system for ensuring uniqueness; all parties are presumed to be non-hostile. – Eric Lippert May 19 '14 at 17:45
  • Best solution is to use a _unique_ GUID generator service combined with a fixed-length _random_ string, then take the output of a strong hash function, and use that as your unique object reference to that object. ie: `sha256(guid() + random_chars(32)).hexdigest()`. – Naftuli Kay May 20 '14 at 00:28
  • 1
    @NaftuliTzviKay That doesn't do anything beyond shuffling the data around a bit. You need the random string to be cryptographically random, and if it is, feeding it through a hash algorithm doesn't make it any better. Just use the cryptographic random number as your CSPRNG delivers it; throwing a GUID into the mix doesn't serve any purpose. – aaaaaaaaaaaa May 20 '14 at 19:35
  • The GUID provides uniqueness, while the random string provides randomness. Both are critical for the solution to OP's question. – Naftuli Kay May 20 '14 at 20:14
  • @NaftuliTzviKay: And then the hash eliminates uniqueness. (Hash collision has a low but non-zero probability, just like crypto-strength PRNG outputs) – Ben Voigt May 20 '14 at 21:15
  • @BenVoigt: UUIDs are not guaranteed to be unique! I do not think that the "uniqueness" of the ID proposed by Naftuli Tzvi Kay would be (considerably) worse than the uniqueness of UUIDs. Some types of UUIDs contain an MD5 or SHA-1 hash of a MAC address or a domain name. – pabouk - Ukraine stay strong May 21 '14 at 08:25
  • @basic there are lots of different kinds of uuids and no there's absolutely no reason for the attacker having to know your system for any of those to be vulnerable - particularly not for type 4 UUIDs... – Voo May 21 '14 at 10:26
  • @Basic If the problem is "make URLs unguessable" then no there's no much bigger problem than "system makes URLs trivially guessable" (otherwise why not argue that sequential numbers are fine too?). And yes some UUIDs use parts of a MAC address that obviously helps exactly until the attacker can get a single GUID (which is pretty much guaranteed), afterwards it's a simple timestamp they have to guess. Your claim that it would take an "attacker with no knowledge of your system" years to find a collision is just plain wrong. – Voo May 21 '14 at 13:17
  • @pabouk: A UUID generator is guaranteed to never return the same value twice, no matter how many times it is called or by whom. You certainly can make a bitwise copy of a UUID and create a collision that way, or modify a generator's internal state to break the guarantee, but those aren't germane to this discussion. – Ben Voigt May 21 '14 at 14:45
  • @BenVoigt: No, really, they are not guaranteed to be unique. In principle this is not possible if you have multiple discrete UUID generators which do not start from a unique identifier. For example MAC addresses should be unique but in fact they are not. See: http://stackoverflow.com/q/1155008/320437 – pabouk - Ukraine stay strong May 21 '14 at 16:56
  • @Basic Umn.. have you actually checked how GUIDs/UUIDs are generated? That math assumes that all bits are randomly generated by a secure random generator which is completely wrong. That guy is basically computing the chances for a collision assuming a non-malicious hacker, which is completely different. For example type 4 UUIDs are generated usually using a non-secure random generator. After seeing some generated UUIDs it's then very, very simple to guess the sequence. The GUIDs based on a MAC are generated using a *timestamp* - you really don't see how easy it is to guess possible values? – Voo May 21 '14 at 22:52
  • @Voo I feel like I'm in a circular conversation here... Let's look at my first comment "Taking an attacker with no knowledge of your system, they could generate GUIDs for years and not collide with one you've created". The key part here being "No Knowledge". As I said _at the start_ If they know _when_ and _on what_ it was generated, it's a different matter. Since you think I'm wrong and I think you're repeating dogma without understanding, shall we agree to disagree and remove all these comments which are adding value to nobody? – Basic May 22 '14 at 07:08
  • 2
    @Basic The definition of a secure cryptosystem hasn't changed since the 19th century (Kerckhoffs's principle). It is secure if everything but the key is public knowledge. The key in our case is the URL. If I can increase my chances of getting to a file immensely just by having some good guess about when it was created (or by creating a few files myself and observing the pattern to figure out the seed of the non-secure PRNG, after which I can iterate through all files in the system!), the system is not secure. I just don't see the point of contention here. – Voo May 23 '14 at 12:25
  • @Voo And I don't see that us rehashing the same sentences repeatedly is in any way adding value to this answer... If you want to discuss it, my email address is on my profile. [xkcd](http://xkcd.com/386/) – Basic May 23 '14 at 12:50
27

Since you already brought up Dropbox, I think we can give at least one reason why doing this is a bad idea:

Dropbox disables old shared links after tax returns end up on Google

The flaw, which is reportedly also present on Box, impacts shared files that contain hyperlinks. "Dropbox users can share links to any file or folder in their Dropbox," the company noted yesterday while confirming the vulnerability:

Files shared via links are only accessible to people who have the link. However, shared links to documents can be inadvertently disclosed to unintended recipients in the following scenario:

  • A Dropbox user shares a link to a document that contains a hyperlink to a third-party website.
  • The user, or an authorized recipient of the link, clicks on a hyperlink in the document.
  • At that point, the referrer header discloses the original shared link to the third-party website.
  • Someone with access to that header, such as the webmaster of the third-party website, could then access the link to the shared document.

Basically it's way too easy for URLs to leak inadvertently considering how many users use them. If your users are educated about this and avoid these problems I guess it's reasonably safe, but that's a big assumption to make.

Dario Seidl
  • 269
  • 2
  • 9
Voo
  • 651
  • 5
  • 14
  • 1
    Thanks, this is a very good point to bear in mind for this topic. In my case it should only be consumed through an app privately, so it is very unlikely the average user will get a hold of the URL to share it in the first place (short of reverse engineering the API). – owenfi May 19 '14 at 19:56
  • @owenfi In that case I think we can consider this reasonably safe assuming you use a large enough address space and a secure random algorithm to create the URL. – Voo May 19 '14 at 20:53
  • 1
    The article Voo cited contained a link to another article even more appropriate to the question: http://www.theregister.co.uk/2011/05/08/file_hosting_sites_under_attack/ "In 2011, researchers found that it was possible to access shared files by guessing the URLs." – WGroleau May 20 '14 at 02:10
  • 1
    @WGroleau It should be pointed out that in that link all the given examples are of systems that are in one way or another broken. None of the broken systems used a large search space combined with a secure random generator. I mean really.. sequential URLs? So I find that less applicable than the given link which shows an inherent flaw in the system (for some uses only) and not just problems with broken implementations. Although it shows that people will try to exploit such a system so better make sure it's sound! – Voo May 20 '14 at 20:42
26

Maybe not the answer to your question, but if you would like to "hide" the location of your profile pictures on a website, you could just embed the image as data URIs. You can base64 encode the image on your server and embed the string on your website, instead of exposing any image paths.

see http://css-tricks.com/data-uris/ and http://css-tricks.com/examples/DataURIs/ for a description and demo.
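
A minimal server-side sketch in Python of what that could look like. The helper names are hypothetical, and the MIME type defaults to JPEG only as an assumption.

```python
import base64
import html

def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URI that can be placed in an src attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def img_tag(image_bytes: bytes, alt: str = "profile photo") -> str:
    """Build an <img> element that embeds the picture instead of linking to it."""
    # Escape the alt text so it cannot break out of the attribute.
    return f'<img src="{to_data_uri(image_bytes)}" alt="{html.escape(alt)}">'
```

Base64 output is about a third larger than the raw bytes, and the image is resent with every page that embeds it.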

iHaveacomputer
  • 523
  • 3
  • 6
  • 5
    @FaridNouriNeshat, Data URIs aren't limited to 2000 characters the way http URIs are. If you need to support Internet Explorer 8 or earlier, [the limit is 32k](https://en.wikipedia.org/wiki/Data_URI#Web_browser_support); if you can require IE9 or later, there is no limit. – Mark May 19 '14 at 10:20
  • @Mark You're right. Sorry, I was confusing this with another method. – Farid Nouri Neshat May 19 '14 at 10:46
  • 18
    This has a problem: Bandwidth. Caches cannot work here, so you end up sending the same 30KiB image quite more often, which translates into money and performance. – Darkhogg May 19 '14 at 10:47
  • Holy smokes these are cool when called for! – owenfi May 19 '14 at 21:05
13

The other answers are generally good, but another consideration is the transport. If you're using plain http or any other non-encrypted protocol (or sending the urls via email), all data you transmit and receive, including these urls, should be considered completely public from a security standpoint. A large portion (anyone have stats?) of users are on public wifi access points with no encryption and active url/image scraping of such networks is common.

7

As mentioned by others, a URI for a specific image will leak out sooner or later, no matter how long or convoluted it is. If you are willing to restrict viewing to logged-in users, you could use, say, .../image/profile.php?u=12345 to display user 12345's image without a direct URI to the photo being available to pass around to the general public. It is assumed that random people (not logged in) would get nothing back from profile.php. Note that nothing prevents a logged-in user from saving that image (especially if it's cached) and passing it around. There are things that might be done with cache control headers, etc., or putting the image in Flash, or whatever, but if an image is viewable on someone's browser, with enough work it will be possible to grab and save it.
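
A rough sketch of that idea, in Python rather than PHP. The session store and photo store are hypothetical in-memory stand-ins, and a real site would hook into its existing login system instead.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins: session token -> logged-in user ID, and
# user ID -> stored image bytes. Real code would query a session
# store and a file system or database.
SESSIONS = {"token-abc": 42}
PHOTOS = {12345: b"...jpeg bytes..."}

def serve_profile_image(session_token: Optional[str], user_id: int) -> Tuple[int, bytes]:
    """Return (HTTP status, body) for a profile image request."""
    # Anyone without a valid session gets nothing back.
    if session_token not in SESSIONS:
        return 403, b""
    photo = PHOTOS.get(user_id)
    if photo is None:
        return 404, b""
    # The image is served by the handler, so no direct file URL is exposed.
    return 200, photo
```

For example, `serve_profile_image(None, 12345)` returns a 403 with an empty body, while a request carrying a valid token gets the image bytes.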

Phil Perry
  • 91
  • 1
5

The problem with your scheme is that the numerous users of your URLs will probably not all guess that these contain sensitive information. And part of that problem is that you likely have no idea just how big the user base is; for these URLs, the users include

  1. The people you think of as users.

  2. Their browser plugins/addons/extensions.

  3. Just about any third-party content on your site (ads, analytics, social plugins, ...) will likely, one way or another, inform third-parties of the URLs in question.

  4. Seemingly random websites seeing the URL as referrer URL (do you really know what curious extra links your users conjure into your web site through browser addons?).

Empirical evidence is that https-only URLs advertised as password-equivalent get indexed by Google, repeatedly, e.g. in the case of the password-free Bitcoin online wallet Instawallet (note they have gone so bankrupt over this that they can't even afford a valid SSL certificate anymore).

  • Thanks, some really good edge cases pointed out here. Looks like before rolling this out on a website I need to put it behind auth. (For now it's in an app-api only, so the 3rd parties will be limited somewhat to networking modules, and maybe 3rd party tools observing my documents directory.) – owenfi May 21 '14 at 01:56
  • 1
    People might even enter the whole URL into the Google search bar, and therefore disclosing the URL to Google. – Martin Ueding May 21 '14 at 12:52
2

Adding a couple of important points, as the answers above seem to have missed some of them.

  1. The likelihood of inadvertently exposing a URL is higher than that of exposing a password, because people are aware that a password is sensitive.
  2. Websites like Facebook use CDN URLs that are too complex for anyone to guess, yet from a security standpoint they are risky because access cannot be revoked when the user changes their privacy settings. Some websites, including ones backed by Amazon Web Services S3 storage, use a signed URL with a timestamp, which is validated periodically.
  3. Google Cache! Search engines are likely to crawl through the supposedly private images. Do a search with dorks which would bring back only the results from Facebook CDNs and you will be amazed.
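
To illustrate the signed-URL idea from point 2, here is a generic sketch in Python. It is not the actual scheme used by S3 or any CDN; the key, the parameter names `expires` and `sig`, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: a server-side secret that is never sent to clients.
SECRET_KEY = b"replace-with-a-real-secret"

def _signature(path: str, expires: int) -> str:
    # The MAC covers both the path and the expiry time.
    message = f"{path}?expires={expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def sign_url(path: str, ttl_seconds: int, now=None) -> str:
    """Append an expiry timestamp and an HMAC signature to the path."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"

def verify_url(url: str, now=None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires)
    current = time.time() if now is None else now
    # compare_digest avoids leaking how much of the signature matched.
    return hmac.compare_digest(signature.encode(), expected.encode()) and current < expires
```

Because the link stops working once `expires` has passed, a leaked or cached URL is only useful for a limited time, which partly addresses the revocation problem.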
hax
  • 3,851
  • 1
  • 16
  • 34
  • 1
    I especially like point 2. This was used in now defunct 'friend' system for profile images. The access removed perspective is a good one to consider (I guess it's not much different than somebody saving the image). – owenfi May 16 '17 at 20:15
  • @owenfi I agree with that. Not much different than saving the image, but likelihood of getting the details from browser history, cache etc is there. – hax May 24 '17 at 19:31