
I recently stumbled upon this and am checking here to see if what I am proposing is indeed feasible and can be considered a breach of privacy.

For obvious reasons, I am not revealing the website which exhibits this property.

The URLs have the format:

https://xxxxxxxxyyyyyzzzz/xyz/<6 digit rand>_<17 digit rand>_<10 digit rand>_n.jpg

Requesting the above link returns an image. As you can see, the entropy of the possible URLs is quite large. But note that the random parts consist only of decimal digits (0-9).
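As a rough illustration of how large that space is, the sketch below counts the digits in the format above and converts them to bits of entropy. It assumes every digit is uniform and independent, which is an upper bound and not something known about the actual site; the variable names are illustrative.

```python
import math

DIGIT_GROUPS = (6, 17, 10)   # group lengths from the URL format above

total_digits = sum(DIGIT_GROUPS)
possible_names = 10**total_digits
# Upper bound: assumes each digit is uniformly random and independent.
entropy_bits = total_digits * math.log2(10)

print(f"{total_digits} digits, about {entropy_bits:.1f} bits of entropy")
```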

This website hosts the content of millions of people ;) and my guess is that at least 10% of the URLs in this space of random numbers will work. Of course, it's just a guess.

My question is: is this feasible? Is my claim true? My presumption here is that these random numbers may be a non-cryptographic hash of some string. There is no way to confirm that presumption. For the sake of this question, let's assume it is true.

My code to generate these links looks like this (just a snippet):

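import random
import urllib.request

# Build the three random digit groups: 6, 17 (5+5+5+2) and 10 (5+5) digits.
first = str(random.randint(100000, 999999))
second = (str(random.randint(10000, 99999)) + str(random.randint(10000, 99999))
          + str(random.randint(10000, 99999)) + str(random.randint(10, 99)))
third = str(random.randint(10000, 99999)) + str(random.randint(10000, 99999))

# '<URL>' is a placeholder for the undisclosed host and path.
test = 'https://<URL>/' + first + '_' + second + '_' + third + '_n.jpg'
try:
    image = urllib.request.urlopen(test)
    print(len(image.read()))  # size of the returned image in bytes
except Exception:  # any network, HTTP, or URL error counts as a miss
    print("fail")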

I have not run this for more than a few tens of requests, for fear of my IP being blocked by the server for excessive requests. I do not intend to, either. I just want to clarify whether my understanding is right.

P.S.: I am not a Python developer, so please forgive me if my code is ugly (suggestions for improvement will be happily taken).

sudhacker

1 Answer

No, I think your math is off. I do not think this is actually a real vulnerability. If your description of the format is accurate, and if the numbers are generated via a truly random source, then no, I don't think it is so easy to stumble across other people's pictures.

Let's work the math. There are 33 random digits in the URL. That means there are 10^33 possible URLs. Suppose this web site has one billion users (10^9), and each user posts 1000 pictures (10^3). (I'm being generous here.) Then there would be 10^12 pictures posted in all. This means that a randomly chosen URL would have a 10^12/10^33 = 1 in 10^21 chance of pointing to a picture.

In other words, you will have to try about one thousand billion billion (10^21) times before stumbling on a single picture from someone else. That would take approximately, oh, forever.

Oh, you want a more precise estimate? OK, OK, here goes. Suppose you can make 1000 requests per second (probably a generous estimate, but let's run with it). Then it would take you 10^18 seconds before you stumble across the first picture, with just random guessing. There are about pi times 10^7 seconds in a year, so it will take you about 3 * 10^10 years before your first success. That's longer than the known lifetime of the universe. (By the time you stumble across your first picture, everyone shown in the picture will have been long since dead, and they won't care any more.)
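The arithmetic in this estimate can be checked with a short script. This is a minimal sketch using the same assumed figures as above (33 random digits, 10^9 users, 10^3 pictures each, 1000 requests per second); the constant and variable names are illustrative only.

```python
from fractions import Fraction

# Assumptions taken from the estimate above; all are rough, generous guesses.
RANDOM_DIGITS = 33                       # 6 + 17 + 10 digits in the URL
USERS = 10**9
PICTURES_PER_USER = 10**3
REQUESTS_PER_SECOND = 1000
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # roughly pi * 10^7

possible_urls = 10**RANDOM_DIGITS
valid_urls = USERS * PICTURES_PER_USER

# Chance that one uniformly random guess points at a real picture.
hit_probability = Fraction(valid_urls, possible_urls)
# Mean number of guesses until the first hit (geometric distribution).
expected_guesses = 1 / hit_probability
expected_seconds = expected_guesses / REQUESTS_PER_SECOND
expected_years = float(expected_seconds) / SECONDS_PER_YEAR

print(f"hit probability per guess: 1 in {expected_guesses}")
print(f"expected time to first hit: {expected_years:.2e} years")
```

Exact fractions keep the probability from being rounded away; only the final conversion to years uses floating point.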

So, no, this attack is not a threat. As long as the random numbers in the URL are truly random and unpredictable, this scheme is secure.


The biggest risk comes if the numbers in the URL are not actually cryptographically random. If the numbers are generated using a non-crypto-strength pseudorandom number generator, or via some other predictable sequence, then the scheme could be vulnerable.

Ironically, your example code is a good example of how not to do it. You used Python's built-in random generator. That is not cryptographic-strength, and thus its output is likely to be predictable. The security is at best as good as the amount of entropy in the seed to the pseudorandom generator. Even worse, with many such pseudorandom number generators, if you observe a few outputs from the generator, you can predict all following outputs, which would be deadly to the security of such an image-hosting scheme.
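To illustrate the difference, here is a small sketch. It shows that Python's `random` module reproduces the same output once the seed is known, and how names of the same shape could instead be drawn from the operating system's CSPRNG through the standard `secrets` module. The helpers `random_digits` and `make_name` and the seed value are hypothetical, and nothing here is known about how the actual site generates its URLs.

```python
import random
import secrets

def random_digits(n, rng):
    """Return a string of n decimal digits drawn from rng (hypothetical helper)."""
    return "".join(str(rng.randrange(10)) for _ in range(n))

def make_name(rng):
    """Build a name in the <6>_<17>_<10>_n.jpg shape described in the question."""
    return "_".join(random_digits(n, rng) for n in (6, 17, 10)) + "_n.jpg"

# Mersenne Twister: anyone who learns the seed can replay the whole sequence.
seed = 12345  # arbitrary illustrative seed
name_a = make_name(random.Random(seed))
name_b = make_name(random.Random(seed))
print(name_a == name_b)

# secrets.SystemRandom draws from the OS CSPRNG; it cannot be seeded or replayed.
csprng = secrets.SystemRandom()
print(make_name(csprng))
```

Both generators expose the same `randrange` interface, so the only change needed in code like this is which generator object is passed in.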

Is your website using a vulnerable pseudorandom number generator? If they know what they are doing, I sure hope not. However, you probably have no good way to know for sure from the outside.

To learn more about the subject, I'd like to refer you to two resources: first, Make sure you seed random number generators with enough entropy; and second, this Dilbert cartoon:

Dilbert on randomness

D.W.
  • Completely agree with you, mathematically. But practically, isn't there a chance I might get a few pictures in, say, 10^7 tries, which seems a practicable number to me? After all, my attack also generates random numbers. In other words, what if I am lucky? – sudhacker Sep 24 '12 at 01:48
  • If you're *that* lucky, go buy a lottery ticket! Is there a *chance* of success after only 10^7 tries? Sure, there's always a chance. Is there a chance that I win the lottery each day this week, one right after another? Sure, there's a chance: it just ain't damn likely. My point is, it's not enough to ask whether there's a chance; you need to calculate the magnitude of that chance. I've given you all the tools to do the calculation of your likelihood of success after 10^7 tries. Now I'm going to challenge you to do the calculation yourself! Try it: it'll be a good learning exercise. – D.W. Sep 24 '12 at 02:01
  • After 10^4 or 10^5 tries, they're likely to block you from the servers anyway. – Polynomial Sep 24 '12 at 06:04
  • Oh, and +1 for the Dilbert randomness. – Polynomial Sep 24 '12 at 06:06
  • @Polynomial The server could also still accept your requests, but automatically return "no image" regardless of what you passed it, making your probability of success effectively zero, with no way for you to know (not very useful in this situation, but it can be a very powerful deterrent in some situations, in particular against people trying to guess passwords) – Thomas Sep 24 '12 at 06:36
  • Yeah, by "block" I meant they'd prevent you from actually using the service, rather than something as drastic as `iptables -A INPUT -s -j DROP` – Polynomial Sep 24 '12 at 08:16
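For readers who want to check the exercise set in the comments, here is a minimal sketch of the calculation. It assumes the answer's per-guess success probability of 1 in 10^21 and independent guesses; the names are illustrative.

```python
import math

PER_GUESS = 1e-21     # assumed chance that a single random URL is valid
TRIES = 10**7         # number of guesses proposed in the comment

# P(at least one hit) = 1 - (1 - q)^n, computed with log1p/expm1 so the
# tiny probability is not lost to floating-point rounding.
p_any_hit = -math.expm1(TRIES * math.log1p(-PER_GUESS))

print(f"chance of at least one hit in {TRIES} tries: {p_any_hit:.3e}")
```

For probabilities this small the result is essentially n times q, that is about 10^-14, or roughly one chance in a hundred trillion.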