27

I've been reading up about rainbow tables as I think they're quite interesting cause they're actually a pretty simple concept.

Anyway, I was wondering, has anyone been involved in actually generating one? How is it possibly done? I just don't see how it is actually possible to generate every combination of every character.

If we exclude special characters, there are these amount of characters.

ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 1234567890

That sounds like there is 26+26+10 = 62 characters

That means that a password of length 8 has 62^8 combinations.

which is equal to 218340105584896

That alone sounds like it would take ages to generate. What about when we increase the number of characters to 12 and add in the special characters (say there is another 10 of them just by look at looking at the characters that you get by pressing shift - number)?

We would get 72^12 = 19408409961765342806016

thats a number that's big enough to take years to get.

AviD
  • 72,138
  • 22
  • 136
  • 218
stickman
  • 1,550
  • 3
  • 13
  • 16
  • 2
    The answers here nicely illustrate why just salting a password is not enough. Once the attacker knows the salt, in an hour or less they can brute force just that password. You also need iterations (aka strengthening, stretching) to make it take a long time to hash each password. See [how to safely store a password](http://codahale.com/how-to-safely-store-a-password/) And this all ignores software which is good about picking passwords that are more likely, perhaps given various things known about the user from social networking etc. – nealmcb Apr 30 '11 at 14:42
  • 2
    Not so simple: see [here](http://security.stackexchange.com/q/379/33) what rainbow tables really are. – AviD Apr 30 '11 at 22:47

4 Answers4

30

A rainbow table is "just" a compact representation of a table of precomputed hash values. During the construction of the rainbow table, many possible inputs are tried and hashed. Each input which has been encountered during table construction will be successfully attacked with that table, and none other. The hash evaluation concentrates most of the table building cost.

So, basically, the cost of building a rainbow table which can invert N passwords is roughly equivalent to the cost of trying those N passwords through the hash function -- the point of the rainbow table being that you build it once and then can use it to break several passwords. (To be precise, due to chain collisions during table construction, the cost is in fact closer to 1.7*N, but let's ignore that for the moment.)

I have once made some experiences with SHA-1. A simple password hash with SHA-1 has the cost of processing a single "block" (SHA-1, like MD5, processes data by 64-byte blocks), which needs about 900 32-bit logical-or-arithmetic operations. An optimized implementation on an Intel Core2 x86 processor can do that in about 500 clock cycles. However, password attacks (whether directly or for rainbow table construction, it does not matter) are a highly parallel job, so one could use the SSE2 instructions which offer 128-bit registers, and where a single opcode can perform four 32-bit operations simultaneously. SSE2 has less kinds of operations available (in particular, it does not offer rotations, only shifts), so the operation count rises to about 1200; but, under some conditions, the SSE2 unit will execute several opcodes simultaneously. So we end up with 800 clock cycles, for four SHA-1 instances in parallel. Bottom-line: my PC is an Intel Core2 Q6600, with four cores running at 2.4 GHz. Each core can run my SSE2 implementation, resulting in roughly 48 millions hashed passwords per second.

I also have a not-too-small Nvidia graphics card, and the GPU can run arbitrary code through CUDA. This is a 9800 GTX+, with 128 cores running at 1.84 GHz. Each core may execute one 32-bit operation per cycle (there is a high latency, but, thanks to the high parallelization, this one-instruction-per-cycle throughput can be maintained). The cores do not know about rotations, so each code will use 1200 clock cycles per hashed password. The total performance is 160 millions hashed passwords per second.

My PC and graphics card are from early 2009 and are not top-of-line. One can find nowadays, for a few hundred dollars, a GPU which will hash passwords about three times faster than my 9800 GTX+. So let's assume that an attacker with a common PC (which costs less than 1000$) can hash half a billion passwords per second.

At that speed, all passwords with 8 alphanumeric characters (uppercase and lowercase letters, and digits) are gone through in about 5 days. With a 1000$ PC. If you use MD5, things are about 30% faster (MD5 uses a bit less operations than SHA-1). Good password hashing schemes do not use a simple hash invocation, though: they use iterated hashing with, say, 2000 nested hash invocations: this multiplies cost for the attacker by the same 2000 factor (so it turns the "5 days" into about 28 years, literally "ages" as you put it).

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • a lot of the hardware stuff went over my head. Just a question, 28 years for one computer - I assume this can be divided amongst a number of PCs, so if you have say, 28 PCs, would you be able to do it within a year? 28 PCs isn't much, most companies would have 28 PCs lying around somewhere. Does that mean then that anyone with a couple of thousand PCs at their disposal can crack any password within a few days? –  Apr 29 '11 at 15:10
  • 3
    @stickman: yes, that's what _parallel_ means: throw in twice the hardware, and you go twice faster. And yes, someone with a thousand PC has high password-cracking power (especially students: they have free time, weird ideas about what is a worthwhile way to use said free time, and access to university machines). However, hash iterations that good password hashing schemes employ are quite effective. – Thomas Pornin Apr 29 '11 at 15:19
  • can we find somewhere on the net a prefabricated rainbow table ? – Phoenician-Eagle Apr 30 '11 at 09:26
  • @big-bird-from-middle-east: a simple google search on "rainbow table" points to http://project-rainbowcrack.com/table.htm so I think that the answer is "yes". – Thomas Pornin Apr 30 '11 at 17:41
  • merci beaucoup! – Phoenician-Eagle Apr 30 '11 at 20:29
  • For an update: Zorinaq's [whitepixel - GPU-accelerated password hash auditing](http://whitepixel.zorinaq.com/) is reported to be the fastest in public benchmarks and does 33 billion MD5's a second on a 4 x AMD Radeon HD 5970 GPU ([Whitepixel v2](http://blog.zorinaq.com/?e=43)), and he estimated 10 billion SHA-1's/sec when he gets that implemented, and has 1.1 billion SHA-256/sec implemented in unreleased code. – nealmcb Jun 26 '11 at 20:12
18

How long does it take to generate a rainbow table for a very simple hash, using just a single iteration? It takes one hour! Or less, if you want it to.

While the answers above are entirely correct, there is one important development that they're not mentioning. Amazon EC2 and other 'cloud computing' server providers.

Today everybody with a credit card can go to aws.amazon.com and spool up a handful of EC2 spot instances, for less than a hundred bucks. Or if you have good CUDA code available, rent 50 of Amazon's more expensive "Cluster GPU" instances with two NVIDIA Tesla M2050 graphics processors.

(Somewhat like the airlines, Amazon has differentiated pricing. If you need a specific EC2 server with guaranteed availability, pricing is higher. For example, you can get a "Hi-CPU Large" instance with 8 virtual CPU cores for 0.68 USD per hour. If you're willing to buy the same instance as excess supply permits in the off hours, then you can get it with 40%-50% rebate.)

Creating rainbow tables can be done in parallel with a linear performance increase, i.e. with 100 computers working it is 100 times faster than a single computer.

Amazon doesn't charge you per instance, it charges per instance-hour. Thus running 1,000 servers for one hour costs the same as one server for 1,000 hours.

The parallel nature of rainbow table creation, taken together with services like Amazon EC2, means that there is no "how long does it take" question anymore. There is a "how much are you willing to pay to get it fast, versus paying less and getting it in a few days?". The difference in cost & time mainly comes from Amazon's price differentiation between 'regular' EC2 instances versus the cheaper 'spot instances'.

6

Using a simple 26^5 test run, I was able to generate an ascii hex table in Ruby that generated 228488 MD5 outputs per second. All 11881376 entries took 52 seconds on my 1.5-year old Core i7. If I didn't make any improvements to my program, I'd expect it to run for 26 weeks to generate your 62^8 list.

I could probably make improvements to my program by running eight separate programs, one per hyperthread, to partition the problem space. If each program saved output to its own drive, they wouldn't compete for IO bandwidth. I would expect a factor of four to six speedup compared to my stupid single-threaded program for 4-6 weeks runtime. If I were to re-write in C, I might expect another 1.5 speedup factor. (Maybe more? On the one hand it's a simple program, on the other hand, a C array 13 bytes long, modified at runtime, is probably going to run in much less memory and of course no garbage collection compared to creating and destroying the Ruby string objects.)

I'd expect an afternoon's work and a few hundred dollars of new gear would let me finish the 62^8 tables two weeks on commodity hardware.

And sure, no one uses single-run MD5 on passwords; just scale my end results by however much more expensive your one-way hash is than MD5. :)

sarnold
  • 721
  • 4
  • 7
  • 4
    "no one uses single-run MD5 on passwords" if than only was true. – Jacco Jun 14 '11 at 10:09
  • LOL, Its 2019 and I am looking at at code right now written in 2015 that well... does a single MD5 hash. You'd be surprised. – Alexander Higgins May 29 '19 at 17:54
  • @AlexanderHiggins, I hope you get to switch to argon2 or similar modern format as part of your work. (But hey, it could always be worse, you could be cleaning up after https://krebsonsecurity.com/2019/03/facebook-stored-hundreds-of-millions-of-user-passwords-in-plain-text-for-years/ ). – sarnold May 30 '19 at 18:49
  • Agreed. I came to this link to get backing material showing why single MD5 is bad. BTW, Loved this part of that article `My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.` – Alexander Higgins May 30 '19 at 23:09
  • 1
    @AlexanderHiggins, hopefully these hashcat timings are useful for you: https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40 -- 30 billion hash candidates per second on a single commodity GPU. – sarnold May 31 '19 at 00:39
1

See my bitcoin mining network hash rate calculations at How to securely hash passwords? - IT Security for what determined people with more modern GPU cards can accomplish in terms of raw hashing. The community is running at 11 Thash/s (11 * 10^12 hash/s) in early July 2011....

nealmcb
  • 20,544
  • 6
  • 69
  • 116