How does a 'rainbow table' hacker obtain password hashes in the first place?

Question

I don't understand this part of the Rainbow table attack.

In all my Google searches, it says that a hacker uses a rainbow table on password hashes.

But how does the hacker obtain the password hashes in the first place?

I have rephrased this question from a previous question which was closed: How is Salting a password considered secure, when the hacker already has access to user password Database?

If the hacker already has the password hashes, can't he just use them to hack the system?

What do you mean by "hack the system"? You mean, log into the accounts that the password hash represents? — schroeder, Jun 22 '20 at 08:57
How does the attacker get the password database? A million different ways due to lack of security. SQLi from the web, poor passwords that protect the database itself, social engineering the database administrator, unsecured database backups, etc. — schroeder, Jun 22 '20 at 08:59
*"..., it says that a hacker uses a rainbow table on password hashes."* - the logic is a bit different. Hackers prefer bad (unsalted or improperly salted) password hashes since they can use rainbow tables on these for fast cracking. And there are still enough of these leaked databases out there. With proper salting and hashing, such password databases are much harder to crack and rainbow tables cannot be used. Instead, comparatively slow brute-forcing with the most common passwords will be used. — Steffen Ullrich, Jun 22 '20 at 09:24

score 29 · Accepted Answer · edited Jun 23 '20 at 14:52

The news is full of examples of leaked databases (these are just the most recent results).

The How:

The vast majority of cases involve unsecured databases or backups (across pretty much all technologies: S3, MongoDB, Cassandra, MySQL, etc.). These are usually due to configuration errors, bad defaults, or carelessness.

What data is leaked:

These generally provide at least read-only access to some or all of the data contained in the database, including usernames and hashed-and-salted passwords.

These dumps include a lot of private user records. Passwords stored in plaintext (or with a simple hash such as MD5) are even more problematic because they can be used in credential stuffing attacks (trying the same username/password combinations on different websites), potentially exposing even more data.

What to do with a password hash:

If an attacker has access to a hashed and salted password, they cannot just provide this to the server to authenticate. At login time, the server computes hash(salt + plaintext_password) and compares it with the value stored in the database. If the attacker attempts to use the hash, the server will just compute hash(salt + incoming_hash), resulting in a wrong value.
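A minimal sketch of the login check described above. SHA-256 is only a stand-in for the hash, and `hash_password`, `store_password` and `verify_login` are hypothetical names; a real system would use a dedicated password-hashing function.

```python
import hashlib
import hmac
import os


def hash_password(salt: bytes, candidate: bytes) -> bytes:
    # hash(salt + input), as described in the answer. SHA-256 is a stand-in.
    return hashlib.sha256(salt + candidate).digest()


def store_password(plaintext: str) -> tuple[bytes, bytes]:
    # What the database keeps: a random per-user salt and the salted hash.
    salt = os.urandom(16)
    return salt, hash_password(salt, plaintext.encode())


def verify_login(salt: bytes, stored_hash: bytes, submitted: bytes) -> bool:
    # The server always hashes whatever it receives before comparing.
    return hmac.compare_digest(hash_password(salt, submitted), stored_hash)


salt, stored_hash = store_password("correct horse")
real_password_ok = verify_login(salt, stored_hash, b"correct horse")
replayed_hash_ok = verify_login(salt, stored_hash, stored_hash)
print(real_password_ok, replayed_hash_ok)
```

The first check succeeds; the second fails because the server ends up comparing hash(salt + leaked_hash) with the stored value.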

One scenario that could spell a lot of trouble is client-side-only password hashing. If the client computes hash(salt + plaintext_password) and sends it to the login endpoint, then the stored hash can be used to log in. This alone shows how dangerous that design is. There are some algorithms that offload some of the work to the client (such as SCRAM), but they involve a more thorough client-server exchange to prevent exactly this scenario.
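To make the client-side-only problem concrete, here is a rough sketch of such a flawed design. The function names and the fixed salt are made up for illustration, and SHA-256 is again just a stand-in.

```python
import hashlib
import hmac


def client_hash(salt: bytes, plaintext: str) -> bytes:
    # Computed in the browser or app in this (flawed) design.
    return hashlib.sha256(salt + plaintext.encode()).digest()


def naive_server_login(stored_hash: bytes, received: bytes) -> bool:
    # The server compares the received value directly with the stored one,
    # so the stored hash is effectively the password.
    return hmac.compare_digest(received, stored_hash)


salt = b"per-user-salt"
stored_hash = client_hash(salt, "hunter2")  # value saved at registration

# Legitimate client: hashes the password and sends the result.
legit_ok = naive_server_login(stored_hash, client_hash(salt, "hunter2"))

# Attacker with a database dump: sends the leaked hash, never learns "hunter2".
attacker_ok = naive_server_login(stored_hash, stored_hash)
print(legit_ok, attacker_ok)
```

Both logins succeed here, which is exactly the issue: whoever holds the dump can authenticate without cracking anything.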

Password storage security aims to prevent attackers from deriving the real password from the stored value. It is not concerned with other vectors of attack against the server.

edited Jun 23 '20 at 14:52

thesquaregroot

answered Jun 22 '20 at 08:55

Marc

Thanks for the answer. It's very clear... but one thing confuses me. Do you mean that when a user logs in via a front end, the front end sends the password in plaintext to the back end? That doesn't sound secure. – user1034912 Jun 22 '20 at 08:59
The frontend should be communicating with the server over TLS to provide privacy and data integrity. Some frontends send a simple hash of the password, just to avoid accidental leaks. But the frontend (usually) does not compute the salted hash. There are other authentication methods out there with more client involvement (e.g. [scram-sha](https://tools.ietf.org/html/rfc7677)), but they still won't let you use the leaked hash easily. – Marc Jun 22 '20 at 09:03
Would it be worth updating this answer to cover passing the hash? It's a little skewed from the OP's question, but it might address the confusion about just "hacking the system" using only the hash. – TheHidden Jun 22 '20 at 11:08
@TheHidden: sure thing, done. – Marc Jun 22 '20 at 11:11
@user1034912 see https://security.stackexchange.com/q/23006/90657 – multithr3at3d Jun 22 '20 at 13:15
Re "*But the frontend (usually) does not compute the salted hash.*": because doing so would offer almost no additional protection over sending the password in plain text. If protection from eavesdropping is needed for one, it's needed for the other. (This can be done by using a secure channel such as TLS, by using a challenge system, etc.) – ikegami Jun 23 '20 at 05:35
An additional protection if the frontend sent the hash and not the password is that even if the server itself becomes malicious, then the server operator won't be able to get the true passwords from the user and use them on other services. – Petr Hudeček Jun 23 '20 at 06:54
You've been talking about hashes and salt, but you forgot the pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography)). – Ismael Miguel Jun 23 '20 at 11:32

Pedro · Answer 2 · 2020-06-22T09:44:12.687

If the hacker already has the password hashes, can't he just use them to hack the system?

Unless you're talking about NTLM hashes in Windows environments (under certain conditions), the attacker would need to crack them. Most systems do not accept the hash itself in place of the password for authentication.

Performing cryptanalysis against a hashed password consists of generating sequences of characters, hashing them using the same method and comparing the results (you might need other pieces of information, such as usernames, to use as the salt in the calculation). It's that simple. And also very inefficient, by design(1).

You could employ a brute-force method whereby you try all possible combinations of the character set you choose (e.g. alphanumeric, alpha+symbols, etc.) up to whichever length you're willing to go. This guarantees you will find the password, given enough computational effort. It can take centuries to go through a large enough character set with a long enough length with a given hash method.
Or you could use a hybrid mode by selecting words out of a dictionary and generating a sequence of variations on those words as candidate passwords. This is hugely more efficient, but there's no guarantee that the password will be found.
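The two approaches above can be sketched roughly as follows. This is a toy illustration, assuming salted SHA-256 and a known salt; the helper names, the word list and the handful of variations are all made up, and real cracking tools are far more elaborate.

```python
import hashlib
import itertools
import string


def salted_hash(salt: bytes, candidate: str) -> str:
    return hashlib.sha256(salt + candidate.encode()).hexdigest()


def brute_force(target: str, salt: bytes, alphabet: str, max_len: int):
    # Exhaustive search: every string over `alphabet` up to `max_len` characters.
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if salted_hash(salt, candidate) == target:
                return candidate
    return None


def hybrid(target: str, salt: bytes, words: list[str]):
    # Dictionary words plus a few simple variations of each.
    for word in words:
        for candidate in (word, word.capitalize(), word + "1", word + "123"):
            if salted_hash(salt, candidate) == target:
                return candidate
    return None


salt = b"s4lt"
words = ["baseball", "batman"]
print(brute_force(salted_hash(salt, "abc"), salt, string.ascii_lowercase, 3))
print(hybrid(salted_hash(salt, "batman1"), salt, words))
print(hybrid(salted_hash(salt, "x9$Lq"), salt, words))
```

The brute-force search always finds a password inside its search space but grows exponentially with length. The hybrid search is quick and finds "batman1", yet returns nothing for a password that is not a variation of a dictionary word.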

A rainbow table is a method whereby you pre-calculate tables of plaintext to hash (possibly with salting(2)). Given a hash you want to crack, you just look up the plaintext password. It's virtually instant. It's a trade-off where you spend your computational time ahead of the cracking moment, at the expense of storage. The complication is that rainbow tables also take a long time to build(2) and take a significant amount of storage space (GBs to TBs, but there is no real limit).

(1) hashing algorithms are effective if they require a significant amount of computational power to calculate, meaning that one calculation (at login) is relatively cheap, but a large volume of calculations will take a long time to do, hence reducing the effectiveness of brute forcing;

(2) If salting is involved in the hashing algorithm (as it normally is, and rightly so), rainbow-table-based cryptanalysis loses efficiency since you'd need one table per salt value. Since usernames are often used as salt, you'd need to generate a table per username... if you know those in advance. There's still a use for this, such as keeping pre-calculated tables for "Administrator" accounts.
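Footnote (1) can be illustrated with a deliberately slow function. The sketch below uses PBKDF2 from Python's standard library as one example of such a function; the iteration count and the size of the candidate list are arbitrary assumptions, not recommendations.

```python
import hashlib


def fast_hash(salt: bytes, password: str) -> bytes:
    # One SHA-256 call: cheap for the server and just as cheap for an attacker.
    return hashlib.sha256(salt + password.encode()).digest()


def slow_hash(salt: bytes, password: str, iterations: int = 100_000) -> bytes:
    # PBKDF2 repeats an HMAC `iterations` times, so every guess costs
    # roughly `iterations` times more than a single hash call.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


salt = b"example-salt"
iterations = 100_000
guesses = 1_000_000  # size of a hypothetical candidate list

stored = slow_hash(salt, "letmein", iterations)
print(len(stored))
print(f"HMAC rounds needed for {guesses:,} guesses: {guesses * iterations:,}")
```

A single login pays the cost once, which is barely noticeable, while an attacker pays it for every candidate password.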

@user1034912 : IIRC, everything (is wrong with NTLM security). e.g. the password hashes are password-equivalent; a client can send a hash to the server and log in with it, because it allows client-side hashing or something equivalent. And they're unsalted. Wikipedia has the details. https://en.wikipedia.org/wiki/NT_LAN_Manager#:~:text=The%20NTLM%20protocol%20uses%20one,without%20knowing%20the%20actual%20password. — Peter Cordes, Jun 24 '20 at 03:07
On a windows environment, the NTLM hash itself can be used in certain conditions as authentication on its own, does not need cracking. This is a feature that Microsoft implemented to provide a kind of single-sign-on functionality. It still works. — Pedro, Jun 24 '20 at 08:57

score 6 · Answer 3 · edited Jun 23 '20 at 13:47

Well, the first thing is... what is a Rainbow Table?

A Rainbow Table is a list of the hashed values for the X most common passwords. 'Password', 'Password123', 'baseball', 'batman1', etc.: hash them all with the hash algorithm the target system uses.

Then, check whether any row in the password column of the compromised SQL table matches any entry in the Rainbow Table. Entry '73def92a987efa98b987da' matches for user 'bob bobson' - you look at your rainbow table and see that entry corresponds to 'letmein', so you cracked bob's password. Actually, you wouldn't have just cracked bob's password - you would've cracked the password of everyone who used 'letmein', because hash('letmein') would've been the same for them all.

That's the thing - Rainbow Tables aren't targeted at a specific account. They're a way of getting the lowest-hanging fruit. You might only crack 20% of the passwords with your table... but that means you cracked 20% of the accounts! Why try to hack a specific account when you can quickly compromise thousands of the weakest-secured ones?
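A small sketch of the lookup described above, assuming an unsalted fast hash (SHA-256 here as a stand-in) and a made-up leaked table. Strictly speaking this is a plain precomputed lookup table, which is how the answer uses the term.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    # Stand-in for whatever fast, unsalted hash the target system used.
    return hashlib.sha256(password.encode()).hexdigest()


# Built once, ahead of time, from a list of common passwords.
common_passwords = ["Password", "Password123", "baseball", "batman1", "letmein"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# Hypothetical leaked table: username -> stored hash.
leaked = {
    "bob bobson": unsalted_hash("letmein"),
    "alice": unsalted_hash("letmein"),
    "carol": unsalted_hash("V3ry-l0ng&unusual"),
}

# One dictionary lookup per row; identical passwords fall together.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)
```

Both accounts that used 'letmein' are recovered with the same table entry, while the account with an uncommon password is untouched.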

So what does (proper) salting do? It applies a value that's different for each account. Bob's password has a salt of '123' prefixed to it; Alice's has a salt of '468' prefixed to it. So even if they used the same password, their hashed entry wouldn't be the same - and the rainbow table wouldn't help you out. Salting prevents the hacker from trying to hack everyone's account at the same time, and forces them to do things one record at a time.
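Here is a quick sketch of that effect, using the salts '123' and '468' from the example and SHA-256 as a stand-in hash. Real salts would be long random values rather than three digits.

```python
import hashlib


def salted_hash(salt: str, password: str) -> str:
    # The salt is prefixed to the password before hashing, as in the text.
    return hashlib.sha256((salt + password).encode()).hexdigest()


# Same password, different per-account salts (values taken from the example).
bob = salted_hash("123", "letmein")
alice = salted_hash("468", "letmein")

# A table precomputed without salts no longer matches either entry.
unsalted_table = {
    hashlib.sha256(p.encode()).hexdigest(): p
    for p in ["letmein", "baseball", "batman1"]
}

same_entry = bob == alice
table_hits = [h in unsalted_table for h in (bob, alice)]
print(same_entry, table_hits)
```

The two stored entries differ even though the password is identical, and neither appears in the precomputed table.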

(By the way, this is why you'll see security people screaming to Never Reuse A Salt. Because if, say, all the records use the same salt? Then the attacker can recompute the rainbow table with the fixed salt, and once again be able to attack everyone's accounts at the same time.)
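The cost difference can be sketched with a toy example. The accounts, the candidate list and the shared salt below are invented for illustration.

```python
import hashlib


def salted_hash(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()


candidates = ["letmein", "baseball", "batman1", "Password123"]
shared_salt = "123"  # the mistake: every record uses this same salt

leaked = {
    "bob": salted_hash(shared_salt, "letmein"),
    "alice": salted_hash(shared_salt, "baseball"),
    "dave": salted_hash(shared_salt, "letmein"),
}

# One table, computed once with the fixed salt, covers every account again.
table = {salted_hash(shared_salt, p): p for p in candidates}
cracked = {user: table[h] for user, h in leaked.items() if h in table}

# Work comparison: hashes needed to test all candidates against all accounts.
work_shared_salt = len(candidates)
work_per_user_salts = len(candidates) * len(leaked)
print(cracked, work_shared_salt, work_per_user_salts)
```

With a shared salt the attacker hashes each candidate once; with per-user salts the same candidate list has to be hashed again for every account.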

Nitpick: what you've described isn't a rainbow table as such, just a pre-computed brute force attack. A rainbow table is a specific way of optimising that so that you don't explicitly need to store every hash you've calculated. The key points are correct, though. — IMSoP, Jun 23 '20 at 09:10
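For reference, here is a toy sketch of the chain idea that the nitpick refers to. It is only an illustration under simplifying assumptions: the password space is 4-digit PINs, SHA-256 is the hash, and the chain length, the reduction function and all names are invented for this example.

```python
import hashlib

CHAIN_LENGTH = 50      # hash/reduce steps per chain
PIN_SPACE = 10_000     # toy password space: 4-digit PINs "0000".."9999"


def h(pin: str) -> bytes:
    return hashlib.sha256(pin.encode()).digest()


def reduce_to_pin(digest: bytes, step: int) -> str:
    # Maps a hash back into the password space; varying it by step is what
    # distinguishes a rainbow table from a plain hash chain.
    return f"{(int.from_bytes(digest[:8], 'big') + step) % PIN_SPACE:04d}"


def chain_end(start: str) -> str:
    pin = start
    for step in range(CHAIN_LENGTH):
        pin = reduce_to_pin(h(pin), step)
    return pin


def build_table(starts):
    # Only (endpoint -> starting points) is stored, not the hashes in between.
    table = {}
    for start in starts:
        table.setdefault(chain_end(start), []).append(start)
    return table


def lookup(table, target: bytes):
    # Guess the position of `target` in a chain, walk to the endpoint, and if
    # the endpoint is known, replay that chain from its start to find the PIN.
    for pos in range(CHAIN_LENGTH - 1, -1, -1):
        pin = reduce_to_pin(target, pos)
        for step in range(pos + 1, CHAIN_LENGTH):
            pin = reduce_to_pin(h(pin), step)
        for start in table.get(pin, []):
            candidate = start
            for step in range(CHAIN_LENGTH):
                if h(candidate) == target:
                    return candidate
                candidate = reduce_to_pin(h(candidate), step)
    return None


table = build_table(f"{n:04d}" for n in range(0, PIN_SPACE, 25))
found = lookup(table, h("0025"))
print(found, len(table))
```

The table stores at most 400 start/end pairs yet can recover PINs that appear anywhere inside its chains, at the price of extra hashing during lookup. PINs that no chain happens to pass through are simply not found.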
