Oversalting Potentially Harmful?

Question

I find it hard to believe, and figure I just missed or forgot something obvious (years since I thought about this stuff), so maybe someone can point it out. But it seems like most of what I know about salts is they're kinda loose in guidelines and only really beg for uniqueness. But if you're not following some strict guidelines it seems you can attack them using rainbow tables. And a similar attack on an unsalted password would only be possible if you allowed passwords larger than your digest size as input.

Outline:

For simplicity's sake I'm making some assumptions, they are quite a few (but I feel semi-reasonable):

The salt is random and unique and of static length.
And let's assume our hash function is uniform random (which theoretically a good hash should be), ie if I hash a counter I'll get a statistically good PRNG.
The salt is applied by concatenating with the password: hash(pass+salt)
The salt is there only to defeat a rainbow table
The entire point of using the hash is to store a check for the password, there is no encrypted or plaintext version of your password available to be checked. And the password is never used directly (as an encryption key for example).

Since the defender has no way of validating input we'll use a password of "0" (in the event it is pre-validated for password requirement correctness before sending, then use any input that passes the pre-validation, the only thing that matters in this step is that your password goes through and is constant) Compute a rainbow table for the hash(pass+salt) for "each value" of salt and a constant password. We now have a rainbow table that tells us for which salts a password of "0" would work. We've found collisions with real passwords by leveraging the salt.

On average given a uniform hash we would expect that singular password to randomly sample the space of possibilities for each salt computed.

Example:
pass+(no salt) = 1 hash found, 1 hash valid.
pass+(1 bit salt) = 2 hashes found, one for each bit state, each selected uniformly from "digest space".

If we consider the fact that any combined pass+salt string that is 1 bit longer than the digest size must have a 50% collision rate across all possible inputs. And we also consider the fact that we're sampling only a subset of all possible inputs by having static bits (our password), then a salt greater than the digest size by N bits will give 2^N collisions for that password. The password's length is discounted from collisions by being static.

In other words, if we assume we're computing a rainbow table for all possible salts and a static password, then a salt longer than the digest is bad by magnitude N as it weakens all passwords by that amount (although you have to do more work to build the table, the point stands). Less makes you more vulnerable to a more standard rainbow table application at the very least (with a 1 bit salt you only need 2 rainbow tables for example).

So the best case is a salt of exactly digest size. Which means on average we can expect to have at most 1 collision. Given that's 1 collision per fake password rainbow table that's well within reasonable security.

Some things that may increase the utility of this for the attacker that weren't fully considered:

Choosing a password that would pass any pre-validation check.
Doing this for something like the top 10 passwords, assuming they're not pre-validated out of consideration this probably guarantees you a decent percentage of hits.

The assumptions were to remove the following from consideration:

If one allows the salt to have different length then you would need a rainbow table for each length (which adds a little security, but I'm fairly certain it doesn't outpace the losses from the attacker being able to precompute that many more collisions). Possibly might be able to address that in the reduction function for the rainbow table and have chains consider different salt lengths.
Defender must be blind for this to work.
Some attack on the salt other than described (statistical, etc.), ie we could focus on a statistically likely hash region for a larger variety of static passwords.
Applying salt in another way essentially is a different hash function, you need rainbow tables for each hash function under consideration. Also concatenation is not associative (associativity could completely invalidate the salt).
The no direct password use bit is just covering my bases, the usage of a security system as a whole can potentially cover weaknesses of individual parts.

The Question:

IF the salt is not random and unique and of length exactly equal to the digest. THEN isn't there a feasible attack that can be made using rainbow tables? With a bigger problem the more you deviate from those guidelines? Or did I have an invalidating assumption or something? And if this checks out why have I never heard that the salt must be the size of the digest?

salt is also important so that using same password twice does not result in same hash. password storage hashes are more complicated than this, at minimum pbkdf2 and current recommendation is argon2. salt should be 128bits to ensure it's big enough that it's hopeless, it does not need to be the size of the hash output. password hash dumps are cracked using oclhashcat and dictionary attack, not using rainbow tables (a thing from the 90s nobody uses because everyone has salt). — Z.T., Mar 29 '20 at 21:25
Yep, I've got all that pinned down. Salts are preventing collisions normally with real pass. Here they're _enabling_ them with fake pass. And yeah if the anything is large enough then storing/computing a rainbow table becomes intractable regardless. And I'm using a simplified salt usage for analysis of the problem, I realize there's other ways to utilize a salt. General use afaik is basically akin to forcing a password of max length no matter user's actual pass length via Method X (whichever). I was getting at exceeding digest actually damaging in theory if you _did_ compute the rainbow table. — Black, Mar 29 '20 at 21:34
Does this answer your question? [Why are salted hashes more secure for password storage?](https://security.stackexchange.com/questions/51959/why-are-salted-hashes-more-secure-for-password-storage) — Royce Williams, Mar 29 '20 at 22:13
See also my answer here: https://security.stackexchange.com/questions/174552/why-do-salts-for-hashing-passwords-need-to-be-globally-unique-not-just-system-s — Royce Williams, Mar 29 '20 at 22:15
@RoyceWilliams I read those fully. Nothing new there, they're immaterial. Let me rephrase the question, imagine these scenarios: 1) Unsalted password of length digest+1, there is a +1 collision rate 2) Password of length 1+Salt of digest length, there is a +1 collision rate. You can apply a rainbow table *as you would in the unsalted case* by considering the salt to be the password to be cracked. This is a consequence of collision. I'm asking if that's incorrect. Because if it isn't then salts longer than the digest *allow* precomputation via collision. — Black, Mar 29 '20 at 22:35
... although possibly the point you were trying to make: a 128-bit salt is essentially enforcing a rainbow table to consider a 128-bit password. ***Is valid***. I was more getting at the fact that digest size is the limit you can hit in bit length before the salt starts impacting the password strength. (Which for usual cases digest is 128-bit+ anyway) — Black, Mar 29 '20 at 22:37

score 4 · Answer 1 · edited Jun 16 '20 at 09:49

4

tl/dr: Even if your arguments were correct, building a rainbow table for a single password for all possible salts is practically impossible. With a 128 bit salt and a 184 bit digest (bcrypt), building such a rainbow table for even a single password, and then storing it on high density Micro SD cards, would require a stack of Micro SD card weighing almost as much as the moon.

It sounds like you have a pretty good theoretical understanding of some important issues at play here, but I think you've missed some important practical points in your application of that.

1. Collision rate does not depend on input length

The collision rate depends only on the digest size and the number of guesses you attempt. Consider the following example:

A hypothetical hashing algorithm that has a 4 bit digest (aka output) size (aka 2^4 = 16 possible hash values).
A user has a password of unknown length.
Resources limit you to 8 unique guesses
There are 16 possible outputs and you guessed 8 inputs, therefore your chances of finding a collision with the password are slightly less than 50%
For the pedantic, your odds are actually less than 50% because some of your inputs may collide with eachother.

Most importantly, the above holds true regardless of input length. Whether the original password was 1 character long or an entire novel, there are still only 16 possible outputs. Whether your guesses are 3 bits long or each one is an entire novel, there are still only 16 possible outputs. That's the whole point after all! Therefore the only thing that affects your odds of finding a match are the digest size and the number of guesses you make (assuming a "dumb" brute force).

As a result we can see that adding a long salt does not make it easier to find a collision, even if the salt is longer than the digest size.

2. Brute forcing a salt is impossible

Practically though even that doesn't matter because brute forcing a salt is effectively impossible (for a good password hashing algorithm). Let's pick a practical example: the password_hash method built into PHP. By default it will use bcrypt and generate a 16 byte salt.

Sure, if you managed to make a rainbow table for the password password using all possible 128 bit salts, you would be able to instantly identify any uses of password inside a database dump. How much space would your rainbow table take up though? bcrypt has a 184 bit digest size and you have a 128 bit salt, so we can do the math 184 bits/salt * (2^128 salts) =~ 6e40 bits =~ 7e27 Tb. The world's storage capacity is roughly 300 exabytes which means that we only need ~3e19 times more storage capacity than available in the entire world in order to store our rainbow table.

It's hard to really picture big numbers though, so let's put that number in context. We're obviously going to be pressed for space so let's store our rainbow table on 128Tb Micro SD cards. How much will our rainbow table weigh? Well, if each Micro SD card has a mass of .25 grams then the total mass of Micro SD cards will be (7e27 Tb) / (128 Tb per card) * .25 grams per card = 1.5e25 grams. The moon has a mass of 7e25 grams, so your stack of MicroSD cards which stores your rainbow table, which is only good for cracking the password password, weighs almost as much as the moon!

In practice, you won't be building a rainbow table to attack any salted password anytime soon.

edited Jun 16 '20 at 09:49

Community

1

answered Mar 29 '20 at 23:27

Conor Mancone

29,899
13
91
96

**#2**, as far as that goes I believe I figured out a way to make Rainbow Tables O(2^K) lookup & Ω(P/(2^K)) space, where P is probability space size and K is a selected constant. Which is what prompted this question. Even that's not enough, and it's still infeasible because of pre-computation time being O(P), especially if P is Pass+Salt bits size. But if I made that jump maybe it's worth working harder, in addition, "What if P was weaker? Say just Salt bits?". Can I "remove" salts? If someone oversalts can I take advantage of that *possibly*? Just looking to drive precompute down... – Black Mar 30 '20 at 01:34
.... Which leads to: **#1**, I'm following that. That's actually what I was relying on. In your example imagine a null password under concatenation with the salt. If you have 32 possible salts that means under the null password there's a 50% chance that the null password matches any given hash assuming uniform distribution. That statement is true correct? Does it become false under a static prefix assumption? – Black Mar 30 '20 at 01:35
Hmmm, maybe I'm not following **#1** completely. Let me sit and digest this. I feel like you pointed out the flaw there but if you did it's not clicking. +1 for now, nicely thorough! – Black Mar 30 '20 at 01:39
Maybe I was following but the change in my mental scenario threw me off: I think the point of Rainbow Tables in relation to **#1** is that you compress a hash table using a space/time trade-off. And the hash table would be the guesses. And in a normal table you're making guesses equal to the number of entries. Does that follow? – Black Mar 30 '20 at 01:48
1

@Black, my impression is that you are mixing up several different concepts, which makes the reasoning difficult to follow and full of pitfalls. For example, you say that an empty password, when salted, will lead to lots of different possible hashes, increasing the probability of collisions. But those collisions aren't even relevant in user authentication, for example. `hash(" " + salt) = HASH123; hash("pass123" + other_salt) = HASH123` That's a collision, but that doesn't mean that you will be able to login to the second account with an empty string instead of "pass123" – reed Mar 30 '20 at 15:02
@reed You possibly may have hit the nail on the head. That's indeed what I believed to be true. Care to tell me why that isn't the case? Am I fundamentally misunderstanding a typical implementation detail that makes it so that the assumption "there is no encrypted or plaintext version of your password available to be checked." an unusual, invalid, incorrect, immaterial, etc.assumption with regards to your point? If you _only_ had the hash in the authentication server then you you could indeed login with the empty string? – Black Apr 04 '20 at 07:17
@Black, a hash in the database might look like this: `$2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a`. That's an example taken from the PHP documentation: 2y is the bcrypt version, 10 is the cost factor, then there is the salt (22 characters) and the rest is the actual hash. When you log in, the system hashes the password using that bcrypt version, that cost factor and that salt, and then checks if the hashes match. The same salt that is used to generate the hash the first time will also be used every time you need to check a password for the login. – reed Apr 05 '20 at 23:18
@Black, the result is that users with the same password will have different hashes in the database (because they have different salts). And users that by chance might end up having the same hash will not be able to log in using the same password (because they have different salts). Hopefully Conor Mancone will correct me if I'm wrong. – reed Apr 05 '20 at 23:23
@reed aaaah, I read your initial comment wrong, my apologies. You actually had a fundamental misunderstanding. My statement was this: `hash(" " + salt) = HASH123; hash("" + other_salt) = HASH123` therefore rather than do rainbow tables on passwords, fix the pass and table the salt. That should work. The further main question was if you had salt longer than hash would that weaken the `pass+salt`? – Black Apr 07 '20 at 10:15
@Black, sorry, I might not know enough about rainbow tables to understand your method, but to me it looks like it won't solve anything. One way or another, I suspect the complexity would be exactly the same, and as Conor Mancone said if you have good salts any computations will be infeasible. Also, the length of the salt has not impact on the hash, because the hash is usually a function that maps data of *any length* to a fixed-length string. – reed Apr 07 '20 at 13:59

Oversalting Potentially Harmful?

1 Answers1

1. Collision rate does not depend on input length

2. Brute forcing a salt is impossible