4

I know this is a real dumb question and I am certainly talking complete rubbish, but let me explain:

  1. We have a long SHA-256 hash, e.g.: 474bfe428885057c38088d585c5b019c74cfce74bbacd94a7b4ffbd96ace0675 (256bit)
  2. Now we use the long hash to create one SHA-1 and one MD5 hash and combine them. E.g.: c49e2143913627ea178e66571189628a41526bf3 (SHA-1; 160bit) + 6bb225025371aaa811748da8e011773d (MD5; 128bit)

So now we got this: c49e2143913627ea178e66571189628a41526bf36bb225025371aaa811748da8e011773d
(SHA-1 + MD5; 160bit + 128bit = 288bit)

And it is longer than the input string, at 288 bits instead of 256 bits. So did we actually increase the entropy?

In short, it is this:

hash = sha256(input) # 256bit
result = sha1(hash) + md5(hash) # 288bit

(That is supposed to be pseudo-code; I don't know if it is valid, though.)
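Written as runnable Python (the function name `combine` is my own, not part of any library), the scheme looks like this:

```python
import hashlib

def combine(data: bytes) -> str:
    # Step 1: hash the input with SHA-256 (256 bits).
    h = hashlib.sha256(data).digest()
    # Step 2: hash that digest with SHA-1 (160 bits) and MD5 (128 bits)
    # and concatenate the hex strings: 288 bits of output in total.
    return hashlib.sha1(h).hexdigest() + hashlib.md5(h).hexdigest()

combined = combine(b"hello")
print(len(combined) * 4)  # 288 (bits of output, not bits of entropy)
```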

What is the error in reasoning here? I am sure you cannot increase entropy/string length in this way...

Edit: Also important: did I possibly even decrease the entropy this way, or did it stay the same?

rugk
    Why are you doing this? – 700 Software Jul 21 '16 at 12:33
  • @GeorgeBailey Excellent question. :-) Let's say basically I want to stretch the SHA-256 hash a bit and I want to avoid losing entropy. – rugk Jul 21 '16 at 13:22
    Possibly more than you were bargaining for, but you might be interested in [this paper about hash function combiners](https://tuprints.ulb.tu-darmstadt.de/2094/1/thesis.lehmann.pdf). Note that hashes are expected to fulfill the conditions of preimage/second-preimage/collision resistance, and concatenation doesn't necessarily preserve/provide these properties. So it's possible that combining hash functions that way will produce a construct that no longer fits the requirements of a cryptographic hash function. – Ella Rose Jul 21 '16 at 17:35
  • @AndréBorie Indeed similar question, but not a dupe. In any case it should be read in addition to this one. – rugk Jul 22 '16 at 15:30

4 Answers

24

And it is longer than the input string, at 288 bits instead of 256 bits. So did we actually increase the entropy?

No, you did not increase the entropy.

In this context, "entropy" basically refers to the probability of any particular guess about the content or value being correct.

If I tell you that I have hashed a single lowercase US English letter's ASCII representation using SHA256, and that the hash is hexadecimal ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb, then the entropy of that value isn't 256 bits, but rather closer to five bits (log2(26) ~ 4.7) because you only need to make at most 26 guesses to arrive at the conclusion that the letter I hashed was a. (For completeness, what I really did was printf 'a' | sha256sum -b on a UTF-8 native system.)
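That guessing process can be run directly; the target below is the hash quoted above, and at most 26 guesses recover the letter:

```python
import hashlib
import string

# SHA-256 of the single lowercase letter hashed above.
target = "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb"

# The hash output is 256 bits long, but the value behind it carries only
# ~4.7 bits of entropy, so 26 guesses at most are needed.
for letter in string.ascii_lowercase:
    if hashlib.sha256(letter.encode()).hexdigest() == target:
        print(f"recovered: {letter}")  # prints "recovered: a"
        break
```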

The entropy, thus, can never be greater than that of the input. And the input is, at best, the initial hash, which has 256 bits worth of entropy. (It could have less, if the string you hashed has less than 256 bits of entropy and the attacker can guess that and its value somehow.) Each hash calculation can be assumed to be O(1) when the input size is fixed.

So by concatenating the SHA-1 and MD5 hashes of a string that is a SHA-256 hash, you can never get more entropy than 256 bits' worth. All you are doing is making it longer, and possibly obscuring its origin.

Now, in some situations, using multiple hashes actually makes sense. For example, many Linux package managers use and validate multiple, different hashes for the same file. This isn't done to increase the entropy of the hash value, though; it's done to make finding a valid collision harder, because in such a situation the preimage or collision attack must work equally against all hashes used. Such an attack against a modern cryptographic hash is already difficult enough for most purposes; mounting a similar attack against several different hashes simultaneously would be orders of magnitude more difficult.
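Checking the same bytes against several different digests can be sketched like this (the `verify` function and the sample data are my own illustration, not any particular package manager's code):

```python
import hashlib

def verify(data: bytes, expected: dict) -> bool:
    # A forged file must collide under *every* listed hash simultaneously,
    # which is far harder than defeating any single one of them.
    return all(
        hashlib.new(algo, data).hexdigest() == digest
        for algo, digest in expected.items()
    )

data = b"example package contents"
expected = {
    "md5": hashlib.md5(data).hexdigest(),
    "sha1": hashlib.sha1(data).hexdigest(),
    "sha256": hashlib.sha256(data).hexdigest(),
}
print(verify(data, expected))         # True
print(verify(data + b"x", expected))  # False: every digest mismatches
```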

user
  • Awesome of you to provide a real-life example! – rugk Jul 21 '16 at 13:12
  • @rugk I'm glad you found it useful. – user Jul 21 '16 at 14:18
    In the first case the 256 bit hash was rehashed. In the second case, the entire data source was hashed. That's why the second method has more entropy than the first. (This is of course assuming the data source itself had more than 256 bits of entropy to start with.) – Paul Draper Jul 22 '16 at 04:43
3

You are certainly not adding entropy. This scheme still has at most 2^256 possible outputs (at most 256 bits of entropy), no matter how many times or in what way you rehash. Note that it is *at most* 256 bits, because you did not tell us how much entropy is in your input: SHA-256 will not give you 256 bits of entropy if your input has less than that. A hash never increases entropy.

But considering the full space of possible SHA-256 values, you would actually lose entropy because of collisions. Some distinct 256-bit inputs to SHA-1 and to MD5 will collide: there will be x1 ≠ x2 with sha1(sha256(x1)) == sha1(sha256(x2)), and likewise y1 ≠ y2 with md5(sha256(y1)) == md5(sha256(y2)). Because fewer distinct outputs remain than there were inputs, you have lost entropy.
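The collision effect is easiest to see with a toy hash truncated to a single byte (an illustration of the principle, not a real attack): feeding all 256 possible one-byte inputs through it produces noticeably fewer than 256 distinct outputs.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    # Toy model: MD5 truncated to one byte, so collisions become visible.
    return hashlib.md5(data).digest()[:1]

inputs = [bytes([i]) for i in range(256)]   # 256 distinct inputs = 8 bits
outputs = {tiny_hash(b) for b in inputs}    # collisions merge some of them
# Fewer than 256 distinct outputs survive, so less than 8 bits of entropy
# remain after the "rehash" step.
print(len(outputs))
```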

CristianTM
3

As others have explained, 'bits of entropy' refers to the guessability of the original password or other text that was first used to create the SHA-256 hash. In your example, the entropy is unchanged.

What you've done here is provide a SHA-1 version and an MD5 version of the SHA-256 hash. If anything, this makes the SHA-256 more guessable than other constructions you could be using.

If you are trying to make it harder to guess the original input, you should simply repeat the SHA-256 function. In fact, repeating SHA-256 many thousands of times will make it much harder to brute-force the original input, with no noticeable impact on your server load or end-user experience.

However, for password storage, the best thing is to use a proper Password Hash function, which is designed to be a Slow Hash with Salt. This lets you adjust how many rounds (the work factor) to fit the desired processing time on your production hardware. BCrypt is usually recommended. PBKDF2 or the new SCrypt are both good choices.
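Python's standard library exposes PBKDF2 directly via `hashlib.pbkdf2_hmac`; a minimal sketch, where the salt handling and iteration count are illustrative choices, not prescriptions:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # a fresh random salt for each stored password

# 600,000 iterations of HMAC-SHA-256; tune the count to the processing
# time you can afford on your production hardware.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000, dklen=32)
print(key.hex())
```

Store the salt and iteration count alongside the derived key so the same computation can be repeated at verification time.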

700 Software
  • "with no noticeable impact on your server load", but maybe on my server's CPU. ;-) Additionally, I did not say this runs on a server. (Spoiler: however, it does. ;) ) – rugk Jul 21 '16 at 13:15
  • Good idea to think of password storage, but I can put your mind at ease: the input here is not a password, but still something that should probably not be guessed. – rugk Jul 21 '16 at 13:16
  • Passwords usually have a lot less entropy requiring a significant work factor to effectively secure them. (and even then, some cannot be secured) Since this is not a password, I wonder what it is. ☺ If the input data has 72 bits of entropy, then no repetition is necessary to secure it well. An example input with 72 bits of entropy would be 9 random bytes, represented as 12 Base64 characters. – 700 Software Jul 21 '16 at 13:59
1

Did you increase the entropy? Most likely not.

All you did was use two older hash functions to derive a new hash. Since this adds no new data, the entropy is not affected.

The number of bits here makes no difference whatsoever, since it is just 'another way of writing' the original hash.

Entropy (in cryptography) has to do with the amount of uncertainty about whether a specific bit is '0' or '1'. To increase it you need 'new' data, not simply rehashing (which can only reduce it). You can usually keep the same level of entropy when you rehash using the original hash function, since then you do not lose bits.

LvB
    Since a hash must contain less information than the plaintext, and this is 2 hashes of the same "plaintext", then surely it has less entropy than the original? – symcbean Jul 21 '16 at 11:02
  • It can, but it depends on the actual hashing algorithm. In this case, where a SHA-256 is fed into SHA-1 and MD5, yes, you lose entropy. – LvB Jul 21 '16 at 11:20
  • So although the resulting string is longer, you always (?) lose entropy in this conversion? Might it not at least stay the same? – rugk Jul 21 '16 at 13:07
  • Assuming a longer string than the Sha512 was used to calculate the sha256, you always lose entropy (you have less bits in the pool to guess to get this specific state). unless I am misunderstanding the hashing algorithms I believe they will yield a lower amount of entropy bits. – LvB Jul 21 '16 at 13:15