
As of this morning, SHA-1 collisions can officially be produced for as little as $110k worth of GPU time on Amazon.

http://shattered.it/

https://arstechnica.com/security/2017/02/at-deaths-door-for-years-widely-used-sha1-function-is-now-dead/

Obviously this means that for many applications SHA-1 is completely useless and that within the next few years it will have gone the way of MD5.

Question:

It seems that, over time, we decide the best way to make things more secure is simply to come up with algorithms that need more and more bits of entropy and more and more CPU processing power, which looks like a false solution.

If 128 bits of entropy is enough (i.e. AES-128 is preferable to AES-256), why don't we just come up with algorithms that handle 128 bits of entropy better?

Furthermore, instead of moving to algorithms in which flaws will also eventually be found, and which drastically increase CPU, memory, and storage requirements (there's a HUGE difference between MD5 and SHA-256 when you're computing on a phone or a Raspberry Pi), why don't we just pair two imperfect algorithms together?

For example: couldn't we run 512 bits through MD5, run those same 512 bits through SHA-1, and do a simple mux on the two results? Even though both algorithms have known attacks, the odds of an attacker finding a set of bits that lines up with the weaknesses of both algorithms at once seem slim, so the combination seems like it would be quite secure. Or we could digest every other block in reverse with MD5 and scatter in some of the previous source bits every n blocks: read a block forwards, reverse its bits, run it through backwards, continue to the next block, and after 16 blocks mix in the nth bit of the original source blocks, maybe even with a rolling window, holding less than 1 KB in memory at any given time.

It seems that we're looking for a single, ultimate mathematical solution to a problem that could be solved much more simply with two or three simpler constructions that are out-of-band from each other, and this would be faster, more memory-efficient, more storage-efficient, and more secure than any single algorithm.

Is there a flaw in my reasoning?


This is NOT A DUPLICATE

The other question is about chaining hashes in series, feeding the final result of one into another (weakest link):

f(g(x)) => y

My question is about using multiple hashes simultaneously on each block to avoid weaknesses (mutual exclusion of vulnerabilities):

    for each block n of x:
        f(x[n]) \
                  f(x[n]) XOR g(x[n]) => y
        g(x[n]) /
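
For concreteness, here's a rough Python sketch of that per-block idea (a toy only: MD5 and SHA-1 as f and g, with SHA-1 truncated to MD5's length so the XOR lines up, and a 64-byte block size picked purely for illustration):

    import hashlib

    def combine_block(block: bytes) -> bytes:
        """f(x[n]) XOR g(x[n]) for one block: MD5 XOR truncated SHA-1 (toy only, not secure)."""
        md5_d = hashlib.md5(block).digest()         # 16 bytes
        sha1_d = hashlib.sha1(block).digest()[:16]  # SHA-1 is 20 bytes; truncate to match MD5
        return bytes(a ^ b for a, b in zip(md5_d, sha1_d))

    # Process a message block by block (512 bits = 64 bytes per block, as in the question)
    message = b"example input " * 40
    digests = [combine_block(message[i:i + 64]) for i in range(0, len(message), 64)]
    print([d.hex() for d in digests])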

Because the question is different, the answers are also different.

coolaj86
  • Combining many bad things does not create a good thing. Try to build a car based on this idea. – Steffen Ullrich Feb 23 '17 at 20:53
  • 1
  • We do. We have hybrid cars. It mostly uses electric, but when the electric fails the gas is the failsafe. It seems like we could do the same with these algos. I'd love to hear what a security researcher says, but I doubt that MD5 and SHA-1 are both vulnerable to a given byte sequence. – coolaj86 Feb 23 '17 at 22:58
  • Doing MD5 + SHA-1 is probably slower. You're processing the input twice. Also, they are optimised for 32-bit CPUs, while SHA-512 is optimised for 64-bit. – paj28 Feb 23 '17 at 23:16
  • The 110k in GPU cost is only for the second phase of the attack. The first phase cost an estimated 560k in CPU time as well. – Xander Feb 23 '17 at 23:17
  • To answer your question (while it's locked), the answer is "someone already has figured out the scattershot approach." Look at the design of the Skein cryptographic primitive developed by Schneier et al. It uses a very simple primitive mix function, iterated through 76 rounds. What they developed that was novel is the observation that increasing the number of rounds is more critical to security than the complexity of each round. This is similar to what you're proposing, but it's been better thought out and tested. Unfortunately, Skein didn't win the NIST contest. – John Deters Feb 24 '17 at 19:58

2 Answers


128 bits should be enough entropy - see this answer. Quantum computers may change this, but that's still theoretical.

It appears to be (relatively) easy enough to design a symmetric cipher where the only effective attack is brute force. AES has withstood some years of cryptanalysis and there are many other ciphers.

Asymmetric crypto generally needs much larger keys for equivalent security.

For hashes, and specifically for collision resistance, we need to worry about the birthday attack. In rough terms, that means a hash needs to be twice as long as the key of an equivalent symmetric cipher. So SHA-256 is a good match for AES-128.
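
As a back-of-the-envelope sketch of those generic attack costs (order of magnitude only):

    # Order-of-magnitude attack costs (a sketch, not a precise bound):
    #   key search on a k-bit cipher -> ~2^k trials
    #   collision on an n-bit hash   -> ~2^(n/2) trials (birthday attack)
    def key_search(k_bits):
        return 2 ** k_bits

    def collision_search(n_bits):
        return 2 ** (n_bits // 2)

    print(collision_search(256) == key_search(128))  # True: SHA-256 pairs with AES-128
    print(collision_search(160))                     # ~2^80 generic collision cost for SHA-1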

I should point out that both MD5 and SHA-1 have had cryptographic breaks - the attacks are better than brute force. SHA-1 is a 160-bit hash, so a brute-force collision search (via the birthday attack) would require about 2^80 operations, but the published attack needed more like 2^63. It may turn out that SHA-256 has cryptographic weaknesses too, but they would have to be significantly better than brute force to make an attack practical.

paj28

First, you're making assumptions about the properties of XORing two hash digests. Unless you have fully studied AND analyzed all of the statistical implications of that output, you are potentially creating an extremely weak construction. Lesson number one of cryptography: DO NOT use home-made crypto. Unless you are a very seasoned cryptographer, you will not understand all of the implications of your choices.

We use publicly studied and verified functions such as the SHA-2 and SHA-3 families because we know their properties! They were created by experts and analyzed by the best mathematicians we have. They are designed so that attacks against them are not computable within a reasonable time frame. And as long as there isn't some mathematical 'trick' discovered that makes the hash easy to break, we can simply increase the bit size of the output as soon as the current size approaches being computable in a reasonable time frame.

Let's take a moment and remember what a bit is. It's either a '1' or a '0', so it has 2 possible states. Now a very simplistic example: say we have a hash function with a key size of 3 bits. We have 2^3 = 8 possible states. Key size of 4: 2^4 = 16 states. Key size of 5: 2^5 = 32 states. Notice a pattern? Each time we increase the key size by one bit, we double how many states there are. At realistic key sizes (think 2^128 or 2^256 - these are huge numbers), that doubling is extremely powerful: every extra bit doubles the space an attacker has to brute force to find a collision. However, SHA-2 and SHA-3 are very fast hashing algorithms, so for a user who computes the hash only once, the small increase in computation and storage is absolutely dwarfed by the extra work an attacker must now perform! (If you're hashing passwords, you MUST use a slow hashing function such as bcrypt.)
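
A trivial sketch to make that doubling concrete:

    # Each extra key bit doubles the number of states an attacker must search.
    for bits in (3, 4, 5, 128, 129, 256):
        print(f"{bits:>3} bits -> {2 ** bits} possible states")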

How much 'extra storage' are we really talking about? Let's go to an extreme example: asymmetric cryptography (also called public-key crypto), since it requires much bigger keys than symmetric cryptography. (NOTE: the entropy of these keys is very different and not comparable by simply looking at the bit sizes, but for storage purposes this works just fine.) The largest RSA key broken so far was 768 bits, with 1024 bits theorized as being vulnerable. So let's compare a 1024-bit key and a 4096-bit key. A 1024-bit key is 128 bytes, or 0.128 KB. A 4096-bit key is 512 bytes, or 0.512 KB. The cost of storing those is laughable given current hard drive prices. Let's prove it. According to Forbes, in 2016 the cost per GB was $0.033. That means storing a KB costs $0.000000033, or $0.000000016896 to store our 4096-bit RSA key. (NOTE: I'm using decimal units, 1000 bytes per kilobyte, as most hardware manufacturers do.) As for the extra computational steps, the only place this will matter is embedded hardware or extremely small systems. If the information you are transferring needs confidentiality, then the extra time will be worth it. There is also hardware designed specifically for cryptography, so I don't view this as an issue in real applications of crypto.
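
To make the storage arithmetic explicit (a sketch using the $0.033/GB figure above and decimal units, 1 GB = 10^9 bytes):

    COST_PER_GB = 0.033  # the 2016 Forbes figure cited above

    def key_storage_cost(bits):
        """Dollar cost of storing a key of the given size (decimal units: 1 GB = 10^9 bytes)."""
        size_bytes = bits / 8
        return (size_bytes / 1e9) * COST_PER_GB

    print(key_storage_cost(1024))  # ~4.2e-09 dollars for a 1024-bit key
    print(key_storage_cost(4096))  # ~1.7e-08 dollars for a 4096-bit key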

NOTE: Due to the birthday bound, the work an attacker realistically has to do to find a collision is only about the square root of the output space - i.e. half the bits. So for a 128-bit digest, a 2^128 space, an attacker only needs on the order of 2^64 hashes to have a good chance of finding a collision.
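
A sketch of the standard birthday approximation, 1 - exp(-q^2 / 2N), shows how quickly that probability climbs once you pass the 2^64 mark:

    import math

    def collision_probability(attempt_bits, digest_bits):
        """Approximate P(at least one collision) after 2**attempt_bits random digests."""
        q = 2.0 ** attempt_bits
        n = 2.0 ** digest_bits
        return 1 - math.exp(-(q * q) / (2 * n))

    print(collision_probability(64, 128))  # ~0.39 after 2^64 hashes of a 128-bit digest
    print(collision_probability(66, 128))  # ~0.9997 after 2^66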

As for your idea of shuffling the input by reading blocks separately, forwards, backwards, etc.: that is similar in spirit to what the SHA hash functions already do! They create permutations and substitutions of the input data to end up with the output digest. The classic tools for this are S-boxes (substitutions) and P-boxes (permutations), along with bit rotations and modular additions. Real designs are quite a bit more complex than your example, but shuffling and mixing the input bits over many rounds is exactly what makes real hash functions work!
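
Here's a toy (deliberately insecure) substitution-permutation round, purely to show what "substitute, then permute, repeat" looks like in code; real designs use carefully chosen constants and many more rounds:

    # Toy (insecure!) substitution-permutation round over a 16-bit state, for illustration only.
    SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
            0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]  # 4-bit substitution table
    PBOX = [2, 0, 3, 1]                               # shuffle the four nibbles

    def toy_round(state):
        nibbles = [(state >> (4 * i)) & 0xF for i in range(4)]
        nibbles = [SBOX[n] for n in nibbles]              # substitution layer
        permuted = [nibbles[PBOX[i]] for i in range(4)]   # permutation layer
        return sum(v << (4 * i) for i, v in enumerate(permuted))

    state = 0x1234
    for _ in range(4):  # a few rounds spread the input bits across the whole state
        state = toy_round(state)
    print(hex(state))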

If you're interested in learning more about this stuff, I suggest you read this free & wonderful textbook: Security Engineering by Ross Anderson