46

A system we're introducing into the organization is not using any traditional hashing function that I can find. I've been assigned to "approve" it by black-box testing.

I suspect passwords are simply being obfuscated or encrypted (not hashed) since the length of the result changes with password length (though there is some minimum, it seems - perhaps padding?).

Here is some data:

password   : result
1          : TG3432WB7VU=
11         : r8ahLkJGWbY=
111        : E7fXdNqEWAA=
11111111   : **FIcx3a00R4e**GhFyHjD56qw==
1111111111 : **FIcx3a00R4e**vxqEuQkZZtg==
2111111111 : GPwnH80qEACvxqEuQkZZtg==

The result is obviously base64, but doesn't decode to anything readable. Before I knew that the result's length changed, I tried taking the decoded bytes and treating them as MD5 and other hashing functions, but that obviously didn't pan out. Now that I know the result changes in length, I'm considering other, worse, alternatives. Of note are the two bold parts above: they're identical in two different passwords. So either each 8 bytes are being processed independently (why?) or there's some polyalphabetic substitution going on(?).
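The repeated runs are easier to see after decoding. A quick sketch (plain Python, assuming nothing beyond the base64 observation) that splits each decoded result into 8-byte blocks:

```python
import base64

samples = {
    "1":          "TG3432WB7VU=",
    "11":         "r8ahLkJGWbY=",
    "111":        "E7fXdNqEWAA=",
    "11111111":   "FIcx3a00R4eGhFyHjD56qw==",
    "1111111111": "FIcx3a00R4evxqEuQkZZtg==",
    "2111111111": "GPwnH80qEACvxqEuQkZZtg==",
}

def blocks(b64):
    """Decode and split into 8-byte blocks, shown as hex."""
    raw = base64.b64decode(b64)
    return [raw[i:i + 8].hex() for i in range(0, len(raw), 8)]

for pw, result in samples.items():
    print(f"{pw:>10} : {' '.join(blocks(result))}")

# The first blocks of '11111111' and '1111111111' are identical...
assert blocks(samples["11111111"])[0] == blocks(samples["1111111111"])[0]
# ...the second blocks of '1111111111' and '2111111111' are identical...
assert blocks(samples["1111111111"])[1] == blocks(samples["2111111111"])[1]
# ...and the second block of '1111111111' equals the entire result for '11'.
assert blocks(samples["1111111111"])[1] == blocks(samples["11"])[0]
assert blocks(samples["11"])[0] == "afc6a12e424659b6"
```

The last assertion is the damning one: the tail of a 10-character password decodes to exactly the same 8 bytes as the standalone result for its last two characters, which is what "each 8 bytes processed independently" predicts.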

Update:

Any character, including Unicode characters, is accepted by the system for passwords. Repeating a 3-byte character 8 times does result in a 24 byte long "hash".
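That is consistent with the scheme operating on the raw UTF-8 bytes of the password, split into 8-byte blocks (a hedged reading of the data, not a confirmed internal): 8 × 3 bytes = 24 input bytes = exactly three blocks, hence a 24-byte result.

```python
pw = "€" * 8                       # a 3-byte UTF-8 character, repeated 8 times
raw_len = len(pw.encode("utf-8"))  # 24 input bytes
n_blocks = -(-raw_len // 8)        # ceiling division: whole 8-byte blocks
assert raw_len == 24
assert n_blocks == 3               # 3 blocks of 8 bytes -> a 24-byte "hash"
```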

psmears
coldfused
  • 12
    The lengths of the not-the-`=`-character parts and the bold part suggest 64-bit blocks, and the repeat of the bold part suggests [ECB mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_Codebook_.28ECB.29). Thus, I'm guessing [DES](https://en.wikipedia.org/wiki/Data_Encryption_Standard)-ECB. –  Feb 24 '16 at 20:02
  • 43
    For a black-box test, I would just present your findings. It's difficult to reverse-engineer hashing (or "hashing" in this case), especially with only the 5 data points you have given us. But you already know that: 1) Resulting Hash length is (at least partly) based on input length and 2) Similar input leads to similar output. None of that should happen with a proper hashing function, so in its current form, it shouldn't be approved (doesn't matter how it actually works in the end). – tim Feb 24 '16 at 20:06
  • 3
    Another interesting point to test would be 2111111111 and see if the second part of the result remains the same. It appears to be a weak algorithm to begin with, but if the second part doesn't change, this is definitely a weak algorithm. – Jonathan Feb 24 '16 at 20:10
  • 55
    Being asked to "approve" something through black-box testing is concerning in itself. Explain that crypto should be secure even when the attacker fully knows the crypto used. – d1str0 Feb 24 '16 at 20:12
  • 3
    AKA. https://en.wikipedia.org/wiki/Kerckhoffs's_principle – d1str0 Feb 24 '16 at 20:12
  • 1
    That's looking a lot like it handles each 8 input {bytes?/characters?} independently. Does it accept [non-7-bit](https://en.wikipedia.org/wiki/ASCII#7-bit) ASCII? –  Feb 24 '16 at 20:25
  • 14
    Was the programmer who wrote that scheme called [Dave](https://security.stackexchange.com/questions/25585), by any chance? – Deer Hunter Feb 24 '16 at 23:33
  • 3
    The next thing to check is that it handles the 8-byte blocks identically. What are the "hashes" of aaaaaaaabbbbbbbbc and bbbbbbbbaaaaaaaac? – Ricky Demer Feb 25 '16 at 06:44
  • 1
    @RickyDemer we can already see that the second block for 11111111 11 is the same as the only block for 11 (hex afc6a12e424659b6), so *some* 64-bit block cipher in ECB -- but a 64-bit block cipher could easily be TDES, or IDEA or CAST5 or RC5 or Blowfish, or even something weirder like Kasumi or (saints preserve us) Skipjack. – dave_thompson_085 Feb 25 '16 at 08:36
  • 12
    Tell them that the standard method for "approving" cryptographic algorithms is to let the entire crypto community try to crack it for a few years and then ongoing research looking for weaknesses even after that. Ask them if they are prepared to make that investment on their home grown algorithm, and then tell them that investment has *already* been made in the standard algorithms. Do they really think they can do better than that? – jpmc26 Feb 25 '16 at 09:56
  • @RickyDemer This is a very good point: if it does handle the blocks identically, then the algorithm is even weaker, as Neil Smithline comments on my answer. – Jonathan Feb 25 '16 at 14:02
  • 1
    It's not [LM hash](https://en.wikipedia.org/wiki/LM_hash) but it looks like exactly the same analysis applies. – Ben Voigt Feb 25 '16 at 23:58
  • 1
    You *absolutely* want to point them to [xkcd: Encryptic](https://xkcd.com/1286/). – user Feb 26 '16 at 08:33
  • 1
    The mere fact that you can not only make out single-letter patterns but also common subsequences of 6-8 letters **with the naked eye** suggests that whatever hash this is, it is _piss poor_ by cryptographic standards. As in, e.g.: zero avalanche. If you can already make out patterns with the naked eye on half a dozen samples, there is no question about what would happen if you did a thorough analysis/attack. No way you could approve this. – Damon Feb 26 '16 at 10:31

4 Answers

58

I highly suspect this is a self-rolled, or at least a very outdated, method. It is very weak by modern standards and should not be used to protect anything important.

Short passwords could be cracked with no more than 2^64 brute force attempts, which is quite possible with even a normal computer. If both halves of the result are independent, even with fairly long passwords, 2*2^64 brute force attempts could crack it (so 2^65 attempts). There are likely further weaknesses in the algorithm that make it weaker than described here.

Another interesting test would be 2111111111, to see whether the second part of the result remains the same. It appears to be a weak algorithm to begin with, but if the second part doesn't change, this is definitely a weak algorithm.

Edit:

Seeing that the results of the 2111111111 test are the same for the second half, this is definitely a weak algorithm and should not be used to protect anything sensitive! I have included the relevant comparison below:

  • 1111111111 : FIcx3a00R4evxqEuQkZZtg==
  • 2111111111 : GPwnH80qEACvxqEuQkZZtg==

Edit:

Another interesting test would be what Ricky Demer suggests below:

The next thing to check is that it handles the 8-byte blocks identically. What are the "hashes" of aaaaaaaabbbbbbbbc and bbbbbbbbaaaaaaaac? – Ricky Demer Feb 25 at 6:44

If it handles the 8-byte blocks identically, the scheme is even weaker, no matter how long the password is: it would require just 2^64 brute-force attempts, as Neil points out in the comments. Once someone built a table of what each possible block maps to, this algorithm would be practically useless. There are probably even more weaknesses in this algorithm remaining to be discovered.
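Neil's point can be illustrated at toy scale. This is emphatically not the real algorithm: `toy_block_hash` (truncated MD5) is a hypothetical stand-in for whatever 64-bit block function the system uses, and the tiny charset stands in for the full 2^64 block space; only the structure matters.

```python
import hashlib
from itertools import product

def toy_block_hash(block: bytes) -> bytes:
    """Stand-in for the unknown 64-bit block function (NOT the real one)."""
    return hashlib.md5(block).digest()[:8]

def toy_scheme(password: str) -> bytes:
    """Process each 8-byte chunk independently and concatenate (the flaw)."""
    data = password.encode()
    return b"".join(toy_block_hash(data[i:i + 8])
                    for i in range(0, len(data), 8))

# The attacker precomputes a table over the block space once (tiny here;
# at most 2^64 entries against the real scheme)...
charset = "ab1"
table = {}
for n in range(1, 9):
    for chars in product(charset, repeat=n):
        block = "".join(chars).encode()
        table[toy_block_hash(block)] = block

# ...and then inverts any stored value block by block, no matter how long
# the original password was.
target = toy_scheme("ab1ab1ab1ab1")
recovered = b"".join(table[target[i:i + 8]]
                     for i in range(0, len(target), 8))
assert recovered == b"ab1ab1ab1ab1"
```

The cost of building the table is paid once and amortized over every password in the database, which is why independent, unsalted blocks are so damaging.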

Bottom line:

I do not recommend using this algorithm regardless of the results of the last test. We have already seen enough to know it is weak. An established scheme with a salt, a deliberately slow computation, and a larger effective output would be much better, and I would recommend going with one of the existing, thoroughly tested methods. Self-rolled encryption/hashing has not had the benefit of the extensive testing that the mainstream methods have.

Jonathan
  • 3
    I think `2*2^64` is wrong as you could search for each 64-bit section of the hash at the same time. So cracking any hash is never more than 2^64. At least I think. – Neil Smithline Feb 24 '16 at 22:28
  • 29
    This is the exact same thing Adobe did with their passwords. They ended up [creating the world's greatest crossword puzzle.](https://xkcd.com/1286/) –  Feb 24 '16 at 23:48
  • @NeilSmithline Well by that reasoning, it's also 2^32 if you have 2*2^32 processors, since then you can test 2*2^32 inputs at the same time. – user253751 Feb 25 '16 at 00:08
  • 9
    @immibis - that's not what I was referring to. I'm not talking about adding processors. That doesn't affect the number of computations that occur. By "parallel" I meant that being that the hash is calculated in 8-byte chunks, if the hash is for more than 8 bytes, you can run through all 2^64 possibilities and compare each result to both the first 8-byte result and the second. So you never need to do more than 2^64 calculations. – Neil Smithline Feb 25 '16 at 00:18
  • @NeilSmithline If Ricky Demer's test reveals that the blocks are encrypted/hashed identically regardless of position (which is quite possible), you are correct. In that case the algorithm is even weaker than I described, which I did allude to in my answer. – Jonathan Feb 25 '16 at 14:01
10

The fact that the length of encoded strings depends on the length of respective passwords suggests a much worse weakness: passwords seem to be obfuscated rather than hashed.

This means that anyone who knows the internals of the system may be able to instantly decrypt any password! The implications are numerous: you have to trust every developer who has ever worked on the system, the source code has to be kept secret forever (which is why you're reviewing a black box), etc.

It doesn't really matter how good the obfuscation algorithm is (and 2^64 isn't exactly stellar). It's broken by design.
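To illustrate why reversible obfuscation is broken by design, here is a deliberately naive stand-in (a repeating-XOR with a hard-coded key, purely hypothetical, not the system's actual algorithm). Anyone who has read the source, i.e. any current or former developer, reverses it instantly:

```python
import base64
from itertools import cycle

KEY = b"hardcoded-secret"  # hypothetical key baked into the application

def obfuscate(password: str) -> str:
    """Reversible 'protection': XOR with a fixed key, then base64."""
    data = bytes(b ^ k for b, k in zip(password.encode(), cycle(KEY)))
    return base64.b64encode(data).decode()

def deobfuscate(stored: str) -> str:
    """The same operation run backwards recovers the plaintext password."""
    data = base64.b64decode(stored)
    return bytes(b ^ k for b, k in zip(data, cycle(KEY))).decode()

stored = obfuscate("hunter2")
assert deobfuscate(stored) == "hunter2"  # anyone with the source gets it back
```

A real hash has no `deobfuscate`: even with full knowledge of the code, the only way back to the password is guessing.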

Dmitry Grigoryev
  • It is not a given that this is the case. If for example the algorithm was to DES encrypt a fixed plaintext using the first 7 or 8 characters of the password as key, then it would not be possible to decrypt it instantly. It would still be very weak though. – kasperd Feb 25 '16 at 13:32
  • @kasperd the length of the encoded text would be independent of the password length then. There is no valid reason to have encoded text length depend on password length, if you have no intention to recover plain-text password from it. Black-box testing of a cryptographic function is also a strong indicator it's "security through obscurity". – Dmitry Grigoryev Feb 25 '16 at 13:45
  • My previous comment was a bit imprecise. What I meant was that the password might be split into chunks of either 7 or 8 characters, and each chunk is used as a DES key encrypting the same constant once per chunk. – kasperd Feb 25 '16 at 13:59
5

[Kerckhoffs's principle](https://en.wikipedia.org/wiki/Kerckhoffs's_principle) obviously applies: a cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

This in itself should be enough for you to say "I cannot approve this system without more information."

From your examples, it seems that the input is divided into blocks of 8 characters/bytes, each block is hashed individually and the outputs put together, then the entire result is base64 encoded.

This is not good. Most passwords that are more than 8 characters long are just a few characters longer. This means an attacker can crack the second block very easily, and once you know the second block, you have a strong hint as to what the first block might contain.

The crypto community discovered this decades ago and will no longer hash blocks independently. That means these programmers are NOT up to date on crypto.

The only good news is that a single-character change in the password changes its entire output block. This probably means that they are in fact hashing the passwords and not just obfuscating them. Still, with the above weaknesses, the world's best hash function couldn't save this algorithm.
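The "crack the second block" point can be sketched concretely. Again, `toy_block_hash` (truncated MD5) is a hypothetical stand-in for the unknown block function; the point is the search-space arithmetic, not the cipher:

```python
import hashlib
import string
from itertools import product

def toy_block_hash(block: bytes) -> bytes:
    """Stand-in for the unknown 64-bit block function (NOT the real one)."""
    return hashlib.md5(block).digest()[:8]

# A 10-character password: with independent 8-byte blocks, the second
# block covers only the last 2 characters.
password = "Tr0ub4dor!"
tail_hash = toy_block_hash(password.encode()[8:])

# Brute-forcing that tail takes at most 95**2 = 9025 guesses, not 95**10.
printable = string.ascii_letters + string.digits + string.punctuation + " "
recovered = next(
    "".join(g) for g in product(printable, repeat=2)
    if toy_block_hash("".join(g).encode()) == tail_hash
)
assert recovered == password[8:]
```

With the tail known, the attacker has also learned the password's exact length and its last characters, which sharply prunes any dictionary attack on the first block.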

Stig Hemmer
2

This looks like a simple block cipher, and it is definitely not something you would ever want to use. It is comparable to, instead of hashing passwords, simply encoding/obfuscating them with, say, base64. If anyone ever applies even simple cryptanalysis, they could easily work out, for example, that the output uses only the base64 character set.