What exactly is this hash that John the Ripper and Hashcat use (and also have tools for extracting)?

Question

These tools work on "hashes", but they don't seem to mean that in the traditional sense, since they can also extract hashes from a variety of formats like kdbx and zip.

I doubt that all these formats keep a hash of the password inside themselves for the tool to extract and then try to crack. So what does "hash" mean here?

Since `keepass2john` and `zip2john` both output hashes, why do you doubt that the hash exists in the source document? — gowenfawr, Jul 05 '21 at 19:36
Because why would you store the hash of the password in the file itself and for all of the available formats too? I haven't seen anything mentioned in KeePass documentation that the hash of the password is kept in the database (and in the first place, the hash is used as the key for the encryption algorithm, storing it would be like storing your password alongside the encrypted data). — Hormoz, Jul 05 '21 at 19:40
"I don't believe it could be a hash, therefore it is not a hash, therefore what is it?" -- that's a poor string of logic there. It's a hash. And since the question, then, is confirming what kdbx and zip archives use, this really isn't about hashcat or john. This ends up not being a security question at all. — schroeder, Jul 05 '21 at 19:44
I am not saying "I don't believe it is a hash therefore it is not a hash." I am saying "It doesn't make sense that all those formats would be keeping password hashes inside them." Also I am not assuming anything, that's why I asked. Do you have a source that for example KeePass keeps a hash of the password itself in its database? Also questions related to cracking passwords and methods used are indeed security questions as far as I am aware at least. — Hormoz, Jul 05 '21 at 19:51
@gowenfawr The password hash is most definitely not stored in a password-encrypted file, since the password hash is the data encryption key (or at least is enough to derive the data encryption key). — Gilles 'SO- stop being evil', Jul 05 '21 at 19:59
That's a slightly different question you've posed there. They are not "password hashes". They are key derivation hashes. Once you ask the question "how does zip hash the password?" then you will find better results. https://security.stackexchange.com/questions/199545/how-does-a-zip-file-detect-a-correct-password — schroeder, Jul 05 '21 at 19:59
@Gilles'SO-stopbeingevil' it's a hash, just not the *password hash* — schroeder, Jul 05 '21 at 20:00
@schroeder I don't understand how the way password-based key derivation works is off-topic. — Gilles 'SO- stop being evil', Jul 05 '21 at 20:00
@Gilles'SO-stopbeingevil' because the question is based on a faulty premise: "they can't be hashes". It puts the whole thing in a loop: they are hashes, I've rejected the notion they could be hashes, but they demonstrably are, ... The only way to answer the question is to ignore the premise of the question. — schroeder, Jul 05 '21 at 20:04
@schroeder I guess you understand the question in a different way than I do, but I can't figure out what you think the question means. As far as I can tell (as you can tell from my answer), the question stems from not realizing that password-based hashing and password-based key derivation are very closely related. Finding the password of a password-encrypted file does not involve inverting a hash (there is, as the asker suspects, no hash stored in the encrypted file), but it involves inverting a key stretching function, which is essentially the same thing. — Gilles 'SO- stop being evil', Jul 05 '21 at 20:08

score 2 · Accepted Answer · answered Jul 05 '21 at 19:57

Password authentication uses a pair of functions:

Set password: takes a password as input, outputs a password hash and a salt. This is a randomized function: calling it twice on the same password returns different outputs. S(p) = h + s
Check password: takes a password and a password hash plus salt as input, outputs “true” if the password hash is one that “set password” could have produced from the password and “false” otherwise. C(p, h + s) = b

Under the hood, these operations rely on a password-based hashing function F, which is a deterministic operation that takes a password and a salt as inputs and outputs a password hash. “Set password” generates a random salt and calls the password-based hashing function, and stores the salt together with the output: S(p) = F(p, s) + s where s is the randomly generated salt. The password hash contains the salt so that “check password” can call the password-based hashing function with it: C(p, h + s) = compare(F(p, s), h). (Note that I'm just describing the essence of what these operations do; the details depend on the algorithms and storage formats.)
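As a rough illustration of the S, C and F notation above, here is a minimal Python sketch. It assumes PBKDF2-HMAC-SHA256 from the standard library as the password-based hashing function F, with a 16-byte salt and an arbitrary iteration count; none of these choices come from any particular tool or file format, and the function names are made up for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def F(password: str, salt: bytes) -> bytes:
    """Deterministic password-based hashing function: same inputs, same output."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def set_password(password: str) -> tuple[bytes, bytes]:
    """S(p) = F(p, s) + s, where s is a freshly generated random salt."""
    salt = os.urandom(16)
    return F(password, salt), salt


def check_password(password: str, stored_hash: bytes, salt: bytes) -> bool:
    """C(p, h + s) = compare(F(p, s), h), using a constant-time comparison."""
    return hmac.compare_digest(F(password, salt), stored_hash)
```

Calling `set_password` twice on the same password gives two different (hash, salt) pairs, yet `check_password` accepts the password against either pair, which is the randomized behavior described above.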

The qualities of a good password-based hashing function are:

Given a hash value (including the salt), there must be no way to find a matching password except trying them one by one (brute force).
Brute force attempts must be slow.

For more information about password-based hashing functions, see How to securely hash passwords?.

Password-based encryption uses a pair of functions to turn a password into an encryption key:

Prepare encryption: takes a password as input, outputs a key and a salt. This is a randomized function: calling it twice on the same password returns different outputs. The salt is stored with the encrypted data. The key is only loaded into memory, and wiped once the encryption is done. E(p) = k + s
Prepare decryption: takes a password and a salt as input, outputs a key. D(p, s) = k

These two operations are very similar. “Prepare encryption” just generates a random salt, then does the same calculation as “prepare decryption”.
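The relationship between E and D can be sketched the same way. This is only an illustration under assumptions: PBKDF2-HMAC-SHA256 stands in for the key derivation, the 32-byte key length and iteration count are arbitrary, and the function names are hypothetical. Real formats such as kdbx or zip use their own derivation schemes and parameters.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only
KEY_LEN = 32          # assumed key size in bytes (e.g. for a 256-bit cipher)


def prepare_decryption(password: str, salt: bytes) -> bytes:
    """D(p, s) = k: derive the key from the password and the stored salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )


def prepare_encryption(password: str) -> tuple[bytes, bytes]:
    """E(p) = k + s: generate a random salt, then do the same calculation as D."""
    salt = os.urandom(16)
    return prepare_decryption(password, salt), salt
```

Only the salt would be written next to the ciphertext; the key exists in memory alone, and anyone holding the password and the salt can recompute it with `prepare_decryption`.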

The qualities of a good password-based encryption function are (simplified):

Different passwords must produce different keys, and knowing the key produced by some passwords must not help in finding the key for another password.
Given a salt and possibly some encrypted data and the corresponding plaintext, there must be no way to find the key except trying either keys or passwords one by one (brute force).
Brute force attempts based on the password must be slow.

These are very similar to the qualities of password-based hashing. It's possible to construct good password-based encryption functions that fail at password hashing or vice versa, but in practice these tend to be somewhat contrived examples. (For example, appending helloworld to a good password hash does not weaken it, but doing that for encryption would be bad since it would be trivial to predict that the key ends in helloworld.) As a consequence, password authentication and password-based encryption are built on the same kind of cryptographic primitive, which is called a password-based key derivation function or key stretching function. In the notation I used above, F and D are both key stretching functions.

When you point a tool like hashcat at a user database, it reads the hash value h and the salt s from the database and attempts to find a matching password by calculating F(p, s) for each candidate password p and comparing the result with the stored h. When you point it at a zip file, it reads the salt s from the encrypted file, calculates D(p, s) for each candidate password p and tries decrypting the file with the resulting key. If the result “looks good”, the decryption key is the correct one and therefore the password is a match. The definition of “looks good” depends on the data format: some contain an explicit key check value, while for others the tool has to rely on heuristics (for example, magic values at the beginning of certain file types).
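The two attack loops described above might look roughly like the following sketch. It is not how hashcat or John the Ripper are implemented. Everything here is an assumption for illustration: PBKDF2-HMAC-SHA256 serves as both F and D, the iteration count is arbitrary, and `key_check_value` is a hypothetical stand-in for a format's explicit key check value (here, an HMAC over a fixed label under the derived key). Formats without such a value would need a trial decryption and a heuristic instead.

```python
import hashlib
import hmac
from typing import Iterable, Optional

ITERATIONS = 10_000  # assumed work factor, kept low so the example runs quickly


def F(password: str, salt: bytes) -> bytes:
    """Key stretching function, used here as both F (hashing) and D (key derivation)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def key_check_value(key: bytes) -> bytes:
    """Hypothetical key check value: a MAC over a fixed label under the key."""
    return hmac.new(key, b"key-check", hashlib.sha256).digest()


def crack_user_db(salt: bytes, stored_hash: bytes,
                  candidates: Iterable[str]) -> Optional[str]:
    """Authentication case: compare F(p, s) with the stored hash h."""
    for p in candidates:
        if hmac.compare_digest(F(p, salt), stored_hash):
            return p
    return None


def crack_encrypted_file(salt: bytes, stored_check: bytes,
                         candidates: Iterable[str]) -> Optional[str]:
    """Encryption case: derive the key, then test whether it "looks good"."""
    for p in candidates:
        key = F(p, salt)
        if hmac.compare_digest(key_check_value(key), stored_check):
            return p
    return None
```

In both loops the expensive step is the call to `F` for every candidate. The only difference is what the result is compared against: the stored hash itself in the first case, and something that only the correct key can reproduce in the second. In that sense, the "hash" an extraction tool outputs for an encrypted file is whatever data the cracker needs to run this check: the salt, the derivation parameters, and a check value or a piece of ciphertext.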

I see. Thanks for answering! Though, again, according to [this](https://security.stackexchange.com/questions/167927/why-do-keepass-dabases-contain-a-hash-of-their-master-password) the only hash stored in a keepass database is the hash of the plaintext of the database itself. There is a comment in that thread that indicates the database itself is included in the output of keePass2John and then I guess Hashcat can try to crack that but then it wouldn't be a hash anymore and Hashcat is for cracking hashes, so it's weird. But aside from that, I see how it works. — Hormoz, Jul 05 '21 at 20:29
