Why does node.js scrypt function use HMAC this way?

Question

According to the documentation, the scrypt hash function works like so:

The hash function does the following:

Adds random salt.

Creates a HMAC to protect against active attack.

Uses the scrypt key derivation function to derive a hash for a key.

Hash Format

All hashes start with the word "scrypt". Next comes the scrypt parameters used in the key derivation function, followed by random salt. Finally, a 256 bit HMAC of previous content is appended, with the key for the HMAC being produced by the scrypt key derivation function. The result is a 768 bit (96 byte) output:

bytes 0-5: The word "scrypt"

bytes 6-15: Scrypt parameters N, r, and p

bytes 16-47: 32 bytes of random salt

bytes 48-63: A 16 byte checksum

bytes 64-95: A 32 byte HMAC of bytes 0 to 63 using a key produced by the scrypt key derivation function.

Bytes 0 to 63 are left in plaintext. This is necessary as these bytes contain metadata needed for verifying the hash. This information not being encrypted does not mean that security is weakened. What is essential in terms of security is hash integrity (meaning that no part of the hashed output can be changed) and that the original password cannot be determined from the hashed output (this is why you are using scrypt - because it does this in a good way). Bytes 64 to 95 is where all this happens.

My question is why does it use the scrypt hash as a key for the HMAC algorithm rather than just returning the scrypt hash directly? What extra protection does this give? It mentions "active attacks" but doesn't give details.

score 19 · Accepted Answer · edited Sep 07 '15 at 20:36

I created the Node Scrypt module.

HMAC adds additional security. Using it also lets the scheme serve as a header in an encrypted file format (as is done in tarsnap), not just as an entry in an authentication server's database. Colin Percival (who created scrypt) also uses this scheme for verification (I actually just copied it from him).

To explain why HMAC is used, let's have a quick recap. When encrypting something using the scrypt key derivation function, a 96 byte result is produced with the following breakdown:

 bytes 0-5: The word "scrypt"
 byte 6: 0
 byte 7: logN
 bytes 8-11: r
 bytes 12-15: p
 bytes 16-47: salt (which is 32 bytes)
 bytes 48-63: A 16 byte SHA256 checksum (hash) of the contents of bytes 0 to 47
 bytes 64-95: A 32 byte HMAC hash of bytes 0 to 63 with the key being the scrypt cryptographic hash
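
As a rough illustration of the layout above, here is a minimal sketch that splits a 96 byte buffer into its fields. The helper name `parseScryptHash` is hypothetical, and reading r and p as big-endian integers is an assumption of this sketch, since the breakdown does not state the byte order.

```javascript
// Hypothetical helper: split a 96 byte hash into the fields listed above.
// Assumption: r and p are stored as big-endian 32 bit integers.
function parseScryptHash(buf) {
  if (buf.length !== 96) throw new RangeError('expected 96 bytes');
  return {
    magic: buf.subarray(0, 6).toString('latin1'), // bytes 0-5
    version: buf[6],                              // byte 6
    logN: buf[7],                                 // byte 7
    r: buf.readUInt32BE(8),                       // bytes 8-11
    p: buf.readUInt32BE(12),                      // bytes 12-15
    salt: buf.subarray(16, 48),                   // bytes 16-47
    checksum: buf.subarray(48, 64),               // bytes 48-63
    hmac: buf.subarray(64, 96),                   // bytes 64-95
  };
}
```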

It is vital that bytes 0 to 47 stay in plaintext (not altered or encrypted in any way). To ensure this, there is a 16 byte SHA256 checksum. While SHA can be used quite effectively as a checksum (especially in this case), it cannot guard against an active attack, in which someone gets hold of the payload and substitutes their own values. For example, I could get hold of the payload, choose my own logN, r and p, recompute the checksum, and then pass the result off as the original.
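
To make the weakness of an unkeyed checksum concrete, here is a minimal sketch using Node's built-in crypto module. The helper names are hypothetical, and taking the first 16 bytes of the SHA-256 digest is an assumption about how the checksum is truncated. It shows that anyone can change a parameter and recompute a matching checksum without knowing any secret.

```javascript
const crypto = require('crypto');

// Unkeyed checksum over bytes 0-47.
// Assumption: the 16 bytes are the start of the SHA-256 digest.
function checksum(header48) {
  return crypto.createHash('sha256').update(header48).digest().subarray(0, 16);
}

function checksumMatches(hash) {
  return checksum(hash.subarray(0, 48)).equals(hash.subarray(48, 64));
}

// What an active attacker could do with no secret at all:
// change logN (byte 7) and recompute the checksum to match.
function forgeLogN(hash, newLogN) {
  const forged = Buffer.from(hash); // work on a copy
  forged[7] = newLogN;
  checksum(forged.subarray(0, 48)).copy(forged, 48);
  return forged;
}
```

The forged buffer still passes `checksumMatches`, which is why the checksum alone only detects accidental corruption.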

To guard against this, the final 32 bytes are an HMAC. HMAC is used to ensure message integrity (i.e. it guards against anyone actively changing a payload) and is a workhorse of the cryptographic arsenal (read: it is safe and secure to use). HMAC requires a key, and we use the scrypt hash as the key.

If the final 32 bytes were just a scrypt hash, then nothing would stop an active attacker from compromising everything and substituting their own scrypt hash. The HMAC protects against this. It not only serves as a means to verify the scrypt hash, but it also checks the integrity of the entire scheme.
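
The construction described so far can be sketched with Node's built-in crypto module (assuming a Node version that provides `crypto.scryptSync`). This is not the module's actual code: `makeHash` is a hypothetical name, the big-endian encoding of r and p and the truncated SHA-256 checksum are assumptions, and using the 32 byte scrypt output directly as the HMAC key is a simplification of "the key being the scrypt cryptographic hash".

```javascript
const crypto = require('crypto');

// Sketch only: byte order, checksum truncation and key length are assumptions.
function makeHash(password, { logN = 14, r = 8, p = 1 } = {}) {
  const out = Buffer.alloc(96);
  out.write('scrypt', 0, 'latin1');     // bytes 0-5
  out[6] = 0;                           // byte 6
  out[7] = logN;                        // byte 7
  out.writeUInt32BE(r, 8);              // bytes 8-11
  out.writeUInt32BE(p, 12);             // bytes 12-15
  crypto.randomBytes(32).copy(out, 16); // bytes 16-47: salt

  // bytes 48-63: unkeyed checksum of bytes 0-47
  crypto.createHash('sha256').update(out.subarray(0, 48)).digest()
    .copy(out, 48, 0, 16);

  // bytes 64-95: HMAC of bytes 0-63, keyed with the scrypt output
  const key = crypto.scryptSync(password, out.subarray(16, 48), 32,
    { N: 2 ** logN, r, p });
  crypto.createHmac('sha256', key).update(out.subarray(0, 64)).digest()
    .copy(out, 64);
  return out;
}
```

Because the HMAC key depends on the password, the salt and the parameters, changing any of bytes 0 to 63 invalidates the last 32 bytes unless the person making the change can also derive that key.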

BTW: People may be wondering why the checksum (bytes 48 to 63) is required. Verifying the HMAC requires calculating the scrypt hash first, since it is the HMAC key, and that calculation is slow by design. The checksum adds a cheap preliminary check: if it does not match, the verification immediately returns false without going any further.
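
The verification order described here can be sketched as follows, again with Node's built-in crypto module. `verifyHash` is a hypothetical name and the same layout assumptions apply (big-endian r and p, checksum taken from the start of the SHA-256 digest, 32 byte scrypt output as HMAC key). The cheap checksum is tested first, so scrypt only runs on a header that is at least internally consistent.

```javascript
const crypto = require('crypto');

// Sketch only: layout details are assumptions, not the module's code.
function verifyHash(hash, password) {
  if (hash.length !== 96) return false;
  if (hash.subarray(0, 6).toString('latin1') !== 'scrypt') return false;

  // Cheap check first: a corrupted header is rejected before scrypt runs.
  const sum = crypto.createHash('sha256').update(hash.subarray(0, 48))
    .digest().subarray(0, 16);
  if (!sum.equals(hash.subarray(48, 64))) return false;

  const N = 2 ** hash[7];
  const r = hash.readUInt32BE(8);
  const p = hash.readUInt32BE(12);
  let key;
  try {
    // Expensive step: derive the HMAC key from the candidate password.
    key = crypto.scryptSync(password, hash.subarray(16, 48), 32, { N, r, p });
  } catch (err) {
    return false; // parameters rejected by scrypt (invalid or too large)
  }
  const mac = crypto.createHmac('sha256', key)
    .update(hash.subarray(0, 64)).digest();
  return crypto.timingSafeEqual(mac, hash.subarray(64, 96));
}
```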

If an active attacker can change the payload at will, why can’t they recalculate and replace the HMAC too? — Justin Megawarne, Feb 10 '17 at 12:04

score 2 · Answer 2 · answered May 07 '15 at 11:33

It is explained later in the documentation:

If your interest in this module is to produce hashes to store passwords, then I strongly encourage you to use the hash function. The key derivation function does not produce any message authentication code to ensure integrity. You will also have to store the scrypt parameters separately. Lastly, there is no native verify function included in this module.

If all you want is to derive a key using the scrypt algorithm, you can do so with the KDF function also included in this package.

The so-called hash function in this package is a wrapper on top of the original scrypt hash function.
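
For comparison, here is a minimal sketch of the bare key derivation approach the documentation warns about. It uses Node's built-in `crypto.scryptSync` rather than this package's KDF function, and the record shape and function names are hypothetical. The parameters and salt have to be stored alongside the derived key, and the comparison has to be written by hand.

```javascript
const crypto = require('crypto');

// Hypothetical record shape: raw scrypt output carries no metadata,
// so N, r, p and the salt are stored next to the derived key.
function deriveRecord(password, params = { N: 16384, r: 8, p: 1 }) {
  const salt = crypto.randomBytes(32);
  const key = crypto.scryptSync(password, salt, 32, params);
  return { ...params, salt, key };
}

// Hand-written verify: re-derive with the stored values and compare.
function checkRecord(record, password) {
  const { N, r, p, salt, key } = record;
  const candidate = crypto.scryptSync(password, salt, key.length, { N, r, p });
  return crypto.timingSafeEqual(candidate, key);
}
```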

aviv


But what would be less secure about using the same bytes as in the docs, only with bytes 64 onwards using the direct result of the scrypt algorithm? What does adding HMAC on top get you? This is what I don't understand. – ChrisD May 07 '15 at 11:57
In terms of speed (to make brute forcing harder), it only makes things worse for the attacker: beyond computing the scrypt hash, which is slow by design, he also needs to compute the HMAC each time. In terms of collisions, it outputs a 32 byte value, which is 256 bits and secure enough. Having said that, I do not see any security problem with using the scrypt hash directly, and I do not see the need to verify the integrity of the stored hash, which is the benefit that the additional HMAC provides. – aviv May 07 '15 at 13:06
@aviv - What do you mean "I do not see the need to verify the integrity of the stored hash, which is the benefit that the additional hmac provides"? – Neil Smithline May 07 '15 at 14:13
Since the hash is stored in the database, you can consider it trusted data, so I do not see added value in being able to verify that it hasn't been tampered with. If an attacker managed to get access to the database, tampering with the stored hash is probably not his priority; he can do much better things. And even if the attacker wanted to, he would still be able to tamper with the hash so that it passes the verify function: all he needs to do is repeat the same process, computing the HMAC over the parameters and salt using the scrypt hash of a password he chooses. – aviv May 10 '15 at 09:35
