
I want to make sure that the result of PBKDF2-HMAC-SHA1 is indistinguishable from random data (given random parameters). Basically, PBKDF2-HMAC-SHA1 output looks just like a bigger SHA-1 hash, which is expected given that the pseudorandom function used is HMAC-SHA-1, as the name implies.

./pkcs5 -i 1 -s RANDOM_SALT -p RANDOM_PAYLOAD -l 20
c8ddde91936a728445e238badde1ef7e94de5b36
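The same derivation can be reproduced with Python's standard library as a cross-check on pkcs5.c (the salt and payload below are the same placeholders as in the command above, not real random values):

```python
import hashlib

# PBKDF2-HMAC-SHA1, 1 iteration, 20-byte (SHA-1-sized) output,
# mirroring: ./pkcs5 -i 1 -s RANDOM_SALT -p RANDOM_PAYLOAD -l 20
# (placeholder salt/payload -- substitute real random values in practice)
dk = hashlib.pbkdf2_hmac("sha1", b"RANDOM_PAYLOAD", b"RANDOM_SALT", 1, dklen=20)
print(dk.hex())  # 40 hex characters of random-looking output
```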

Tool used: dzmitryk / pkcs5.c

Note that it's printed in hexadecimal format, so the underlying output really is binary data that looks random. Is there any scientific proof that SHA-1 output is hard to distinguish from random data?

And the same question for SHA-256.

Why am I asking that?

Because Plain dm-crypt has several disadvantages compared to LUKS:

  • Password is not changeable without re-encrypting the whole disk
  • Only one password can be used
  • The disk encryption key is derived from the password without any salt
  • A plain dm-crypt partition may coincidentally end up looking like an unencrypted filesystem, and has a chance of being written to accidentally.

But LUKS has one major flaw: it's not deniable. So in trying to reach this deniability goal, people either use USB keys with unencrypted keyfiles (horrible) or a detached LUKS header (not deniable if the opponent finds the USB key...). There is something wrong with those approaches.

So I'm trying to keep the best of the LUKS header and make an "Ain't NO LUKS" header that would be indistinguishable from random data (so DENIABLE), containing only:

  • The LUKS keyslots encrypted key payloads - result of AES so random
  • The LUKS salts - pure random
  • The LUKS master key digest - result of PBKDF2-HMAC-SHA1

Everything else (cipher, mode, key length, etc.) would use defaults hard-coded in GRUB's cryptomount command or be easily parametrizable from the command line.
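As a sketch of what such a header could look like, here is a toy layout in Python. The field sizes, iteration count, and the XOR-with-a-KDF-stream "encryption" are illustrative assumptions standing in for the real LUKS keyslot machinery, not the actual on-disk format:

```python
import hashlib, os

# Hypothetical "Ain't NO LUKS" header: every field is either pure random
# or PRF/cipher output, so the whole blob is indistinguishable from random.
SALT_LEN = 32      # assumed salt size
DIGEST_LEN = 20    # PBKDF2-HMAC-SHA1 output size
ITERATIONS = 1000  # assumed iteration count (hard-coded, never stored)

def build_header(passphrase: bytes, master_key: bytes) -> bytes:
    mk_salt = os.urandom(SALT_LEN)    # salt for the master-key digest
    slot_salt = os.urandom(SALT_LEN)  # salt for the keyslot KDF
    # Master-key digest, as in LUKS: lets the unlocker verify the key.
    mk_digest = hashlib.pbkdf2_hmac("sha1", master_key, mk_salt,
                                    ITERATIONS, dklen=DIGEST_LEN)
    # Keyslot: master key masked with a passphrase-derived keystream.
    # (XOR with a PBKDF2 stream is a toy stand-in for the real AES keyslot.)
    stream = hashlib.pbkdf2_hmac("sha1", passphrase, slot_salt,
                                 ITERATIONS, dklen=len(master_key))
    keyslot = bytes(a ^ b for a, b in zip(master_key, stream))
    # No magic bytes, no plaintext metadata: salts + digest + ciphertext only.
    return mk_salt + slot_salt + mk_digest + keyslot

header = build_header(b"passphrase", os.urandom(32))
print(len(header))  # fixed-size, uniformly random-looking blob
```

The key property is that the header carries no magic number or plaintext field, so there is nothing for a parser (or an opponent) to latch onto.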

EDIT: Related questions on StackExchange cryptography:

KrisWebDev

1 Answer


Is there any scientific proof, meaning information-theoretic security? No. However, there is no known way to distinguish the output of a good cryptographically secure hash from random data of the same length. Someone can probably guess that a value is a SHA-1 digest if it's 160 bits, because in practice most random-looking 160-bit blobs one encounters are SHA-1 digests (or sometimes RIPEMD-160), but that has nothing at all to do with the content of the digest, only its size.

Note that in the future, new mathematical techniques may be discovered (invented?) which find a bias in SHA-1 or SHA-256 digests. It could range from so academic as to be almost silly ("after 10^91 known plaintexts, then with 10^91 operations and 10^91 memory, you can calculate what the 6th bit will be with a 0.05% higher probability than chance") to absolutely devastating ("a preimage attack on SHA-256 can be achieved with $5000 worth of consumer hardware and a 300-line C program in a couple of weeks"). While the latter is obviously far less likely, both can happen. We just can't know which it will be, and have no way to predict it. However, you can probably rest safe knowing that the vast majority of randomness distinguishability attacks (the ability to tell that pseudorandom data is not actually random) require a truly vast number of samples, likely far more than you will ever encrypt in your lifetime.
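As a toy illustration (emphatically not a proof), a simple monobit count over many SHA-1 digests shows no obvious bias: roughly half the output bits are set, exactly as you'd expect from random data:

```python
import hashlib

# Hash the integers 0..9999 with SHA-1 and count the set bits.
# Unbiased output should give ~50% ones (10000 digests * 160 bits / 2).
ones = 0
total_bits = 0
for i in range(10000):
    digest = hashlib.sha1(str(i).encode()).digest()
    ones += sum(bin(b).count("1") for b in digest)
    total_bits += 160
print(ones / total_bits)  # very close to 0.5
```

Passing a test like this is necessary but nowhere near sufficient; real distinguishers would look for far subtler structure.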

There are a few exceptions, like RC4, which has a bias in the first few bytes; ciphers with a 64-bit block size like Blowfish and CAST5, which show distinguishability after several gigabytes; and block ciphers in xts-plain mode, which take a few terabytes of data encrypted with the same key to cause problems. But they are not the rule.
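The RC4 bias is easy to demonstrate empirically. Mantin and Shamir showed that RC4's second output byte is 0 with probability about 1/128 instead of the expected 1/256. A short pure-Python sketch (a from-scratch RC4, since the stdlib ships no RC4):

```python
import os

def rc4_second_byte(key: bytes) -> int:
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: generate two keystream bytes, return the second
    i = j = out = 0
    for _ in range(2):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out = S[(S[i] + S[j]) % 256]
    return out

TRIALS = 30000
zeros = sum(1 for _ in range(TRIALS) if rc4_second_byte(os.urandom(16)) == 0)
# Expect ~TRIALS/128 ≈ 234 zeros for RC4, versus ~TRIALS/256 ≈ 117
# for a truly random keystream -- roughly double, a clear distinguisher.
print(zeros)
```

This is exactly the kind of statistical bias that a good hash like SHA-1 is, as far as anyone knows, free of.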

If you are still worried, you should know that the Linux kernel, which billions of devices around the world rely on for security, generates randomness by mixing an entropy pool using SHA-1 and exporting the hash values to /dev/random and /dev/urandom, both of which produce exceptionally high-quality random data. If SHA-1 were not extremely well trusted, it would not be used so ubiquitously (note that SHA-1 reaching the end of its life for protecting against collisions in certificates has no effect on its ability to look random). In case you're wondering, there's no security reason it uses hash mixing instead of a stream cipher: SHA-1 was chosen to get around now-obsolete export restriction laws, and it just stuck. Other systems like OS X and FreeBSD use Yarrow, OpenBSD uses ChaCha20, etc., but SHA-1 is just as secure, as far as we are aware. Just a lot slower to mix!

Why don't you just use TrueCrypt though? It supports plausible deniability, which seems to be what you are after, and both cryptsetup and GRUB2's cryptomount support it. Cryptsetup natively converts it into dm-crypt mappings, just like LUKS.

forest
  • Thanks for your answer. TrueCrypt/VeraCrypt is not the de facto standard for Linux system encryption, so it has [some disadvantages in this case](http://superuser.com/questions/1019673/how-would-i-encrypt-my-whole-linux-filesystem-with-veracrypt). I prefer leveraging LUKS/dm-crypt. – KrisWebDev Apr 02 '16 at 09:02
  • It is not the de facto standard, but it can be implemented with just a userspace helper which is already ubiquitous (cryptsetup), and dm-crypt. In fact that's the way cryptsetup can create Truecrypt mappings. All it really does is interpret the Truecrypt header, calculate the offset so it doesn't overwrite the header, and convert it into dm-crypt, which is the de facto standard. In this way, it's the same as LUKS (a userspace helper interprets it and "converts" it into dm-crypt). – forest Apr 03 '16 at 03:03
  • One way you can test is by decrypting a Truecrypt device using cryptsetup, then running the command "dmsetup table --showkeys". It will output dm-crypt information including the master key for each cascade cipher used by Truecrypt, as well as other metadata. That information is all the kernel ever needs to keep the encryption going, and it's the userspace helper parsing the header which gives it that. – forest Apr 03 '16 at 03:09