Suppose I have an AES-256 encrypted file, and I want to derive the key using PBKDF2 and a given salt with a large number of rounds (say 1 million), but I'm limited by user tolerance for UI lag when entering their password. Is it possible to compute a key that is similarly resistant to cracking by computing 100 keys in parallel from the same password and 100 different salts, for 10000 rounds, and then hashing them together with a hash function such as SHA-256?
The standard way is basically this:
key = pbkdf2(pseudo_random_function, password, salt, 1000000, 256);
A parallelizable way could be like this:
for(i = 0; i < 100; i++) // parallelize this on the GPU/FPGA etc
{
partial_keys[i] = pbkdf2(pseudo_random_function, password, salts[i], 10000, 256);
}
key = sha256(concatenate(partial_keys[0] ... partial_keys[99]));
Assuming the derivation method was known to a cracker, would this yield a key that was similarly hard to crack as the standard way, and would it be as secure? If not, roughly how secure would it be (if it is secure at all) in terms of equivalent number of rounds using the normal approach?
Perhaps it's not necessary to store 100 salts, you could just store 1 salt and then generate 100 different salts from it using key-stretching techniques/block ciphers etc. If this was used in an application, you could assume or enforce a minimum GPU capability before allowing the user to install it.
The point of this would be to reduce the latency (time taken to derive a single key for the user), without increasing the throughput (number of keys that can be derived in a given time period with a given amount of computing power by a cracker).