First of all, my understanding of the `p` parameter in scrypt is that it multiplies the amount of work to do, but in such a way that the additional workloads are independent of each other and can be run in parallel. With that interpretation of `p` out of the way: why is the recommended value still 1? More generally, why is it a good thing that key-stretching algorithms are not parallelizable?
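For concreteness, here is roughly where `p` shows up when you call scrypt in practice. This is just a sketch using Python's `hashlib.scrypt`; the parameter values are arbitrary illustrations, not recommendations:

```python
import hashlib

# Illustrative only: N, r, p are the three scrypt cost parameters.
key = hashlib.scrypt(
    b"correct horse battery staple",   # password
    salt=b"some-per-user-random-salt",
    n=2**14,  # N: CPU/memory cost (sequential work per lane)
    r=8,      # r: block size
    p=1,      # p: number of independent lanes that *could* run in parallel
    dklen=32,
)
```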
From the point of view of an attacker trying to crack a password, it doesn't matter whether an algorithm is parallelizable. After all, even if the entire algorithm is sequential, the attacker can just crack several different passwords in parallel.
I understand that `scrypt` being memory-hard makes it difficult to use GPUs for cracking. A GPU has much greater combined computational power across its many weak cores than a CPU, but its memory bus is about the same speed, so memory-hardness levels the playing field between legitimate users on a CPU and attackers on a GPU.
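(The memory figures I use below come from the approximation in the scrypt paper: the big mixing table takes roughly 128 · r · N bytes per lane. A throwaway helper, just to make my arithmetic explicit:)

```python
def scrypt_lane_memory_bytes(n: int, r: int) -> int:
    # Approximate RAM for one scrypt mixing lane: N blocks of 128*r bytes each.
    return 128 * r * n

print(scrypt_lane_memory_bytes(2**18, 8) // 2**20)  # 256 (MiB) -- the "big" workload below
print(scrypt_lane_memory_bytes(2**16, 8) // 2**20)  # 64 (MiB)  -- one of the four smaller lanes
```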
However, subdividing a `scrypt` workload that accesses 256 MB of RAM into 4 parallel `scrypt` workloads accessing 64 MB each would still consume the same total memory bandwidth for an attacker, and therefore run at the same throughput, while running 4 times faster for a legitimate user on a CPU with 4 or more cores.
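Concretely, I'm imagining something like the following two parameter sets (my own illustration, chosen so the per-lane footprint matches the numbers above; whether the `p` lanes are actually executed in parallel is up to the implementation):

```python
import hashlib

password, salt = b"hunter2", b"per-user-random-salt"

# One big sequential lane: 128 * 8 * 2**18 = 256 MiB, no parallelism for the defender.
key_a = hashlib.scrypt(password, salt=salt, n=2**18, r=8, p=1,
                       maxmem=512 * 1024 * 1024, dklen=32)

# Four smaller independent lanes: 4 x (128 * 8 * 2**16) = 4 x 64 MiB,
# which a multi-core defender could in principle compute simultaneously.
key_b = hashlib.scrypt(password, salt=salt, n=2**16, r=8, p=4,
                       maxmem=512 * 1024 * 1024, dklen=32)
```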
Is there any fundamental flaw in my logic? Why is the recommended value for `p` still `p = 1`? Is there any downside to increasing `p` that I can't see?