Since GPUs have gigabytes of memory, does Argon2id need to use gigabytes of memory as well in order to effectively thwart GPU cracking?

Question

The common advice of benchmarking a password hashing algorithm and choosing the slowest acceptable cost factor doesn't work for algorithms with more than one parameter: adding a lot of iterations at the expense of memory hardness makes the benchmark time go up, but if the attacker can still grab off-the-shelf hardware (like a regular gaming graphics card) then I might as well have used pbkdf2 instead of Argon2id. There must be some minimum amount of memory that makes Argon2id safer than a well-configured algorithm from the 90s, otherwise we could have saved ourselves the effort of developing it.

I would run benchmarks and just see for myself at what point hashing isn't faster anymore on a GPU than on a CPU, but Hashcat lacks Argon2 support altogether. (Though any software I choose isn't necessarily going to be the fastest possible implementation to give me a correct answer anyway.)

Even my old graphics card from 2008 had 1GB of video memory and 2020 cards seem to commonly have 4-8GB. Nevertheless, the default Argon2id setting in PHP is to use 64MiB of memory.

If you set the parallelism of Argon2id to your number of CPU cores (say, 4) and use this default memory limit, is either of the following (roughly) true?

A GPU with 8192MB of memory can still use 8192/64 = 128 of its cores, getting a 32× speed boost compared to your CPU. The slowdown on GPUs due to increased memory requirements is a linear scale. The only way to thwart GPU cracking effectively, is to make Argon2id use more RAM than a high-end GPU has when divided by the number of parallelism you configure in Argon2id (i.e. in this example with 4 cores, Argon2id needs to be set to use 8192/4 = 2048MB).

or

This amount, 64MiB, already makes a common consumer GPU completely ineffective for cracking because it is simply too big to efficiently use from a simple GPU core (because if the core has to access the card's main memory then it's not worth it, or something).

GPUs work to speed certain tasks because they do the same thing to different data in parallel. Iterated key derivation systems (bcrypt, Argon2, PBKDF2, etc.) require the data be worked on in serial. A GPU might be able to speed up cracking several passwords at a time, but they won't speed of cracking a single password. Using a new password for every site, generated from a random source, with at least 12 characters means that they won't brute force your password for a few decades, no matter what off-the-shelf hardware they buy. — Ghedipunk, Apr 05 '21 at 03:45
@Ghedipunk "*[A GPU] won't speed of cracking a single password*" From first-hand experience, that's not true. Have you ever tried cracking a password? Each of those GPU cores can do a section of the search space, so also for a single password, the parallelism speeds it up massively compared to a CPU with only a handful of (albeit faster) cores. — Luc, Apr 05 '21 at 12:02
@Ghedipunk: When an attacker is brute-forcing a *single* password, *many* different passwords are tried. Usage of GPUs *can* speed up brute-forcing, because for many operations GPUs are more efficient than CPUs. — mentallurg, Apr 05 '21 at 16:50

Luc · Answer 1 · 2021-05-16T12:46:59.113

Practical advice

Use Argon2id.
The more memory you use, the less of an advantage an attacker will have compared to your CPU.
In absolute numbers, against 2021 GPUs:
- 128 MiB is quite secure (but don't hesitate to go for 2 GiB or more, if you can!),
- 32 MiB is reasonable, and
- 4 MiB is decidedly on the low side.
Against ASICs, you definitely want to go higher. If your adversary has the budget for Argon2 ASICs, a few gigabytes of regular RAM being used for password hashing is no excessive luxury. Also effective is adding multi-factor authentication and enforcing globally unique passwords (check against public breaches).

For less-powerful adversaries, it is also interesting to note that cracking software seems to be lagging behind and as of 2021, there is still no good Argon2 cracker available. Bcrypt and scrypt are (in turn) much better than PBKDF2-class* algorithms, mostly due to their better design but also because less effort has been put into optimizing for these strong hashes, especially when using scrypt with >16 MiB of memory.

_{* Anything that does plain iterations of hashing functions, so this includes custom designs that a lot of developers use(d). Bcrypt is in a higher class because "Bcrypt happens to heavily rely on accesses to a table which is constantly altered throughout the algorithm execution. This is very fast on a PC, much less so on a GPU, where memory is shared and all cores compete for control of the internal memory bus. Thus, the boost that an attacker can get from using GPU is quite reduced, compared to what the attacker gets with PBKDF2 or similar designs." --Thomas Pornin in Do any security experts recommend bcrypt for password storage?}

Cracking memory-hard hashes

A hashcat release came out today (while I was wrapping up my research) with better scrypt support (another memory-hard algorithm). I've been testing with an RX 5700 GPU which seems to cost close to $1000 in today's inflated prices and has 8 GB of VRAM. Trying to crack scrypt hashes with hashcat earlier today, it errored out at 16 MiB already (N=16×1024, r=8, p=1); with the new update, I can now crack scrypt hashes with a whopping hundred mebibytes of memory. At 64 MiB, hashcat is 17 times faster than Python's hashlib.scrypt which uses OpenSSL's implementation. It's better than nothing, but for comparison, PBKDF2 gets a 1000 times speedup on the same CPU/GPU combination. (And, for the record, Bcrypt also gets a 17-times speed-up from CPU to GPU.)

For Argon2, the fastest cracking software that I can find is a CPU implementation. The only decent-looking GPU implementation that I can find can allocate up to 434 MiB before erroring out, but is slower than PHP's implementation (using libsodium if I'm not mistaken) regardless of how much memory you make it use. And PHP also doesn't limit the memory usage, it will happily compute Argon2 with 13 GiB of RAM for you in as many seconds.

Theoretically

You should be able to allocate as much memory as your GPU has, even if practically no free cracking software actually supports that. Either way, by requiring constant access the main memory rather than being able to use core-local registers, you do limit a GPU's effectiveness: its memory is all shared and the bandwidth limited. The CPU has the same limitation, but that just puts it on a more level playing field. GPU bandwidth still exceeds that of a CPU*, but it's on the same order of magnitude.

ASICs are a next-level kind of attack that I did not worry about when writing the question. But I've looked into it now anyway, and GPUs seem largely ineffective with today's state of the software almost regardless of what settings you use, so we might as well consider them.

The cost of producing an ASIC seems to be mostly determined by the amount of memory you need. The Argon2 paper (section 2.1) describes it as:

We aim to maximize the cost of password cracking on ASICs. There can be different approaches to measure this cost, but we turn to one of the most popular – the time-area product [4, 16].

[...]

The 50-nm DRAM implementation [10] takes 550 mm² per GByte;

The Blake2b implementation in the 65-nm process should take about 0.1 mm² (using Blake-512 implementation in [11]);

It isn't stated in as many words and the paper just speaks of surface area rather than cost, but it sounds like increasing RAM capacity on an ASIC is a lot more expensive than increasing computational power is.

So in terms of picking Argon2id parameters, the correct order is:

First increase memory usage to as much as possible with a time cost parameter of 1**.
Increase the parallelism as much as still possible without lowering the previous parameter, until you hit the number of cores your system has.
Increase the time cost parameter as much as still possible without lowering the previous parameters.

The "as much as possible" means: until it either runs out of system resources (RAM, cores) or until it gets slower than your users have patience for.

It's not weird if the only parameter you tweak is the memory usage. It might get plenty slow from that already.

Parallelism is a bit complicated.
In Argon2, the two parts when setting p=2 can be computed in parallel, so if your memory parameter fits in cache or if you use less than half the available memory bandwidth, p=2 should not take any longer than p=1 on your CPU, yet should be stronger. That seems to work in practice: testing in PHP (which uses libsodium), increasing the 'threads' setting is clearly a lot faster. (Note that the paper calls it 'lanes' to set 'parallelism' which PHP then calls 'threads'...) To make sure you use your hardware to its full extent (making it comparatively harder for GPUs/ASICs to compete), setting this setting higher is recommended before increasing the time cost parameter (since the time cost parameter is nothing more than running the algorithm multiple times over the specified amount of memory).
Scrypt doesn't say much about it in its paper or RFC, but seems to serve the same function. In practice, it seems scrypt implementations don't do multithreading: Hashcat and OpenSSL seem just take p times as long. There is also an scrypt JavaScript implementation which "can multi-thread in browsers", but this only seems to work in Firefox on Linux with p=1 and p=2 (equally fast) or in Firefox on Windows with p=2 and p=4 (equally fast). Other values (e.g. p=1 and p=2 in Firefox on Windows) are linearly slower, and other browsers (such as ~~Chromium~~ Edge) don't seem to support it at all. You might want to look for an implementation that supports this to get a stronger hash, or use Argon2 instead.

_{* "The CUDA implementation can reach about 40-60 GiB/s [...] on an NVIDIA Tesla K20X. For comparison, a fast Intel Xeon processor can only reach about 10 GiB/s." ---Panos Kal., 2017, on "Cryptography algorithms that take longer to solve on a GPU than a CPU"}

_{** Since memory is most important, the time cost parameter should be set to 1 initially. This is not insecure: the Argon2 authors wrote "Again, there is
no ”insecure value” for T" in the Argon2 paper section 6.4, and the Crypto Forum Research Group writes that "Argon2id [with] t=1 and 2GiB memory is [...] suggested as a default setting for all environments. [Setting] t=3 and 64 MiB memory is [...] suggested as a default setting for memory-constrained environments."}

Question's Quirks

The question's phrasing demonstrates misconceptions on the part of the author.

The proposed statement "[X MiB of memory] already makes a common consumer GPU completely ineffective" is simply false because the GPU cores can address all the memory the GPU has, but also kind of true in a practical sense because current implementations seem to break long before getting close to the GPU's actual memory limit. To put a number to this value, 1 GiB seems plenty to trigger this effect, but fixing that is a matter of software and not due to hardware limitations.

The alternative statement "The slowdown on GPUs due to increased memory requirements is a linear scale." is also wrong, since this fails to parameter in the memory bandwidth difference. It's not the case that you can close the speed gap by approaching the amount of memory your cracking hardware has. On the other hand, using too low a memory parameter does not saturate your CPU's memory bandwidth and gives a GPU cracker a comparative advantage, so there is (accidentally) also a core of truth here.

As a user

Keep in mind that you can always defend your own accounts by using strong and unique passwords. Even the dumbest of password hashes is (pre-quantum) secure if your password is something like 15 random characters (picked from {a-z, A-Z, 0-9}). If you don't re-use passwords, then it also doesn't really matter if one was cracked. A password manager can help you generate and manage these passwords.

mentallurg · Answer 2 · 2021-04-05T21:33:48.760

There are different ways to make brute-forcing of Argon2id harder. Two of them are following:

Make passwords longer, and of course make passwords randomly generated, not generated by humans
Increase work factor, i.e. use more iterations

It makes sense to consider memory factor only then, when increasing computing power in 1-2 orders of magnitude can really lead to successful brute-forcing.

Let's consider 2 cases.

1. Passwords of 20 chars from 62-char set, password hashing takes 1s on a single core of 8-cores CPU

How many passwords need to be tried? Supposing that a half of all possible passwords is sufficient, we get: 62^20/2 ~= 10^35.

How many passwords can test such a CPU per year? 8 cores * 60 seconds/minute * 60 minutes/hour * 24 hours/day * 365 days/year ~= 10^8.

Thus 10^35 / 10^8 = 10^27 CPU-years are needed.

Suppose, some GPU is 1 000 times faster than such CPU. Suppose some organization can afford 1 000 000 000 such GPUs (only a few countries in the World can afford it). Then still it will take 10^27 / (1 000 * 1 000 000 000) = 10^15 years to brute-force such password.

What will we gain from using bigger memory factor and reducing the effective GPU power? Is the duration of 10^15 years not huge enough? In this case it makes no sense to spend time analyzing what memory factor how much can slow down GPUs. Even the small memory factors like 10K will be sufficient, because duration of 10^15 years makes brute-forcing impossible.

2. Passwords of 10 chars from 62-char set, password hashing takes 0.000001s on a single core of 8-cores CPU, or 10^6 per second

How many passwords need to be tried? Supposing that a half of all possible passwords is sufficient, we get: 62^10/2 ~= 10^17.

How many passwords can test such a CPU per year? 10^6 * 8 cores * 60 seconds/minute * 60 minutes/hour * 24 hours/day * 365 days/year ~= 10^14.

Thus 10^17 / 10^14 = 10^3 = 1000 CPU-years are needed.

If some GPU is 1 000 time faster than such CPU (which is way too optimistic), then such single GPU can brute-force the password within 1 year. 10 such GPUs can brute-force the password within 36,5 days. Many can hackers can afford even 100 GPUs. Then password will be brute-forced within 0,365 days or within 9 hours.

In such case restricting the number of cores used for brute-forcing can really slow brute-forcing down and for some attackers this can be prohibiting (too expensive), or time needed for brute-forcing can become much longer, so that when the password is brute-forced and information decrypted, this information is not secret any more and is freely available.

TLDR

Big memory factor can be not necessary in case password entropy and work factor are relatively big. A 20-char password and hash generation of 1s makes brute-forcing impossible. It is up to you to calculate and decide what smaller values are acceptable to you.

Big memory factor makes sense only in cases when the risk of brute-forcing is high and slowing the brute-forcing down in 1-3 orders of magnitude reduces the brute-forcing risk to the acceptable level.

To your question

does Argon2id need to use gigabytes of memory as well in order to effectively thwart GPU cracking?

If you use relatively long passwords, e.g. 20-char passwords, and if work factor in Argon2id is big enough, e.g. hashing of a single password takes 1s, then no, you don't need to use Gigabytes of memory per password, any small memory factor will be sufficient.

If you use relatively short passwords, e.g. 8-10 characters length, and if work factor is relatively small, e.g. hashing of single password takes 0.000001s, then yes, using of Gigabytes of memory per password can slow brute-forcing down. For instance, some Nvidia GPUs have ~6000 cores with 12 GB RAM. If you use e.g. 64MB per password, it will effectively reduce the number cores to 192 instead of ~6000, thus the brute-forcing will be ~30 times slower. After you decided how much memory you want to use, check that the number of iterations is not too small, see the comment of brynk.

Why use Argon2id at all?

Then why use Argon2id at all, why not PBKDF2? Because it has for instance a better side-channel resistance. See more details here:

This question is about limiting the attacker; specifically, how to ensure I exceed effective memory limits of typical cracking hardware (i.e. graphics cards). The answer only speaks of the defense side, recommending to set parameters as high as possible for my situation. Choosing the best possible parameters indeed goes without saying, but that doesn't tell me whether setting it to your example of 16MB "effectively thwart[s] GPU cracking". — Luc, Apr 05 '21 at 12:30
I will also add that the *Argon2id* spec informs the number of iterations in relation to the amount of RAM. [There is a tangentially related discussion about hardness parameters, specifically in this comment](https://security.stackexchange.com/questions/245122/dm-crypt-2-0-3-luks-salt-auto-generated-what-affects-decryption-speed/245146#comment504745_245146). — brynk, Apr 05 '21 at 20:25
@brynk: Thank you, it is a good point. I have updated the answer. — mentallurg, Apr 05 '21 at 21:34
A password hashing system should be designed with the goal of protecting even the weak passwords (because users will, inevitably, use them). So saying that with sufficient entropy in passwords, memory hardness is not required sort of misses the point of good password hashing practices. — nobody, May 15 '21 at 13:59
@nobody: Do you mean that memory hardness in Ardong2 makes no sense? Or what do you mean? — mentallurg, May 15 '21 at 14:06
No that's not what I meant, sorry if I wasn't clear. You wrote *"Big memory factor can be not necessary in case password entropy and work factor are relatively big."*. I'm just saying, when designing a password verification system, you should make it as secure as reasonably possible for even passwords with low entropy, because low entropy passwords are what users tend to choose. So including advice about what is required for securing high entropy passwords isn't too helpful (and can be harmful if some fool decides to interpret it the wrong way). — nobody, May 15 '21 at 21:40
Thanks for this very detailed answer. I don't understand one thing: If we can change parameters of the hashing algorithm and those parameters affect the operation of the algorithm, wouldn't the outcome, i.e. the calculated hash, also be dependent on the parameters? Given I calculated a hash with unknown parameters. Let's say I am to crack this hash. Wouldn't I need to know or brute force the parameters, since argon2 with only 1 differing parameter (e.g. m, t or p) equals another hash? Add this on top of argon2 with highly chosen parameters takes long, is this not near impossible to crack? — prohit, Jun 01 '22 at 14:15
@prohit: Of course the hash does depend on parameters. To *"Given I calculated a hash with unknown parameters"*: 1) To be able to check if given plain password corresponds to a hash, you need to compute hash for this password with the same parameters. 2) Based on Kerckhoffs's principle, we should assume, that the attacker knows all the parameters used to compute the particular hash. That's why usually all parameters (salt, work factor, memory factor etc.) are stored together with hashes. They don't need to be secret. — mentallurg, Jun 01 '22 at 15:52