18

After reading this excellent answer, I learned about the existence of side-channel attacks.

From the code example provided, it is possible to determine the correct password by timing the code when given various inputs.

for (i = 0; i < n; i++) {
  if (password[i] != input[i]) {
    return EFAIL;
  }
}

What can I do to ensure that my code is not vulnerable to such timing attacks? I have purposely left this open-ended to allow answers to provide examples and best practices for a variety of common software configurations.

Bergi
dalearn
  • Hopefully you're not checking a password against plaintext passwords that you store somewhere. – user Oct 30 '19 at 14:47
  • On PHP, use `hash_equals`, it's a timing-safe string comparison function. – ThoriumBR Oct 30 '19 at 15:02
  • To avoid being *too* broad, can you narrow down the scope a bit? "How can I protect [arbitrary code] that does [arbitrary thing] against timing attacks" is only slightly less broad than "How do I secure my application?" Is there a particular kind of side channel attack you are interested in (password checks, authentication checks, etc...)? – Conor Mancone Oct 30 '19 at 15:47
  • @ConorMancone I have clarified that I am looking at timing attacks only. – dalearn Oct 30 '19 at 17:26
  • `return hash_equals($passwordStored, $passwordCalculated);` – ThoriumBR Nov 01 '19 at 02:01

5 Answers

36

From the code example provided, it is possible to determine the correct password by timing the code when given various inputs.

First, you shouldn't actually examine the password directly! At the very least, you should hash the password with a password hash like Argon2id first, and compare the password hash of the input with the password hash you stored during user registration (or when the user last changed their password).

Even better, you should use a password-authenticated key agreement protocol like OPAQUE, but these may be beyond your pay grade at the moment until they see more widespread adoption and implementation.

What can I do to ensure that my code is not vulnerable to such timing attacks?

The best way to start is to use a library routine or primitive that someone else has already written and has a reason to maintain. For example, in NaCl/libsodium, you can use crypto_verify_32 to compare two 32-byte strings, like two Argon2id hashes, or two HMAC-SHA256 message authentication codes. Then the effort to answer this question can be focused on a single place that will get a lot of attention and scrutiny and will keep up with advances.

But let's say you don't have crypto_verify_32, or you want to implement it yourself. What can you do?

To start, you need to understand what operations have side channels. It's tempting to say—as other answers did—that the side channel arises only because of an early abort. But that's not the whole story. In general, there are many operations (here written in C for illustration) that may take an amount of time which depends on the values of the inputs—we call these operations variable-time operations, in contrast to constant-time*:

  • for (i = 0; i < n; i++) if (x[i] != y[i]) return EFAIL; obviously takes fewer loop iterations the earlier a mismatch occurs, so it is practically guaranteed to run in variable time depending on the secret values of x[i] and y[i].

  • A mere secret-dependent conditional for (i = 0; i < n; i++) if (x[i]) bad++;, if x[i] is secret, may run in variable time too even if the loop doesn't abort early. Why?

    • Here's a coarse approximation. The machine instructions that the CPU may execute look something like this:

      0:      tmp := x[i]
              branch to 1 if tmp is zero
              bad := bad + 1
      1:      i := i + 1
              branch to 0 if i < n
      

      The number of instructions executed depends on what the value of x[i] is at each iteration: we skip bad := bad + 1 on some iterations. This is a good model for how early timing attacks on, e.g., RSA worked as in Kocher's seminal paper on timing attacks: the main modular exponentiation loop computes a (say) 2048-bit modular squaring unconditionally, but computes a 2048-bit modular multiplication conditionally depending on the value of the secret exponent. Skipping the multiplication substantially changes the time taken by the whole operation.

    • There's another reason, though, and it has to do with branch prediction, a key design element in what makes modern CPUs run so fast on many workloads—even if you write the same amount of code (say, same number of machine instructions, and somehow you guarantee that they take the same number of cycles to compute) in each branch of a conditional, the time it takes to execute may depend on which way the condition went.

    • In general, CPUs are bad at keeping which instructions got executed secret, so don't make the choice of instructions depend on secrets.

  • Table/array lookups might take a different amount of time depending on what memory has been cached in the CPU cache. Consequently, if the location in the array that you're reading from depends on a secret, the time it takes might depend on the secret, which has been exploited to recover AES keys by cache timing.

    (This makes AES a rather questionable design in retrospect, with its intentional use of key-dependent table lookups! NIST's published rationale (§3.6.2, Attacks on Implementations: The Role of Operations) curiously claims table lookups are ‘not vulnerable to timing attacks’ despite the numerous such attacks that have been reported since then.)

  • Variable-distance shifting like x = y << z may take more time on some CPUs if z is larger and less time if it is smaller.

    (This makes RC5 and the AES finalist RC6 rather questionable design in retrospect, with their intentional use of key-dependent rotation distances!)

  • On some CPUs, multiplication may run faster or slower depending on whether the upper half of the inputs is zero or not.

  • 64-bit integer addition on 32-bit CPUs in principle might take more time depending on whether there is a carry. This is because, when x, y, and z, are 64-bit integers, the logic x = y + z might look something more like:

    int carry = 0;
    x[0] = y[0] + z[0];
    if (the previous addition overflowed)
      carry = 1;
    x[1] = y[1] + z[1] + carry;
    

    Consequently, the time it takes may depend on whether there is a carry from the sum of the low 32-bit halves to the sum of the high 32-bit halves. (In practice, this is usually only a concern on exotic CPUs or for other types of side channels like power analysis that are relevant more to smart cards than to laptops and phones.)

This might sound a little overwhelming. What can we do?

There are some operations that generally do run in constant time on most CPUs. They are:

  • Bitwise operations: x & y, x | y, x ^ y, ~x, and other ones that don't appear in C like AND-with-complement.
  • Constant-distance shifts and rotations like the shift x << 3 or the rotation x <<< 3 (not standard C but common in cryptography; it means (x << 3) | (x >> (32 - 3)), if x is 32-bit).
  • Often integer addition and subtraction: x + y, x - y, when x and y are (say) unsigned 32-bit integers on a 32-bit CPU, and often even 64-bit integers on a 32-bit CPU with the help of ADD-with-carry instructions.
  • Sometimes integer multiplication, but the story about multiplication is complicated, which is unfortunate for cryptography because multiplication mixes bits around quite nicely and has useful algebraic properties.

To be clear: I do not mean that a C compiler guarantees these operations run in constant-time if you use them in a C program; I am merely using C notation for the operations that CPUs generally execute in constant time. (More on this caveat in a moment.)

‘But wait,’ you protest, ‘how can I possibly write a useful program out of these operations? No conditionals? No loops? No arrays?’

First, you don't have to eschew conditionals, loops, or arrays altogether. They just can't depend on secrets. For example, for (i = 0; i < 32; i++) ... x[i] ... is fine. But for (i = 0; i < m[0]; i++) ... is not fine if m[0] is supposed to be secret, and for (i = 0; i < m[0]; i++) ... tab[x[i]] ... is not fine if x[i] is supposed to be secret.

Second, you can still build these things! It's just a little trickier. For example, suppose b is a uint32_t that is either 0 or 1. Then b - 1 is either -1 = 0xffffffff or 0, respectively, so

x = ((b - 1) & z) | (~(b - 1) & y);

causes x = y if b is 1, or x = z if b is 0—much like x = (b ? y : z), but without a branch. Obviously, this requires computing both y and z, so there is some performance impact! Similarly, you can look up an element of a table by looking up all elements of the table and selecting the one you want with bitwise operations like this. Not as fast as x[i], but not as leaky either.

In general, you can convert a program with conditionals into a logic circuit with no conditionals, even if you don't want to for performance reasons. There are various other similar tricks you can do. You might draft a constant-time memory equality routine such as crypto_verify_32 like this, assuming x and y are uint8_t arrays:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
return ((result - 1) >> 8) & 1;

Exercise: Does this return 0 for equal and 1 for inequal, or 0 for inequal and 1 for equal?

Writing programs like this—and adopting cryptosystems such as X25519 that encourage implementations that look like this, instead of cryptosystems such as RSA or AES that encourage implementations involving secret-dependent branches or secret-dependent table lookups—is a good start for plugging timing side channels.

But, there's a catch! Remember when I said that the C compiler doesn't guarantee constant time? A smart C compiler like Clang/LLVM might recognize that the clever crypto_verify_32 loop above can be executed more efficiently by making it abort early, and might undo the hard work you did to rewrite it as a logic circuit that runs in constant time. (In other circumstances, it might help you out, for example by converting x = (b ? y : z); into a conditional move instruction, CMOV, without branches, but usually you can't rely on the C compiler's good will.)

There are some tricks you can do to thwart this, like an inline assembly fragment that causes the compiler to drop roughly all assumptions for optimization:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
asm volatile ("" ::: "memory");
return ((result - 1) >> 8) & 1;

This may or may not work with your compiler. To be confident, you really have to examine the compiler's generated machine code—and even then, a compiler might perform just-in-time optimizations that rewrite the machine code according to profiling analysis, especially in higher-level languages like Java. So you might really want to write the logic in assembly (or in a programming language like qhasm that can generate the fine-tuned assembly more reliably than a C compiler), and just call it from C.

Maybe some day C compilers will adopt a secret qualifier, like const or volatile, which forces the compiler to generate only machine instructions that are known—in some model of the CPU!—to run in constant time when operating on the object, and prevents the compiler from taking shortcuts like secret-dependent early aborts from a loop. But that day is not here yet.

There's also a matter of which machine instructions actually run in constant time on a CPU, which is sometimes documented and sometimes reliable. So in addition to doing engineering to build your programs out of logic circuits, you also need to do science to figure out which operations are actually safe to use on the CPU.

This brings us back to the original point: You really want to focus the effort of maintaining this into a library routine, so that each programmer doesn't have to keep track of vagaries of compilers (and CPU designs!) in generated code and timing on their own, and can instead leave it to our friendly neighborhood bear.


Are there other countermeasures than constant-time logic? Sometimes, yes.

  • You could inject random noise into your logic, in the hope that it will confuse the attacker's measurements. But there's already noise in their measurements, like scheduling in the operating system, so they just have to take more samples—and it turns out noise is not a very effective side channel countermeasure.

    Specifically, artificial noise raises the attacker's costs by at most about the square of the ratio of artificial noise to true noise, which is far below what is usually considered an acceptable gap for security in cryptography. So it mostly costs you a lot of time doing nothing.

  • You could use algebraic properties of the cryptosystem to randomize it, sometimes called ‘blinding’. For example, instead of computing y^d mod n where d is a secret exponent in RSA, you could pick r at random, compute s := r^e mod n where e*d ≡ 1 (mod λ(n)), multiply y by s to get (y * r^e) mod n, compute (y * r^e)^d mod n = (r * y^d) mod n, and then divide off r.

    Many implementations, such as OpenSSL, use this approach because it's an easy way to retrofit an existing implementation of a cryptosystem like RSA that has the necessary algebraic structure. It's not a bad idea like random noise is, but it does have costs: you have to do the extra work for randomization, you have to have modular division or inversion logic—and side channels might still leak information about r and d. For example, even blinded modular exponentiation will leak the Hamming weight of d unless you take additional countermeasures like adding a random multiple of λ(n) to d first—which may expose additional side channels, etc.

  • For the specific case of comparing two byte strings for equality (for example, two message authentication codes), one reasonable option is to hash them with a pseudorandom function family like HMAC-SHA256 under a one-time secret key k, and check whether HMAC-SHA256_k(x) == HMAC-SHA256_k(y).

    The probability of a false positive is 1/2^256, which is a smaller probability than you ever have to worry about. You can safely use variable-time equality for the HMAC because if x is not equal to y, then the amount of time even in the naivest byte string equality routine (assuming it doesn't bail out at the first zero byte or something stupid like that!) will be independent of the values of x and y: there's a 255/256 probability it'll stop after one iteration, a 65535/65536 probability after two iterations, etc.

    Of course, this really helps only if you can implement HMAC-SHA256 in constant time! Fortunately SHA-256 is designed to be easily implemented as a constant-time logic circuit, so C implementations tend to be reasonably resistant to side channels—but, say, Python will get you into trouble because of the small integer cache if nothing else.


* The terminology is unfortunately a little confusing. Here constant-time means that the amount of time does not vary depending on inputs, and is not the same as the asymptotic notion of ‘constant-time’ in computer science, often written O(1), which just means the amount of time may vary depending on inputs but is bounded by a constant. I'm sorry. I didn't invent the terminology. I might have chosen ‘fixed-time’ vs. ‘variable-time’ but it's too late now—‘constant-time’ is entrenched in the literature.

nobody
Squeamish Ossifrage
  • This is one of the highest quality answers I've ever read. Fantastic. – the_endian Oct 31 '19 at 05:17
  • That's... a good read. I've already been telling everyone "don't implement your own encryptions", but now I know even better why. A recent case said "But AES is so slow, and my homebrew is faster". – Gloweye Oct 31 '19 at 07:59
  • It would look much nicer with LaTeX, but could you bear to remove the LaTeX (possibly just italicizing the variable names)? Or I could have a go. – Martin Bonner supports Monica Oct 31 '19 at 09:53
  • After reading through this, it should have many more upvotes. Please remind me to give it a bounty once it's possible. – Oct 31 '19 at 10:34
  • @MechMK1 yes I did. I have now accepted this answer. – dalearn Nov 11 '19 at 12:38
13

Side-Channel attacks are notoriously difficult to detect, because there are many side-channels that an attacker could look for. This includes, but is not limited to:

  • Timing Attacks
  • Cache Attacks
  • Power-Monitoring Attacks
  • Acoustic Cryptanalysis

Wikipedia has an excellent list, from which this is just an excerpt. Since there are so many different side-channels, each one of them needs to be addressed independently.

What about timing attacks?

Your code is vulnerable to timing attacks, but you already knew that. The question is, how can you fix it? The solution would be to make a constant-time comparison. One example would be code like this:

difference = 0;
for (i = 0; i < n; i++) {
  difference |= (password[i] ^ input[i]);
}

return difference == 0 ? E_OK : E_FAIL;

This code assumes password and input are the same length, e.g. because they are the output of a hash function. It accumulates the bitwise difference between each pair of bytes, then returns a result based on whether the accumulated difference is zero. Also beware that your friendly optimizing C compiler is at liberty to spot what this is doing and generate the assembly it would have generated for your original (broken) code. You need to check the actual generated assembly (or use a library function designed for this).

Of course, this would only protect against one kind of side-channel attack, and not others.

What about the other side-channels?

That depends entirely on the side-channel you are focusing on. Some, such as power consumption, require physical access (or other ways to measure consumption), so they may not be a problem if the attacker is far away.

In general, in order to defend against side-channel attacks you need to:

  • Be aware that the side-channel exists
  • Check if this side-channel could be a potential problem in your threat model
  • Check what information is leaked via this side channel
  • Check how to prevent leaking this information
6

I assume that the code from the question is just an intentionally trivialized example for illustration, because in a real-world system you would never store passwords in plaintext. But if you would want to replace this fictional code with an implementation which isn't vulnerable to timing attacks, then you would make sure that the algorithm doesn't terminate on the first wrong character but always does the same number of comparisons:

bool isCorrect = true;
for (i = 0; i < PASSWORD_MAX_LENGTH; i++) {
    if (password[i] != input[i]) {
       isCorrect = false;
    }
}
return isCorrect;

However, this is not completely proof against timing attacks either, because depending on how the CPU processes this code, it might still take longer or shorter on fails. One possible source of timing difference could be branch prediction.

Grossly oversimplified: when the CPU notices that it processes an if-condition in a for-loop and that if-condition turns out false most of the time, the CPU will optimize itself on the assumption that it always turns out false. This allows it to process that for-loop a lot faster. But if that if-statement does turn out true all of a sudden, it causes quite a lot of chaos inside the CPU pipeline which takes a couple of clock cycles to clean up. So the timing differences caused by branch prediction failures can be another possible timing side-channel. This is hard to avoid, because it's a feature of the CPU which is completely opaque to the developer and can even depend on the exact model of CPU. For more information, do some research on the Spectre vulnerability.

But there is also a different approach to avoiding timing attacks which is crude and simple but effective: Add a random delay after each password comparison. If the length of the delay comes from a cryptographically secure pseudorandom number generator, then it ruins the accuracy of the time measurements the attacker relies on.

Philipp
  • This is more of an implementation detail so not really worth mentioning, but I already started. Adding a random delay only works so long as the "standard deviation" of the delay length is longer than the possible differences in timing. I.e. if branch predictions can add +/- 0.25 microseconds but your random delay only adds +/- 0.25 nanoseconds, then it obviously won't help. I know you realize this so I'm not implying you were suggesting an incorrect course of action, but just felt like adding that in to the discussion. And yes, those numbers are nonsensical. – Conor Mancone Oct 30 '19 at 16:02
  • Please don't recommend secret-dependent branches! _Even on CPUs without branch prediction or speculative execution_, secret-dependent branches are bad news. And please don't recommend randomized delays, which are consistently tempting to people outside cryptography engineering circles but are [not actually very effective](https://crypto.stackexchange.com/a/59901). – Squeamish Ossifrage Oct 31 '19 at 05:03
  • (Spectre is a somewhat different issue about branch prediction and speculative execution: even if your cryptography code has no timing side channels in its logic, _other logic_ in the same address space may leak secrets that you thought only the cryptography logic used. So while it's broadly relevant, and it is a kind of timing attack, it's qualitatively different from the problem in the OP's quoted code that could be solved by a constant-time string comparison.) – Squeamish Ossifrage Oct 31 '19 at 05:05
1

I will try to answer the problem statement above by treating the side-channel attack in question as a timing-based one, i.e.:

A timing attack watches data movement into and out of the CPU or memory on the hardware running the cryptosystem or algorithm. Simply by observing variations in how long it takes to perform cryptographic operations, it might be possible to determine the entire secret key. Such attacks involve statistical analysis of timing measurements and have been demonstrated across networks.

Instead of checking the input byte by byte and reporting back to the controller/screen/UI as soon as a mismatch is found, the code should treat the data as a single block and perform the equality operation on the whole block at once.

Excuse my bad artwork. [Diagram omitted: feedback operation]

This attack makes use of statistical analysis of the timing output, which can be eliminated. One way of performing such an operation is using hashes: no matter how long the password is, the hash will always be a fixed-length output.

avicoder
0

Disclaimer: I'm a newbie in this area.

Why not set an expected duration to your checking code and force it to continue executing for at least that long?

DateTime endTime = DateTime.Now + TimeSpan.FromMilliseconds(10);

while (DateTime.Now < endTime || passwordCheck.IsIncomplete) {
    // password checking code here
}
  • You would have to calibrate it on every CPU you want to use to make sure your deadlines can be made, it wastes a lot of time doing nothing when it could be serving users, and it is very hard to make this _actually_ take a consistent amount of real time. – Squeamish Ossifrage Oct 31 '19 at 13:53