36

Say we have a Java web application which uses a shared secret to verify the identity of the client. The secret is stored on the server, and the client transmits the secret over SSL where it is checked:

String SECRET_ON_SERVER = "SomeLongRandomValue";

if (secretFromClient.equals(SECRET_ON_SERVER)) {
    // request verified - client knows the secret
} else {
    // request not verified - generate failed response
}

String.equals(String) returns as soon as a single character doesn't match. This means an attacker, if they can accurately track the time it takes to respond, should theoretically know how many characters of their attempt - secretFromClient - match the server secret, leading to a plausible brute force attack.

But the difference in timings seem to be tiny. Quick investigation suggests the differences are easily in the sub millisecond range.

  • Can I safely ignore these kinds of timing attacks due to its insignificance compared to network noise?
  • Are there examples of successful < 1ms timing attacks over the internet?
D.W.
  • 98,420
  • 30
  • 267
  • 572
George Powell
  • 1,508
  • 12
  • 14
  • 1
    Before doing this comparison, get the current time, add a constant to it, and save the result in a variable. After the comparison, subtract the new current time from that saved value, and sleep for the resulting interval. All invocations of this routine will thus always take the same amount of time. – Monty Harder Jan 18 '16 at 19:02
  • 11
    To head off a line of comments: what influences whether such attacks will be possible is not the round-trip time, but the amount of *variation* in round-trip times. If the round-trip time is always 800ms plus or minus 10 nanoseconds, a timing attack is trivial. If the round-trip time is 800ms +/- 100ms, a timing attack is harder. Do note, though, that there are techniques in the literature for dealing with even a non-trivial amount of variation, by averaging out the noise or repeating the query multiple times and taking the minimum; these can be more effective than one might expect. – D.W. Jan 18 '16 at 20:14
  • 1
    Yes, you can safely ignore *this* timing attack due to the complexity and interminism of the many layers involved. Physical network, network protocol implementations with several buffers, application protocol handlers &c. The data is shuffeled around so much between the sender and your Java application that this timing is infeasible to exploit. – JimmyB Jan 19 '16 at 16:58

8 Answers8

33

There are published successes with remote timing attacks. From the document -- "... we can reliably distinguish remote timing differences as low as 20µs." So yes, you should be worried about the underlying implementation of .equals() (spoiler: Not secure). Implement .equals() using a sum of XOR of characters to compare in a timing-independent way.

Here's a python implementation as an example of a timing independent compare of bytes.

def equals(bytes1, bytes2):
    if len(bytes1) != len(bytes2):
        return False
    else:
        differences = 0
        for a, b in zip(bytes1, bytes2):
            differences |= a ^ b
        return differences == 0
Neil G
  • 105
  • 4
sethmlarson
  • 1,479
  • 10
  • 17
  • 4
    I don't know much about security, but couldn't you find out the length of the password with this? If the response takes a bit longer than a shorter/longer password, then you know that you have the correct length which can significantly reduce the different test cases to brute force. – SirPython Jan 18 '16 at 20:37
  • 7
    @SirPython Your password should be long enough (and complex enough) that it can't be bruteforced even if you know the length. – user253751 Jan 18 '16 at 20:39
  • @SirPython Security rarely obscures the length of the payload because messages must be sent over networks which are treated as untrusted. Because the information must be sent over some network, modern cryptography is rarely about obscuring the length of the message but rather the true content of the message. – sethmlarson Jan 18 '16 at 20:40
  • But [this answer](http://security.stackexchange.com/a/92238/52798) says that knowing the password length can help you crack a password more easily. – SirPython Jan 18 '16 at 20:44
  • 4
    @SirPython Once the possible space is large enough for a password, even knowing the exact length of the password will not give enough information to an attacker to brute force it. Knowing a password is 32 bytes long instead of any value less than 32 means that there are still 2 ^ 128 possible passwords, which is still more than anyone could ever brute-force over without additional information about the password. – sethmlarson Jan 18 '16 at 20:48
  • 1
    @SirPython: Revealing the length eliminates a maximum of `1/(chars allowed in password)` options from the total number of options, where `chars allowed` is 10 for numeric passwords, 62 for case-sensitive alphanumerics, and more if you allow symbols. – Michael Jan 18 '16 at 20:50
  • 9
    @SirPython yes, but ordinarily you would *not* be comparing the passwords but their hashes (since why would you store a plaintext copy of the password anywhere ever?), and the hashes would have the same predetermined length no matter if the password has 6 or 60 characters. – Peteris Jan 18 '16 at 20:55
  • @Peteris This is true for passwords, but I was using the phrase "password" as the same as any text you want to be secret to everyone except you and a peer. Assuming the peer uses this text in a plaintext form, hashing it isn't feasible. But yes, passwords should always be hashed before comparing to a stored hash. – sethmlarson Jan 18 '16 at 21:01
  • 1
    That is a great ref. Fascinating, scary reading, forced me to change my own answer mid-write from "haha of course not" to "wow, yes, totally a feasible risk". And now I have to go check my own code isn't vulnerable to that kind of thing... – Dewi Morgan Jan 19 '16 at 05:09
  • 5
    Being able to distinguish 20µs of time difference isn't going to help an attacker trying to time comparison of a secret. I timed string comparisons in python on a modern computer, and it is able to compare 100000 characters within that time. Considering that the secret to be compared is likely to be less than 100 characters long, it means the attack needs to be improved by 3 orders of magnitude to even tell the difference between a mismatch on the first character and a mismatch on the last character. – kasperd Jan 19 '16 at 08:52
  • 1
    @kasperd Exactly that: if your server takes 20µs to compare two characters, congratulations - you managed to install Java on down-clocked Arduino! But "timing attack" sounds kind of cool and scary, so people will upvote, completely ignoring orders of magnitude. – Dmitry Grigoryev Jan 19 '16 at 14:31
  • @dmitrygrigoryev These aren't a one-time-run type of attack, it takes many many probes before any statistical significance arises. Getting a ton of data will always show trends and those trends that emerge as the data set gets larger is the problem that we're worried about, not whether or not a single run of the comparison is the same as another run. – sethmlarson Jan 19 '16 at 20:04
  • @Oasiscircle I already know that. What's your point? Mine is that you can't improve precision infinitely by increasing the set size, and 20µs is as good as it gets. – Dmitry Grigoryev Jan 19 '16 at 20:24
  • 1
    +1 for `XOR` (very helpful against timing attacks) and double length checking. – Mark Buffalo Jan 19 '16 at 20:24
  • @DmitryGrigoryev That's probably a good lower bound for now, but as the gap between networking speed and processor speed becomes narrower due to quantum restrictions on hardware sizes I would argue that the number 20 microseconds will only shrink with time. – sethmlarson Jan 19 '16 at 23:30
27

In theory, this is a possible exploit, and if you are in super-paranoia mode, you should assume the answer being "Yes". In every other case, the answer will be: "No.".

Although there are published papers (one is linked in the answer by @Oasiscircle) which claim that they are able to run successful timing attacks, one has to carefully read the preconditions, too. These published "practical" attacks work on some algorithms on a LAN with one, at most two, switches in between. Which implies an almost perfectly reliable, constant round trip time. For that scenario, it is indeed practical to attack certain algorithms via timing, but this is meaningless in the context of the question.
In fact, I consider these remote attacks as "cheating". The fact that an attack is remote is irrelevant if you carefully design the experiment so the delay is nevertheless almost exactly predictable.

When attacking any server on the internet, this precondition does not hold (not even remotely, pun intended), even on a server that is geographically and topologically near.

Also, attacking a string comparison via timing is not at all the same as attacking a RSA calculation. It is much more difficult because the entire operation as well as the measurable difference is a lot smaller.

A string comparison of a password (assuming your passwords are "reasonably" sized) takes a few hundred cycles or less, of which the possible initial cache/TLB miss is by far the biggest, dominating factor, followed by the terminal mispredicted branch (which happens for both a match and a non-match). The difference between a match and a non-match is maybe one or two dozen nanoseconds.

A context switch takes several hundreds of nanoseconds, as does a cache miss. Schedulers typically operate at a micro- or millisecond resolution and do some very non-trivial work (in the hundreds/thousands of nanoseconds) in between at times that are hard to predict to say the least.

Reliably measuring differences on the nanosecond scale at all is not entirely trivial, either. Ordinary programmable timers do not nearly have the required resolution. HPET on commodity hardware is guaranteed to deliver 100ns resolution (per specification) and in practice goes down to 1ns on many implementations. However, it works by generating an interrupt. This means you can schedule a timer to some point in time precise to the nanosecond, but you cannot really use it to measure single nanoseconds. Also, the interrupt adds an overhead and uncertainty of some dozen nanoseconds (... to some dozen nanoseconds that you want to measure!). Cycle counters need to be serialized to be accurate. Which, too, renders them rather useless for precisely measuring an external event at nanosecond resolution since their accuracy depends on what the pipeline looked like.
There are more things to consider which add unpredictable noise, such as legitimate users (yes, those exist, too!) and interrupt coalescing.

Trying to divine something-nano from samples that include several something-different-nano as well as something-micro and several something-milli is a Herculean task. That's noise from several independent sources on every scale.

Finally, consider the mention of "Java", which means that e.g. a garbage collector may be executing at an unpredictable time (in any case, unpredictable for a remote attacker), causing unpredictable jitter on an unknown (micro, milli?) scale.

In theory, you could of course collect a large number of samples, even at lower resolution, say microsecond scale, and statistically eliminate the various sources of noise. You would never be able to tell for absolutely certain whether a password is correct, but you will eventually be able to tell with a sufficiently high probability (say 85% or 90%, or even 99%), and you can then manually verify those few candidates. That's good enough!

This is possible, at least in theory, but it would take a huge number of samples even for divining a single password. And saying "huge" is really an understatement of galactic proportions. The number of samples needed practically implies that you must parallelize the attack, or it will take forever.

Now, parallelizing such a timing attack to any serious extent is not easily possible because you are subject to the observer effect (in the same sense as in quantum mechanics).
Doing a couple of probes (maybe 5-8) in parallel should work, presuming that the server has enough idle cores, but as you scale up, eventually one probe will inevitably affect another probe's outcome in an unpredictable and disproportionate way. There is nothing you can do to prevent that from happening, so parallelizing doesn't really work well (I am not even taking into account the fact that interrupts usually go over a single core and that there is only a single physical copper wire which data must go through, so even if the server still has idle cores remaining, it may quite possibly be the case that one probe affects another).

On the other hand, running a not-massively-parallel attack is bound to fail because you will die of old age before you find a single password.

Vilican
  • 2,703
  • 8
  • 21
  • 35
Damon
  • 5,001
  • 1
  • 19
  • 26
  • 2
    While I agree that it's probably not the attack path of choice for an attacker, the simplicity of avoiding any possibility of attack is worth the mitigated risk. – sethmlarson Jan 19 '16 at 20:05
  • Wow, this answer is dangerously, catastrophically wrong. Remote string comparison timing attacks are practical, and have been performed a number of times in practice on real-world systems. See for example the ekoparty talk by Paul McMillan, "Practical String Comparison Timing Attacks" https://vimeo.com/112575034. HPET doesn't matter (attackers can buy a network card with nanosecond timestamp support); GC doesn't matter (just take more measurements---and tries where the GC ran might be easy to spot because the timing will be too far off); and attacks don't have to happen in parallel. – dlitz Aug 29 '18 at 20:09
  • 1
    @dlitz: McMillan demonstrates an attack on a slow, ultra-low-power microcontroller connected to the NIC via a 0.5m long cross-over cable. That's kinda ridiculous to be taken seriously in the context "internet", or "server". Seriously. There's no "network", no jitter, no scheduling, no RAM, no caches, none of that kind. Plus, string comparisons are really, really, really slow compared to a server. Sure, you can construct a contrieved case where such an attack is possible, on your desk. But who cares, practically? – Damon Aug 30 '18 at 17:17
14

Store a good cryptographic hash of the secret on the server (i.e. treat it like a password). Your comparison then would be to take the hash of the string the client sends you, and compare the hashes.

If the secret has high enough entropy, this should eliminate timing attacks and prevent leaking of the real secret string, since it should be practically impossible to recover the secret from the hash.

On the other hand if the amount of entropy in the secret is not enough to prevent dictionary attacks, this alone isn't enough. An early-exit comparison can still allow the attacker to learn the first few bytes of the hash; then a subsequent dictionary attack might be able to recover the secret from its hash. (See also Timing attacks on password hashes for more discussion of the possibility of such timing attacks.) This can be prevented by comparing the two hashes using a constant-time comparison method.

So, the most robust solution would be to store a hash of the secret, hash the string the client sends you, and compare the two hashes using a secure constant-time comparison method. Using a salted hash wouldn't hurt, either.

Ben
  • 3,846
  • 1
  • 9
  • 22
  • 5
    This is not necessarily true, it complicates the process but doesn't completely alleviate the problem. This can still leak parts of the hash using the early-exit byte comparison. Example: Attacker computes tons of hashes and stores them looking for a hash that starts with 0x00. Upon finding a plaintext with that value, send that plaintext, listen for timing on response. Do this for 255 more values until one value takes slightly longer. That's the first byte value of the hash created from the plaintext. Obviously leaks less information but leaking any information should be avoided. – sethmlarson Jan 18 '16 at 18:56
  • From [this answer](http://security.stackexchange.com/a/9193/93625), I think you'd be safe with just the hash (but probably no harm in being paranoid and using the constant-time compare as you suggested). – Ben Jan 18 '16 at 19:06
5

@Ben's answer to compare hashes rather than keys directly seems the best-practice approach for the task, and en passant also becomes a partial solution to the problem.

However, it remains vulnerable to some level of rainbow tabled hash tree: try keys that result in a hash beginning with each letter, then those beginning with the found letter and cycling the second, and so forth. Salt and pepper would go some way to address this, perhaps, but if it's a timing issue, then a better solution to the problem would be to completely prevent the timing attack, by sleeping after the compare until the next multiple of 100ms (or whatever time is distinctly larger than the strcmp can be).

To answer the question asked, though, we need to calculate how big the risk is.

On a 1GHz machine, a clock cycle is a nanosecond. A single character of a string compare should be in that order, as should any CPU cache hit (~1ns for L1 to ~30ns for L3). As I understand it, most string comparison functions don't get letters from memory one by one, but rather get them in the most efficient tradeoff lump size. Even if they did, a cache miss requiring DRAM access will still be, in the worst case, only about 100ns/character.

@Oasiscircle links to an article, citing their WAN timing of 20us.

However, it is trivial to establish where a machine is hosted. Someone renting a machine in the same NOC or server farm, will know your exact hardware configuration and will have perhaps only one or two switches between you. If they rent two servers, they can even set up a test system to test measuring times through that NOC to get their results as close as possible.

So of perhaps more interest from that paper is their LAN timing of 100ns with 2,000 measurements with <5% error rate (or 200ns with 1,000 measurements: more measurements would give even better resolution). They did not describe their network hardware, but it may well be that if there are gigabit cards on both attack and target machines, they might get even better resolution.

At that point, an attacker would be able to distinguish length to somewhere between the nearest 1 to 100 characters (depending on how well your cache hits go, how many tests they run, the speed of their network, and a plethora of hardware fine points).

Since it should be assumed that there are attacks that would guarantee cache misses, that they can run arbitrary numbers of tests, that they'll select the hardware best suited to attack yours, and that the resolution of their attack will only get better over the lifetime of your application and server... that effectively gives them 1 character of resolution, and hence your entire key.

So if your system is likely to have technically advanced attackers willing to spend money on targeting your machine specifically, then this is a clear risk, since they have a near certainty of success with this approach.

Equally, if you are renting space in a NOC, even drive-by attacks from other NOC occupants are a risk, at least if this comparison is being done using a protocol that a drive-by attack might discover.

Short answer: yes it's a risk; preventing the time for the operation from being affected by the strcmp is trivial; best to do it.

Dewi Morgan
  • 1,340
  • 7
  • 14
  • 1
    Just wanted to clarify that my citation is detectable at 20 **microseconds** not 20 milliseconds. µs = micro – sethmlarson Jan 18 '16 at 20:44
3

It's not about the practicality of the attack, it's about the habit of doing things securely. It's too common for bugs to creep in because someone thought about weither to escape that string or not, because "it's only numbers, so I can just ignore validating it". Then someone figures out how to exploit that.

Using a cryptographically secure string comparison is so easy and cheap that there's no reason to think about when you really need it and when you don't. Whenever you are dealing with passwords, auth tokens or similar, use a constant time string comparison algorithm.

if a.length != b.length return false
x = 0
for i = 0; i < a.length; i++ {
    x |= a[i]^b[i]
}
return x == 0
Filip Haglund
  • 1,593
  • 1
  • 11
  • 20
  • just out of curiosity: are complilers smart enough / ever going to be smart enough to early-exit the loop when x==256? I know security is often at war with optimizers. – user371366 Apr 13 '17 at 15:18
  • @dn3s early-exiting the loop requires checking that condition in each iteration, which means it's much harder to optimize/vectorize the loop. It's probably going to be noticably faster to run through the entire loop than early exiting, because of branch predicion and vectorization. Also, I assume you ment `x=255` :) – Filip Haglund Apr 13 '17 at 15:23
2

Are there examples of successful < 1ms timing attacks over the internet?

What you might want to think about is if someone was attempting to discover the secret internally. Timing could be judged on a more accurate level from an internal perspective. If you are worried about such a situation, why not change the function to evaluate all submitted characters instead of quitting at the first incorrect one? That way timing will be uniform.

Jason Higgins
  • 647
  • 4
  • 8
2

You're asking the wrong question. Worry is an emotion, and emotions should alert our minds to threats, but once alerted, we need to to think more deeply about what the threat is. It's often useful to measure one risk in comparison to other risks. The timing attacks on the equals method are real, and potentially exploitable, but you may have far larger threats to deal with. So the question becomes what your response to this should be. It largely depends on your context.

If you read through the litany of responses here you'll see an enormous amount of calculation, assumptions, and mitigating factors. The end result is that this isn't a terribly easy exploit to pull off.

Ultimately security is about balancing risk and cost. We never have infinite resources to deal with all the problems, so we're always forced to pick and choose the ones we have to deal with.

Advise developers that this is a real potential exploit, and String comparison should be done using secure algorithms instead of directly. The costs here are really minimal for any new code, and it eliminates the potential for risks.

For any existing code, consider fixing the problem if there's no mitigating circumstances. For instance, these attacks require thousands upon thousands of attempts. If your exploit is password based and you limit the attempts to just 3 attempts, it's going to be near impossible to reveal anything (let's say the hash) with this exploit. On the other hand, if you have a security mechanism to a high value target that's not rate limited in any way, you should consider fixing this code.

Steve Sether
  • 21,480
  • 8
  • 50
  • 76
1

It cannot be determined out of context.

One string comparison like this would be difficult to attack, as described in the accepted answer. The timelines are simply too short. However, context is essential. If a clever hacker can mold the rest of the code around this block to amplify the timings, they may be able to get somewhere.

For example, if there was a clever way for a hacker to cause this function to get called 10,000,000 times back to back before sending anything back to the client, you could get in trouble. Suddenly your sub-millisecond range issue has become a multiple-second issue, and clearly visible against network noise.

This is a case where I would recommend switching to a constant time comparison if your threat model includes timing attacks. You could sit down and analyze your entire codebase to prove that no attacker can ever pull such clever repetition tricks, and then include a quality assurance step to make sure no future change disrupts this analysis. Or you could just switch to a constant time comparison. Potentially put a comment there saying, "The constant time comparison is probably overkill, but it was easier to make this small section of code safe against timing attacks than it was to analyze the entire program to ensure this code is never misused in a way that could lead to a timing attack."

There will be some minor code readability issues, and you may have to go through a QA step to prove the comparison actually works as well as String.equals. There likely wont be any performance costs. If most of your users are legitimate and have the secret, they'll have to compare the whole string anyways.

Cort Ammon
  • 9,206
  • 3
  • 25
  • 26