22

Let's say that we're building a generic timing-safe comparison function for general purpose use. Making it so that it is safe when both strings are equal length is pretty well known. However, what I'm not sure about is how we can make it safe if the strings are different lengths (arbitrary).

One string will be deemed as "known", and the other as "unknown". We'll assume that an attacker only has control of the "unknown" value. Ideally, this function should leak no information about the length of the known string.

A trivial implementation, such as:

// returns 1 is same, 0 otherwise
int timing_safe_compare(char *known, size_t known_len, char *unknown, size_t unknown_len) {
    // Safe since all strings **will** be null terminated
    size_t mod_len = known_len + 1;
    size_t i;
    int result = 0;

    result = known_len - unknown_len;
    for (i = 0; i < unknown_len; i++) {
        result |= known[i % mod_len] ^ unknown[i];
    }

    return result == 0 ? 1 : 0;
}

The problem here though is that there may be cache information leak.

For example, a word size in x64 is 64 bits. So we can fit 8 characters in a single register. If the known value is a string that's 7 characters or less (since we add 1 to the known_len), the comparison never requires another load operation for the known string, even though the unknown string will.

In other words, if the size of the unknown string differs from the known string by one or more word boundaries, the total amount of "work" being done may change.

My first instinct would be to only compare strings of equal sizes, but then length information would be leaked (as execution would follow several different branches).

So, this leaves two questions:

  1. Is this small difference enough to be concerned about?

  2. Can this sort of difference be prevented without leaking information about the known size at all?

Gilles 'SO- stop being evil'
  • 50,912
  • 13
  • 120
  • 179
ircmaxell
  • 1,416
  • 12
  • 16
  • For the record, this is related to an [RFC for PHP to add a safe comparison function to core](https://wiki.php.net/rfc/timing_attack). – ircmaxell Feb 03 '14 at 21:22
  • @mikeazo: I figured it was crypto simply due to it normally applying to practical implementations of crypto. If you could move it, it would be much appreciated! Thanks! – ircmaxell Feb 03 '14 at 22:35
  • 1
    If you use fixed size buffers, but only want to compare part of them, that should be possible in constant time. If you allocate variable length buffers with secret length, you already have a leak. Since your API works on strings and not on slices of strings, the former isn't possible, and you could limit it to equal length strings. Or simply say that it leaks the length of both strings. – CodesInChaos Feb 04 '14 at 15:56
  • 1
    @ircmaxell, Why not simply run a busy loop at the end to make sure the function will take `x` number of seconds for all calls? – Pacerier Feb 16 '15 at 03:40

5 Answers5

15

Being able to process strings of arbitrary length without leaking information on their length seems to be very hard (i.e. I don't see how to do it) because of caches. A very long string, by definition, will take a lot of room, and thus reading the string will incur interaction with the caches. Accessing the string from RAM will trigger cache misses, and also evict other data elements from the caches, impacting the future behaviour of the application code. A cache miss costs dozens or even hundreds of clock cycles: it is at least ten times more visible, from the outside, than a branch misprediction. If you worry about branches, then you should worry a lot more about caches.

However, we can cheat with padding. Assume that you could arrange for the two strings you want to compare to be written at the start of two big, equal-sized buffers full of zeros; also, we suppose that a byte of value 0 cannot appear in a normal string (e.g. these are C strings). Then all you need is to make a leak-free comparison of the two buffers, who have the same length. The buffer length will leak, but it is a fixed, constant and publicly known parameter, so that's not a problem.

This does not solves the problem; it moves it. Now, you have to make sure that whatever produced the strings could write them in the buffers without leaking size information. Generally speaking, you no longer have strings. You have binary values of a given fixed length that you copy with a big memcpy(); these values just happen to have a string interpretation in which the bytes are considered to be characters, up to the first byte of value zero.


From a higher point of view, having a "safe string comparison function" is like bringing a bucket aboard the Titanic. If your code is handling secret data, then everything you do with the data is potentially subject to timing attacks. In general, your application can be of two sorts:

  • If the only secret part is a single cryptographic element and everything else is public, then using a few leak-free primitives makes sense, and will improve overall security. A classic example is a Certification Authority, where the only secret part is the CA private key; as long as the signature algorithm does not leak secrets, the whole system is robust against timing attacks. Similarly, a Web site which does password-based authentication but otherwise contains only public data will be fine.

  • If the secrecy is spread throughout the system, such as a Web site which does password-based authentication to give access to some confidential data, then concentrating on the string comparison misses the point. The whole server code must be made leak-free, and that is a considerably more difficult endeavour (and we don't really know how to do it).

In any case, trying to protect any given piece of code against side-channel attacks becomes harder when the language is more "high-level". A language such as PHP, with its automatic memory management (the garbage collector) and string management (string are values just like integers) will not help at all. That's the reason why low-level primitives implemented in C (such as a leak-free string comparison function) must be provided, but the issue is much larger and encompasses a lot of PHP code as well.

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • 1
    So, would something as an analog to the above (even with its potential length leak) be sufficient and beneficial in your opinion? Seems to be the general trend here. – ircmaxell Feb 05 '14 at 20:36
4

If you assume an adversary that can observe memory access patterns through cache leaks then it's silly to try and protect against the adversary learning the length of the secret. He'll always know. The only way to protect against this is to guarantee that you can access past the end of the string without segfaulting - which you almost surely can't without over-allocating every string in the programming language.

orlp
  • 391
  • 2
  • 15
  • He doesn't have to observe memory access patterns, at least not directly. It will take a different amount of time whether a string is always compared against the same 8 memory locations or 10k different ones. At least that would be my assumption. – NikiC Feb 04 '14 at 08:08
4

Have you researched the needs of the PHP programmers who want this function?

In the practical applications I can think of - verifying passwords, session tokens, etc. the known string would be relatively small, say < 64 bytes; within one Intel cache line. So your trivial implementation would not actually cause different cache access patterns.

If you really need to compare long strings without leaking length, you should consider comparing hashes instead.

paj28
  • 32,736
  • 8
  • 92
  • 130
2

Simply ...

... don't compare strings, compare their hashes!

Yes, I mean this for timing safety (the side effect of password safety, left aside).

What This Does

When comparing hashes, you don't need to worry about following (which might not be obvious at first)

  • The hashing process taking more time, for longer strings

    Why? The correct hash you are comparing the user's input to is (obviously) of fixed length, the only information a user might get, is how long the program takes to hash his input (worst case, this might give some hints about the underlying hashing algorithm, which secrecy should not be relied upon anyways)

    The only exception is if you can't store the correct hash someplace. Having to calculate it first aka dealing with the password directly, brings the exact same password/-length issues up again.

  • Branch misprediction or cache misses

    Obviously, as mentioned above, the correct hash is always the same length, no matter how long the correct password of any given user is.

Seriously, this easy process makes a trivial problem from a very difficult one.

About Hash Strength

If you are using a weak hashing process (or one with ridiculous entropy), you could consider additionally checking for direct password equality (after a positive result from comparing hashes), to protect yourself against collisions.

This would however leak timing information/take longer, when encountering a collision.

Bottom line: Don't use weak hashing algorithms, apply some salt, and you should be fine! ;-)

Levite
  • 819
  • 1
  • 6
  • 14
1

Correct me if I'm wrong (in reply to Thomas, but also to generally answer the original question), but you should be able to strive towards leak free checks with your code. In this example, "known" is a known value, that has been pre-embedded into a buffer, I.e. If your known value is "qwerty", and you allow a maximum length of 64, then "qwerty" is prefilled (initialised and stored a single time at load time) into a buffer of 64, which ensures that memory loads should always be constant without giving anything away). In this instance we will only know if it is in cache or not by a cache miss. Replicating the code in the original post.

int check(char *known, size_t known_len, char *unknown, size_t unknown_len, size_t max_len) {

size_t i;
int result = 0;

  // Constant time check, only gives away maximum length.
  if (unknown_len > max_len)
      return 0;

  // Will only give away the length of the attackers string, unless it was already too large (condition above).  Don't bother doing an extra memcpy on your known or the attackers.
  for (i = 0; i < unknown_len; i++) {
      result |= known[i] ^ unknown[i];
  }

  return result == 0 ? 1 : 0;
}