1

What is the preferred or recommended way to pursue contestant time compares when array lengths are not equal?

Should we exit as early as possible? Something like:

if (array1.size() != array2.size()) {
    return NOT_EQUAL;
}

int accum = 0;
for(int i = 0; i < array1.size(); i++) {
    accum |= array1[i] ^ array2[i];
}

return (accum == 0) ? EQUAL : NOT_EQUAL;

Or should we avoid the early exit, and compare as much as possible in hopes of masking as much as possible? Something like:

int size = min(array1.size(), array2.size());
int accum = (array1.size() ^ array2.size());

for(int i = 0; i < size; i++) {
    accum |= array1[i] ^ array2[i];
}

return (accum == 0) ? EQUAL : NOT_EQUAL;

Or something else?

Or is this question more appropriate for the folks on the Crypto.SE because its theoretical minutia?


EDIT: this seems to be related to Timing attacks on password hashes. But the cited question is a particular instance of the generalized problem (and I'm interested in the generalized question).

I'm not sure I agree with the cited question's answer or assertion that "Using timing attacks in this case will in no way tell an attacker more than what he would know if he had the actual stored hash and salt..." because the attacker could be trying to recover the hashed password though timing attacks. Given a hashed password, its only a leap back to the password in some cases.

3 Answers3

1

Aborting early is fine, as it doesn't reveal timing for anything secret (unless you consider the length to be a secret, which doesn't seem plausible).

As for your edit, comparing password hashes in non-constant time is also fine. The attacker has no efficient mechanism to "choose" password hash outputs such that he can iterate through all possibilities for a byte at one position while holding the preceding bytes constant. So there is again no timing attack — not even to reveal the hashed password.

Stephen Touset
  • 5,736
  • 1
  • 23
  • 38
  • Do you have any citations for aborting early? Bernstein [talks about the various papers](http://www.ietf.org/mail-archive/web/cfrg/current/msg04331.html), but I have not been able to find something that spells out pursuit of constant time compares for disparately sized arrays. They seem predicated on equal size arrays. –  Jan 05 '15 at 01:17
  • 3
    No citation needed when reason will do. Aborting early provides exactly one bit of information to an attacker: whether or not the length of the two strings matched. If the length of either string is not considered secret (and I can't conceive of a cryptosystem in which it would be), then it is public and therefore of no additional use to an attacker. – Stephen Touset Jan 05 '15 at 01:30
  • 1
    @StephenTouset, knowing the length makes it tremendously easier to brute force the secret when the length isn't already known, quite literally by orders of magnitude. – atk Jan 05 '15 at 03:01
  • 2
    If a secret is short enough that knowing its length brings brute-forcing into the realm of possibility, your secret isn't long enough. And to correct you, if you know a secret is of length `n`, you *still* have to try all possibilities `2^{n-1}` and below, because there's a 50% chance the first bit is zero anyway. – Stephen Touset Jan 05 '15 at 07:03
1

If the data against which you are comparing is not a secret, then it doesn't really matter what you do. Timing attacks are used to learn information to which you do not already have access. If you already have access, then the attack is unnecessary.

If you are protecting a secret, then it doesn't matter if it is a password or not. You should handle it the same way, with the same precautions. In order to avoid timing side channels, you ensure that the input length will be the same as the stored value's length by hashing them both and doing a constant time comparison. Of course, you should store the hashed value of the stored value in the database, so that you aren't recomputing it every time. See pseudocode below.

Pseudocode for constant time comparisons:

bool compareToSecret(input/* input from user*/)
  dbHash = getSecretFromDb()

  inHash = hash(input); // sha256, bcrypt or whatever. Attacker 
                        // knows how long this takes, but it 
                        // doesn't get her anything

  // at this point, both inhash and dbhash are the same length.

  // the following loop is constant order time. Hashes are always the
  // same length, and all values of the hashes are always compared
  bool result = true
  int len = strlen(inHash)
  for( int i=0; i<len; i++ )
     result &= (inHash[i] == dbHash[i])

  return result

aborting early during comparison

Aborting early during the comparison is problematic, as it leaks info about the comparison and can be used to learn the password directly.

  1. Attacker tries secrets of length 1. One of these secrets will take a few milliseconds longer to return than the others, since there is one 1-length secret whose first character matches the stored secret. Attacker learns the first character of the secret.
  2. Attacker tries 2 all character secrets with first letter beginning with known correct value. One of these secrets will be a few milliseconds faster, since the second character compares correctly.
  3. Attacker repeats, separately attacking each character of the secret. Eventually, the attacker learns the stored secret.

This is quicker than a brute force attack, because the attacker can attack each character separately instead of the whole secret at once.

aborting early: before comparing

Aborting early, as mentioned by Steven Tousset, leaks the length of the secret. This doesn't look like a lot at first, but it lets the attacker speed up their attack by orders of magnitude. Let's say that the system accepts secrets of length 1-5. Keeping it very simple, the secret is composed only of digits. Thus, there are 10+100+1,000+10,000+100,000=111,110 possible values. A brute force attack needs to make about 100,000 tests to try every possible value.

If the system lets the attacker know the length of the secret, an attacker can dramatically reduce the number of values that s/he needs to test by first trying 5 values: '1', '12', '123', '1234', and '12345'. Now the attacker knows how long the secret is. If it's 1 character, s/he now only needs to make 10 tests instead of a hundred thousand. If it's 2 characters long, she only needs to make 100 tests. Without this key piece of information the attacker needs to make a lot more tests before they will find the secret.

another issue: recoverable storage

There's another issue, too. If the known secret can be compared against the input value, then you have to store the secret in recoverable form. Search this site for details related to passwords, but this is often considered a bad practice with storing passwords and should be avoided. In general - with any kind of secret - if you don't absolutely have to have the raw value, it should not be stored in recoverable form. Irrecoverable storage prevents several attacks - whether they matter for your secret depends upon how you use it.

one more thing: compiler optimization

The code, below, is pseudo code. If you were to pass its equvalent through a compiler or a runtime that does optimization, then you need to be very careful to ensure that the code does not get optimized. There are various tricks fir different languages, which are out of scope of this comment. Tha ks to @Polynomial for reminding of this

atk
  • 2,156
  • 14
  • 15
  • *"...Since you are asking about passwords..."* - actually, I'm more interested in the generalized question. The *cited* question asked about passwords, and I mentioned what I thought was not quite correct with the answer given in the cited question. Sorry about the confusion. –  Jan 05 '15 at 02:42
  • @jww Ah. i'll update my answer to address the general question. The answer is the same, however, sssuming that the data is secret. – atk Jan 05 '15 at 02:49
  • 1
    The code here is wrong. It will return true regardless of whether the items compared are equal. Your operator should be `&=`, not `|=`. – Polynomial Jun 30 '16 at 09:49
  • Actually, that would still be wrong, as the compiler may optimise the early-out anyway in aggressive cases. Instead you should xor each character pair and or that with the state. If the state ends up nonzero then the strings are not equal. – Polynomial Jun 30 '16 at 09:59
  • More importantly, it still contains branches which are conditional against whether the characters were equal, which can introduce cache-based timing attacks. To be clear, the correct approach is to maintain a state (initial value 0), bitwise-xor each character pair together, then bitwise-or that with the state. At the end, if the state is zero the strings were equal. For cases where the lengths are non-equal, the comparison should be performed up to the length of the shortest string. – Polynomial Jun 30 '16 at 10:11
  • @polynomial, good carch on th e operator, and you are correct that optimization needs to be prevented in certain security sensitive code. I don't see where you are seeing branching in the code that changes runtime. Could you provide a reference detailing exactly how this code would pdrform differently than how you suggest rewriting? Also, what makes you think that t g iz code does not compare one character at a time (unless I'm misunderstanding and that's not part of what you neant)? – atk Jun 30 '16 at 11:38
  • @atk It depends largely on what the compiler does, but an unoptimised compilation on x86 would be a cmp/jnz on each loop iteration due to the equality operator. As we're talking about turning optimisations off, this would be likely. An optimising compiler could also easy-out optimise the loop. A conditional move (cmov) would be preferred by an optimised compiler, but would still be vulnerable to differential power analysis attacks. Explicitly using xor between each character value eliminates any branching or conditional logic: it's always a linear bitwise operation on each loop iteration. – Polynomial Jul 11 '16 at 16:27
  • @atk Another issue is branch prediction. If your assembled code only contains the boundary condition on the loop, the branch predictor does no work: the instruction pipeline gets filled once with the loop body and no prediction occurs. If your code contains a conditional jump (i.e. if the `==` becomes a `cmp/jnz`) then the branch predictor has to guess which way that will go, and invalidate the instruction pipeline if it gets it wrong, which causes power and timing differences. The branch decision is based upon the data (i.e. are these chars equal?), hence the data-dependent branching issue. – Polynomial Jul 11 '16 at 16:33
  • @polynomial, feel free to edit the answer if you have solutions or improvements. – atk Jul 12 '16 at 03:01
0

Problem: You want to compare two sequences, a user-provided sequence U[] and a secret sequence S[] whose length you want to keep secret. You don't want to disclose, through the elapsed time of comparison, whether S is shorter than U or the length of the common prefix of U and S.

You can't abort the comparison after reaching the end of S because that will disclose the length of S. Nor can you abort after the first mismatch because that will disclose the length of the common prefix. What you can do is repeat the last element in S until you reach the end of U. For example, if U is "goodbye" and S is "hello", you want to compare as if it were "hellooo".

The following (untested) code in Python should illustrate the concept:

def slow_seq_eq(userseq, secretseq):
    """Compare sequences without disclosing secretseq's length."""
    Silast = len(secretseq) - 1
    is_equal = True
    for Uidx in range(len(userseq)):
        Sidx = min(Uidx, Silast)  # Repeat the last element of S to len(U)
        if secretseq[Sidx] != userseq[Uidx]:
            is_equal = False
    if len(U) != len(S):
        is_equal = False
    return is_equal

This of course assumes that the comparison operator != takes constant time, as does the indexing operator. Arranging for S to be placed at a certain alignment relative to cache lines or virtual memory pages could in theory keep the timing attack alive.

Damian Yerrick
  • 540
  • 3
  • 15