18

Safety numbers in Signal are derived from a hash of the conversation participants' public keys and their phone numbers. Safety numbers are used to ensure that the conversation was not MITM-ed.

When deriving safety numbers, SHA-512 is iterated 5200 times. According to the Signal safety blog, there were privacy issues regarding phone numbers embedded in hashes. However, this cannot be the reason, given that the set of possible phone numbers is relatively small.

Comments in the source code:

The higher the iteration count, the higher the security level:

  • 1024 ~ 109.7 bits
  • 1400 > 110 bits
  • 5200 > 112 bits

So: what is the reason for intentionally slowing down the Safety Numbers computation?

Bonus: roughly how are the security levels (1024 SHA-512 hashes ~ 109.7 bits) computed?

Nayuki
  • 222
  • 2
  • 7
bgd223
  • 353
  • 2
  • 6

3 Answers

16

The comment isn't explained very well, but I believe I've determined the math behind it. The safety number is 60 base 10 digits, but it's created in two 30 digit halves: one based on your phone number and public key, and one based on the phone number and public key of the person you're talking to.

Assuming a high entropy value is converted into a 30 digit number without unnecessary loss of entropy, it will contain log2(10^30) ≈ 99.66 bits of entropy, equating to 99.66 "bits of security" (meaning an attacker would have a 50% chance of matching that safety number after 2^99.66 / 2 = 2^98.66 hashes). Iterating many times increases the bits of security (since it increases the number of hash operations for each try the attacker makes):

log2(10^30 × 1024) ≈ 109.66

log2(10^30 × 1400) ≈ 110.11

log2(10^30 × 5200) ≈ 112.00
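These figures can be reproduced with a few lines of Java (a quick sketch of the arithmetic, not Signal code):

```java
public class SecurityBits {
    // Bits of security for a 30-digit fingerprint half after n hash iterations:
    // log2(10^30 * n) = 30*log2(10) + log2(n)
    static double bits(int iterations) {
        double log2 = Math.log(2);
        return 30 * Math.log(10) / log2 + Math.log(iterations) / log2;
    }

    public static void main(String[] args) {
        System.out.printf("1024 -> %.2f bits%n", bits(1024)); // ~109.66
        System.out.printf("1400 -> %.2f bits%n", bits(1400)); // ~110.11
        System.out.printf("5200 -> %.2f bits%n", bits(5200)); // ~112.00
    }
}
```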

This is how many hashes an attacker would have to perform to match a specific safety number, but if the attacker wanted to be able to read the messages you send them, they'd need to know the private key that matches the public key they used in the hash. Generating RSA keypairs is computationally expensive, but ECC is much faster. If the keypair generation is fast enough, it makes sense to iterate the hash to increase the lower bound of an attack on a safety number.

forest
  • 64,616
  • 20
  • 206
  • 257
AndrolGenhald
  • 15,436
  • 5
  • 45
  • 50
6

The safety number is a derivation of the stable identifier and the public key of a user. Safety numbers are computed for both people in a conversation.

The really important code is this snippet:

byte[] publicKey = getLogicalKeyBytes(unsortedIdentityKeys);
byte[] hash = ByteUtil.combine(ByteUtil.shortToByteArray(FINGERPRINT_VERSION), publicKey, stableIdentifier.getBytes());

for (int i = 0; i < iterations; i++) {
    digest.update(hash);
    hash = digest.digest(publicKey);
}

What's happening is that we take the fingerprint version, public key, and stable identifier as starting inputs and hash them once with SHA-512. The second iteration appends the public key to the hash we just produced, then hashes it a second time.

This process of adding the public key and repeating the hash continues for the number of indicated iterations.
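That loop can be sketched as a self-contained program (this mirrors the snippet above; the version prefix, key bytes, and identifier here are dummy placeholders, not real key material):

```java
import java.security.MessageDigest;

public class FingerprintSketch {
    // Iterated hash: each round absorbs the previous hash, then appends
    // the public key before finalizing, as in the snippet above.
    static byte[] iterate(byte[] seed, byte[] publicKey, int iterations) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] hash = seed; // version || publicKey || stableIdentifier in the real code
        for (int i = 0; i < iterations; i++) {
            digest.update(hash);             // absorb previous round's output
            hash = digest.digest(publicKey); // append public key, finalize, reset
        }
        return hash;
    }

    public static void main(String[] args) throws Exception {
        byte[] seed = "v0|dummy-key|+15550100".getBytes(); // placeholder inputs
        byte[] fp = iterate(seed, "dummy-key".getBytes(), 5200);
        System.out.println(fp.length * 8); // SHA-512 output: 512 bits
    }
}
```

Note that `MessageDigest.digest(byte[])` resets the digest, so each round starts fresh from `hash || publicKey`.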

Why do we need to do more iterations than the past?

This is due to a fundamental issue of hashing: the possibility of hash collisions.

Let's say I'm an attacker (Eve). Alice wants to talk to Bob, so Signal sends her public key to Bob, but Eve intercepts the public key and substitutes her own. Normally there is an indication the key changed, and the Safety Number changes.

If Eve had enough resources, she could construct a public key which matched the safety number. To combat this threat, we make it so that Eve would need to find a collision that occurs after 5200 rounds of hashing, with that same key added every round.

This becomes computationally infeasible, since each round of hashing makes finding a collision linearly more computationally expensive. The number of iterations is usually chosen based on how long an attack of this style would take given the resources of the perceived threat.

I can't find any calculations from Signal as to specifically why they picked 5200.

Daisetsu
  • 5,110
  • 1
  • 14
  • 24
  • 1
  • `log2(10^30 * 5192) = 111.999918` and `log2(10^30 * 5193) = 112.0001954`. 5200 was probably taken to have a "smooth" number to calculate 112 bits of entropy. – Tom K. Oct 16 '18 at 11:30
  • hello sir, anywhere i can contact you @Daisetsu – turmuka Dec 15 '18 at 08:09
  • @turmuka do you have a comment about this particular answer? – Daisetsu Dec 16 '18 at 10:45
  • @Daisetsu: Your statement about collisions is **nonsense**. The probability of collisions depends only on the length of the hash. It has nothing to do with the number of iterations. Increasing the number of iterations has the purpose to increase resources needed to brute-force it (CPU, RAM) and thus to make brute-forcing practically impossible. – mentallurg Mar 02 '20 at 00:44
  • @mentallurg Your reply feels a bit agressive, I don't know if that was your intention. Ultimately getting a hash collision for the current safety number does nothing because the attacker doesn't have the keys to decrypt/sign/encrypt any of the messages. They would simply have a matching safety number, but not the keys needed to read/send anything. – Daisetsu Mar 02 '20 at 01:05
  • @Daisetsu: Sorry if it feels so. That was not my intention. I just wanted that readers of this thread better understand what is happening by doing 5200 iterations. The probability of collisions in such algorithm **does not change**. – mentallurg Mar 02 '20 at 01:16
  • @mentallurg If you feel an answer should be changed, every user can edit the answer and submit it to be updated. The update will be approved by other mod users. That way users don't have to go to the comments. – Daisetsu Mar 02 '20 at 01:20
  • @Daisetsu: Basically yes. But it makes sense only in case small corrections are needed. Where in this case the whole point of the answer is wrong. The 1st part is neutral; "wrong" or "right" is not applicable here because it does not answer the question, it just explains what is going on. This could help to understand the answer. But the whole point of the answer is, that this reduces possibility of hash collisions, which is *wrong*. Actually the proper approach would be if you would delete the answer. If you want to continue discussion, please open a **chat**. – mentallurg Mar 02 '20 at 01:30
  • @TomK. On my app I get `112.002 138 754 537` for $log_2 (5200 * 10^{30})$. – U. Windl May 30 '22 at 09:23
2

These comments in the source code are wrong. The developer did not really understand what he wrote.

Here is his comment:

The higher the iteration count, the higher the security level

This statement is partially correct. Namely, more iterations require more resources for brute-forcing, which makes it more secure from that point of view. But the probability of hash collisions depends solely on the size of the hash space, i.e. on the length of the hash (assuming that hashes are distributed uniformly). It does not depend on the number of iterations. That means, from the point of view of hash collisions, it is not more secure.

A simple example. Suppose the hash consists of a single byte, i.e. 8 bits. In reality nobody would use it because it is not secure, but it lets us more easily understand what is going on. 8 bits means there are 256 different hashes. No matter how many iterations you do, 5200 or 1000000, you get in the end one of 256 hashes. What is the probability of getting one particular value out of 256? It is 1/256.

Then why is somebody talking about 5200 iterations -> 112 bits?

Again, let's first take a hash of 8 bits = 256 different values. How many messages do you need to try to get one that produces a given hash? In the worst case you need 256 calculations. Suppose now we use a hash function that uses 2 iterations of the original hash function. To brute-force it you again need 256 attempts in the worst case, but each of them takes twice as long as the original. That means you need time equivalent to 512 original hashes, which is 2^9. This is equivalent to brute-forcing 2^9 values with the original hash function. If we use 8 iterations, the time needed is 256 × 8 = 2^11; it is like using the original hash function for 2^11 values, i.e. like increasing the hash length from 8 to 11 bits. For 5200 iterations it is like increasing the hash length by log2(5200) ≈ 12 bits.
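The toy 8-bit arithmetic above can be checked numerically (a quick sketch, nothing Signal-specific):

```java
public class TinyHashWork {
    // Total brute-force work in "bits": log2(hashSpace * iterations).
    static double workBits(int hashSpace, int iterations) {
        return (Math.log(hashSpace) + Math.log(iterations)) / Math.log(2);
    }

    public static void main(String[] args) {
        int space = 256; // toy 8-bit hash: 2^8 possible outputs
        for (int iters : new int[]{1, 2, 8, 5200}) {
            System.out.printf("%d iterations -> work ~ 2^%.1f hash calls%n",
                    iters, workBits(space, iters));
        }
        // 1 -> 2^8.0, 2 -> 2^9.0, 8 -> 2^11.0, matching the text above;
        // 5200 adds log2(5200) ~ 12.3 bits of work on top of the 8.
    }
}
```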

In the OP the entropy is 99.7 bits. Brute-forcing a hash with 5200 iterations takes approximately the same time as brute-forcing a hash with 1 iteration that is ~12 bits longer: log2(5200) ≈ 12.3; 99.7 + 12.3 = 112.

Once again: it is not correct to say that increasing the number of iterations is equivalent to increasing the hash size, or that it increases entropy. It only increases the needed CPU time. The entropy remains the same, and the probability of hash collisions remains the same.

mentallurg
  • 8,536
  • 4
  • 26
  • 41
  • okay but so long as the input entropy is smaller than the resulting digest (and with SHA-512, 30 digits are certainly less than 512 bits), that's the only thing that counts in a secure hashing algorithm. I don't think it's incorrect to say that the security level changes. It would be incorrect to say that "the probability of hash collisions is reduced due to strengthening the hashing algorithm" or something, but that's not what is being claimed. – Luc Mar 02 '20 at 08:48
  • @Luc: "but that's not what is being claimed" - unfortunately no. The answer of *Daisetsu* states namely that (and contradicts to your and to my views). He is saying namely *Which is the possibility of hash collisions.* He basically claims that increasing the number of iterations reduces the probability of hash collisions. Which is not true. To security level: The number of iterations changes only *one* security aspect, computational complexity. The other security aspects remain *unchanged*, e.g. the probability of collisions. – mentallurg Mar 02 '20 at 18:24
  • Oh, you're replying to another answer, okay. That wasn't mentioned so I didn't know that. Perhaps you want to clarify in the answer that part of your answer is in response to Daisetsu's claim regarding collision probability? – Luc Mar 02 '20 at 19:05
  • @Luc: First I commented the answer of Daisetsu's :) But it was not sufficient. That's why I wrote a new answer. – mentallurg Mar 03 '20 at 02:06
  • But if the input space (8 bits) is much smaller than the hash output size (512 bits), wouldn't it be easier to pre-compute the 5200 iterations in a table of just 256 (2^8) entries (each using 512 bits)? In that case increasing the iterations wouldn't increase the strength much, right? Also for a small input space the probability of hash collisions becomes unimportant; is that right too (If there were hash collisions, the hash function would be very bad IMHO)? – U. Windl May 30 '22 at 09:31
  • @U.Windl. Sure. And if the message space is bigger, but still relatively small like 30 - 40 bits, rainbow tables can be used to find hash collisions. It makes sense to consider iterations only starting from some message size. – mentallurg May 30 '22 at 15:13