4

I have a secret key k and message m. I send m || HMAC(m, k) to some other party, who can verify the integrity of m, assuming that they know k.

Suppose I have multiple messages m1, m2, and so on. Which of the following two constructions is better from an overall security perspective:

m1 || HMAC(m1, k) || m2 || HMAC(m1 || m2, k) || ...

or

m1 || HMAC(m1, k) || m2 || HMAC(m2, k) || ...

Obviously, I will not need to actually concatenate m1 || m2 to generate HMAC(m1 || m2, k). I simply continue using the state of the HMAC as more and more messages are signed. So the first mode of operation is actually more efficient and what I would prefer to use. In the second case, I have to reset HMAC state prior to the next message.

As far as the receiver is concerned, the first time message authentication fails, that's the end of the conversation. So I don't gain anything by having a valid HMAC for m2 if m1 could not be authenticated.

Is one of the constructions inherently more secure than the other? Could the first mode of operation leak any information about the key (theoretically)?

VokinLoksar
  • 165
  • 1
  • 6

2 Answers2

4

It really depends on what security feature you want to achieve. Each pair "m,HMAC(k, m)" provides integrity in the following sense: the adversary will not be able to forge a pair declared valid by the receiver, that the sender did not compute in the first place.

You may want additional properties, though. For instance, you have a sequence of messages, and you want to prevent the adversary from altering it, i.e. dropping messages altogether, or inserting duplicates of old ones, or reordering them. This is a valid concern, and it is usually addressed with a sequence number. That is, each message is sent as "m,HMAC(k, s || m)" where s is a fixed-sized sequence number (e.g. encoded over 64 bits), starting at 0 for the first message, and then incremented at each message (with a 64-bit sequence number, you have quite some room -- it won't cycle back to 0 until "a long time"). That's how it is done in SSL, for instance (each SSL "record" has its own MAC and the MAC computation includes the sequence number). Note that the sequence number is implicit: it is used in the MAC computation, but needs not be transmitted over the wire, so it does not increase the size of what has to be transmitted.

Other constructions are possible, such as using the previous HMAC value (you send "m,HMAC(k, z || m)" where z is the HMAC output sent with the previous message). Your proposal also works, but only because the MAC is really HMAC, which is robust and which allows to grab its internal state relatively easily (although some implementations might not make it easy). Indeed, many MAC algorithms require a non-repeating IV and would not achieve proper security with your method. This is why I recommend that you use sequence numbers, since this is a standard, well-studied method which has the added benefit of being more easily applied to other MAC algorithms (and algorithm agility is a nice feature to have).

(The ability of HMAC to operate in an IV-less fashion is the feature which makes it the best generic MAC algorithm. If you have an IV nonetheless, because your messages are not only MACed but also encrypted, then authenticated encryption algorithms are preferred.)

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • My messages are indeed encrypted and I'm using Encrypt-then-MAC to achieve authenticated encryption. I don't have access to any modes of operation that provide their own authentication. Prior to `m1` I'm also writing some additional metadata to the HMAC, which is used to identify the exact encryption scheme being used. To avoid doing this operation prior to each message, I wanted to continue using the same HMAC instance for each new message that is sent. – VokinLoksar Oct 06 '12 at 18:43
  • 1
    Assuming the same key may be used for multiple message sequences, a sequence number may not be enough. It only ensures the order within *a* sequence, not the order within *this particular* sequence. `m ∥ HMAC(k, z ∥ m)` is probably a better choice in that scenario. – Stephen Touset Oct 07 '12 at 07:32
3

The approach you're asking about is also known as a length extension attack, and HMACs are explicitly not susceptible to these kinds of attacks.

Given HMAC(K, m) = H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ m)), the best you can do is precompute

Ko  = K ⊕ opad
Ki  = K ⊕ ipad
m'0 = Ki ∥ m0
m'i = m'i-1 ∥ mi

HMAC(K, mi) = H(Ko ∥ H(m'i))

which doesn't actually save you anything except a few XORs and appends. You're still having to compute two hashes per message, which is the computationally expensive part. On top of that, you're having to do it on constantly increasing amounts of data. MAC verification is also likewise impacted.

One thing I've also noticed in your description is that you're simply concatenating successive messages and their MACs. Unless those message lengths are fixed, you should be doing something like

LEN(m0) ∥ m0 ∥ HMAC(K, m0) ∥ LEN(m1) ∥ m1 ∥ HMAC(K, m1) ∥ ...

where the length is stored with an unambiguous, fixed-size encoding (e.g., unsigned 64-bit integer in big-endian order).

Update: "Better from an overall security perspective" is entirely dependent on the needs of your system. If you require each successive message to be authenticated by the message before it, then HMAC(H, mi) is unsuitable. If each message can be safely verified independently of any other message, then there is no reason to use anything more complicated than HMAC(H, mi).

Stephen Touset
  • 5,736
  • 1
  • 23
  • 38
  • "having to do it on constantly increasing amounts of data" - VokinLoksar's point is that you can re-use the hash state. Is that not the case? – sourcejedi Oct 06 '12 at 16:34
  • I have demonstrated that you can not reuse any meaningful HMAC state. At least not with the standard HMAC construct. – Stephen Touset Oct 06 '12 at 16:57
  • Sorry, I had difficulty seeing your point. Ok, I can see that's what you're asserting now. I still can't follow your logic though. H(m'i) _can_ be computed more cheaply if you've already computed H(m'i-1). It doesn't matter that HMAC is immune to length-extension, because H is not a HMAC. I've also checked the python library for HMAC - the documentation shows it matching the general hash library API, which supports extension, and claims this is implemented efficiently. http://docs.python.org/library/hmac.html – sourcejedi Oct 06 '12 at 17:20
  • Oh, and it's implemented in python too. So there's a link to the source code as well, which does what I would expect. – sourcejedi Oct 06 '12 at 17:23
  • The python example is a good one. I'm basically asking if there are any security implications in creating a single HMAC object, and then alternating between update() and digest() calls. The alternative being discarding the object after each digest() call and creating a new one for the following message. In my case, the state also includes some metadata that is HMACed just before the first message, so reusing the object would save me a few extra writes. – VokinLoksar Oct 06 '12 at 18:36
  • @sourcejedi Computing `H(m'_i)` from `H(m'_i-1)` takes *exactly* the same amount of computation as simply computing `H(m_i)`. Again, nothing is gained performance-wise. You still must compute two hashes. – Stephen Touset Oct 07 '12 at 07:20
  • Sorry, I see the issue here. Nothing is gained performance-wise versus simply computing each message signature *independently* of one another. But you can certainly improve the performance of computing a chained HMAC of an entire message sequence. That said, it's more straightforward and sane to use one of Thomas' suggestions than to try and be clever by reusing hash state. Cleverness is rarely a good idea in cryptography. – Stephen Touset Oct 07 '12 at 07:36