
There are many scientific publications dealing with cache attacks. Most recently, the CacheBleed attack was published, which exploits cache-bank conflicts on the Intel Sandy Bridge architecture. Most timing attacks use a similar approach:

  1. The attacker fills the cache with data he controls.
  2. The attacker waits while his victim is doing some computation (e.g. encryption).
  3. The attacker resumes execution and measures the time to load each set of the data he wrote to the cache in step 1. If the victim has accessed some cache sets, it will have evicted some of the attacker's lines, which the attacker observes as increased memory access latency for those lines.
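The three steps above can be sketched with a toy set-associative cache model. Everything here is illustrative (the cache parameters, the owner tags, the victim's "secret" set); a real attacker measures load latency instead of inspecting cache state:

```python
# Toy simulation of the Prime+Probe steps above. The cache model,
# sizes, and the victim's "secret" set are all illustrative; a real
# attacker measures load latency instead of inspecting cache state.

NUM_SETS = 64   # cache sets
WAYS = 4        # lines per set

class ToyCache:
    def __init__(self):
        # each set holds up to WAYS (owner, addr) entries in LRU order
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, owner, addr):
        lines = self.sets[addr % NUM_SETS]
        entry = (owner, addr)
        hit = entry in lines
        if hit:
            lines.remove(entry)         # refresh LRU position
        elif len(lines) >= WAYS:
            lines.pop(0)                # evict least recently used line
        lines.append(entry)
        return hit                      # True = fast load, False = slow miss

cache = ToyCache()

# 1. Prime: the attacker fills every set with his own lines.
for s in range(NUM_SETS):
    for w in range(WAYS):
        cache.access("attacker", s + w * NUM_SETS)

# 2. The victim performs one secret-dependent memory access.
SECRET_SET = 37
cache.access("victim", SECRET_SET)

# 3. Probe: sets where the attacker now misses reveal victim activity.
touched = [s for s in range(NUM_SETS)
           if not all(cache.access("attacker", s + w * NUM_SETS)
                      for w in range(WAYS))]
print(touched)   # -> [37]
```

Note that the probe only reveals *which set* was touched, which is exactly the question's point: the attacker never sees the victim's data, only its location.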

By doing so, the attacker can find out which cache set, or even which cache line, the victim accessed.

Most papers I've read automatically conclude that the attacker can then deduce the data (e.g. a secret private key) that was written to these cache locations. This is what I don't understand:

If I know that victim V accessed a certain cache position, how do I get his data? Many memory addresses map to the same cache position, and even if I had the full memory address, I doubt the attacker could perform DMA.

As a reference you can take the recently published CacheBleed attack.

    The attacker does not deduce the private key written to those cache locations. Rather, *the victim is accessing locations depending on what the private key is*, so by figuring out which locations the victim is accessing, the attacker can work backwards and get the private key. – user253751 Jun 05 '16 at 08:57
    But how does the attacker work backwards? Since no directly mapped cache is used on common architectures, many main-memory locations map to the same cache line. There are thousands of possible main-memory accesses that land in the same cache set. It's a 1-to-n relationship. I cannot simply go backwards, because there are n possibilities for what the private key is. – null Jun 05 '16 at 09:05
    Yes, and the attacker knows what the victim code is doing, and so the attacker knows that there are only a few (8 I think) "interesting" locations it's going to access. – user253751 Jun 05 '16 at 09:09
  • Even if he knew what the code is doing, he does not know the memory layout. There are many security mechanisms, such as ASLR, that prevent gaining insight into memory allocations. – null Jun 05 '16 at 09:12
    That paper is terrific. Sections 4 and 5 go into exquisite detail about how particular parts of numbers that must remain secret are extracted, and there is some discussion of what role those numbers play in sections 2 and 3, though the details are implementation dependent and are easy to lose in the weeds. Schneier's Crypto Engineering has excellent conceptual coverage of the math, sufficient to build an intuition. – Jonah Benton Aug 08 '16 at 04:31
    The tldr is that in all implementations, complex calculations have to be optimized, with reuse of intermediate results. Those optimizations introduce additional structure, as particular patterns in the secret data lead to particular patterns in cache access. If cache access can be monitored at a granular enough level then secret data can be inferred. – Jonah Benton Aug 08 '16 at 04:34

2 Answers


Apparently, this topic aroused great interest among many of you. After doing some more research, I presented the CacheBleed attack to a group of scientists last month. Now I would like to share my results with you and actually answer my own question.

The above three steps of the generic Prime+Probe attack are correct. However, the decisive step is missing: how to deduce the secret key. The attacker can neither perform DMA nor read the data stored in the victim's cache lines. This is important to understand, because otherwise the entire attack would be much easier.

The attacker only learns, by measuring time delays, which cache line was accessed. Furthermore, he knows how RSA is implemented and which algorithms are in use. OpenSSL uses a fixed-window exponentiation algorithm to compute the message m

m = c^d mod p

We need to understand how this exponentiation algorithm works. The Handbook of Applied Cryptography by Menezes, van Oorschot, and Vanstone suggests the following pseudo-code:

Fixed Exponent Exponentiation Algorithm

Please do not confuse the secret key d and the exponent e in the above algorithm. As we are only interested in the decryption step, e is not the public key but the secret. Thus d = e in our case. The interesting point is the multiplication in step 3.2. It uses precomputed multipliers g_j, which speed up the exponentiation. However, their selection depends on the secret key.

If the attacker knows the index of the current multiplier, i.e. which multiplier is used, he knows some bits of the secret key. The value of the multiplier itself is not of interest.
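A minimal sketch of such a fixed-window exponentiation (my own illustrative Python rendering of the algorithm's structure, not OpenSSL's code) makes the leak visible: the index of the multiplier fetched in each iteration is literally a w-bit window of the secret exponent.

```python
# Sketch of left-to-right fixed-window (2^w-ary) modular exponentiation.
# Illustrative rendering of the algorithm's structure; names are mine.

def fixed_window_exp(c, d, n, w=5):
    # Precompute the multipliers g_j = c^j mod n for j = 0 .. 2^w - 1.
    g = [pow(c, j, n) for j in range(2 ** w)]

    # Split the secret exponent d into w-bit windows, MSB first.
    bits = bin(d)[2:]
    bits = bits.zfill(-(-len(bits) // w) * w)   # pad to a multiple of w
    windows = [int(bits[i:i + w], 2) for i in range(0, len(bits), w)]

    m = 1
    for j in windows:
        for _ in range(w):
            m = (m * m) % n     # w squarings shift the accumulator left
        m = (m * g[j]) % n      # table index j is literally w bits of d
    return m

# Sanity check against Python's built-in modular exponentiation.
assert fixed_window_exp(42, 12345, 10007) == pow(42, 12345, 10007)
```

So an attacker who observes which table entry g_j is loaded in each iteration reads off the exponent window by window.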

OpenSSL uses the so-called scatter-gather technique to defeat cache attacks at cache-line granularity. Nevertheless, it is predictable where each multiplier is stored. In total there are 32 multipliers, so each needs 5 bits to be identified uniquely. The two most significant bits select the cache line, while the three least significant bits identify the bin; each cache line consists of eight bins.

The attacker is able to deduce which bin was accessed during a decryption operation. This reveals three bits of the index of the multiplier in use, and thus part of the private key. The missing two bits can be computed thanks to redundancies in RSA keys.
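Assuming the bit split described above (2 most significant bits select the cache line, 3 least significant bits the bin), the mapping can be sketched as follows; the function name is mine, not OpenSSL's:

```python
# Illustration of the 5-bit index split described above: two most
# significant bits select the cache line, three least significant bits
# select the bin within it. Function name is mine, not OpenSSL's.

def scatter_gather_position(j):
    line = j >> 3        # 2 MSBs of the 5-bit index -> cache line 0..3
    bin_ = j & 0b111     # 3 LSBs -> bin 0..7 within the line
    return line, bin_

# Example: multiplier index 0b10110 (22) lives in line 2, bin 6.
print(scatter_gather_position(0b10110))   # -> (2, 6)

# Observing only the bin narrows the 32 candidate indices down to 4;
# the remaining 2 bits are recovered analytically, as stated above.
candidates = [j for j in range(32) if scatter_gather_position(j)[1] == 6]
print(candidates)   # -> [6, 14, 22, 30]
```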

To sum it up: no DMA is performed, and the attacker does not read data from the cache. The crucial factor is that the cache position partly reveals the secret key. This is due to secret-dependent memory accesses. Similar attacks, such as those on AES, make use of the cache position as well. The actual data is not of interest; its position alone reveals sensitive data.

0


This attack is similar to Flush+Reload; however, yours uses cache filling instead of flushes to force cache eviction.

Refer to parts 3 and 4 of this paper: https://eprint.iacr.org/2013/448.pdf

Basically, because of how modular exponentiation works, by measuring the time between each of the victim's memory accesses during a private-key operation, you can deduce the key.

A long delay between two accesses means a multiplication was performed, which indicates a 1 bit.

A short delay between two accesses means only a squaring (a binary shift) was performed, which indicates a 0 bit.

Then you string them all together, and you have a fairly close representation of the key. Try multiple times, add some statistical magic, and you end up with the key.
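Assuming the attacker has already classified each observed operation as a squaring (short delay) or a multiplication (long delay), the reconstruction can be sketched like this. This is illustrative code, not the paper's; real traces are noisy and need the statistics mentioned above:

```python
# Sketch of recovering exponent bits from a square-and-multiply trace,
# assuming each observed operation is already classified as a squaring
# ('S', short delay) or a multiplication ('M', long delay).

def square_and_multiply_trace(d):
    # Left-to-right square-and-multiply; returns the operation sequence
    # a cache attacker would observe for exponent d.
    trace = []
    for bit in bin(d)[3:]:      # skip the implicit leading 1 bit
        trace.append('S')       # every exponent bit costs a squaring
        if bit == '1':
            trace.append('M')   # a multiplication betrays a 1 bit
    return ''.join(trace)

def recover_exponent(trace):
    # An 'S' followed by 'M' encodes a 1; a bare 'S' encodes a 0.
    bits = '1'                  # the implicit leading bit
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == 'M':
            bits += '1'
            i += 2
        else:
            bits += '0'
            i += 1
    return int(bits, 2)

secret = 0b101101
assert recover_exponent(square_and_multiply_trace(secret)) == secret
```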

If anyone wants to simplify the paper's explanation further, feel free to edit this answer.
