6

By my understanding, the Meltdown and Spectre attacks both exploit the fact that some modern processors, when given something like:

if (x < arr1[y])
  z = arr2[arr3[x]*256];

may sometimes fetch the value of arr2[arr3[x] * 256] before they have determined whether x is less than arr1[y], and may do this *without regard for whether code has adequate permission to access arr3[x]*. The italicized portion is the sine qua non of the vulnerability: if the processor were to fetch arr2[arr3[x] * 256] only in cases where the access to arr3[x] was permissible, it would be impossible to exploit an access to arr2[___ * 256] using illegitimately-fetched arr3[x] values, because there wouldn't be any.

It makes sense that even if arr3[x] is invalid, the processor can't trap unless or until it determines that x is less than arr1[y]. What I fail to understand is why a speculative fetch from an invalid address shouldn't cause the CPU to abandon the current speculative execution path? I would think that in almost every realistic scenario one of two things would happen:

  1. The branch prediction that led to speculative execution turns out to be wrong, in which case any work that might be done with the speculatively-fetched value will need to be discarded.

  2. The branch prediction that led to speculative execution turns out to be correct, in which case execution should trap at the invalid access, and work that might be done with the speculatively-fetched value (before executing the trap) will need to be discarded.

Is there any realistic scenario in which speculative work that follows a speculative fetch from an illegitimate address could turn out to be useful? If not, what advantage is there to allowing speculative execution to continue past such fetches? If an invalid fetch will make the CPU abandon the current line of speculative execution, that would avoid the need to keep track of speculative pending traps.

supercat
  • I guess the people who invented branch prediction thought in terms of "performance" and not in terms of "security". For sure branch prediction is not always useful, but it is in the average case, and that's what counts concerning performance. – Jonas Wilms Jan 05 '18 at 11:35
  • @JonasW.: What I fail to understand is why allowing an illegal speculative fetch to succeed and resolving later whether it should have been allowed to do so, would have had any advantage over having any "dodgy" actions block speculative execution until they are resolved. Among other things, as a matter of principle, if a core fetches information it shouldn't, it will be necessary to inspect all aspects of its behavior to ensure that the information can't leak. By contrast, no care will be needed to prevent a core from leaking information it never acquires in the first place. – supercat Jan 05 '18 at 15:46
  • @supercat It's a matter of threat models. The chip designers *proved* that you can't access the data through proper channels. What they missed is a side channel. If one looks at the history of side channel attacks, it's clear that it is not the first time someone failed to pay attention to a timing issue, nor will it be the last. – Cort Ammon May 23 '18 at 04:28
  • @CortAmmon: My question had been what *advantage* there was to the design, since checking accesses before the speculative value fetch would seem easier than checking after. I've since figured out some ways where adding it might allow reduced latency at the expense of considerable complexity, and guess Intel decided the increased complexity was worth the reduction in latency. – supercat May 23 '18 at 14:55
  • @supercat Yeah, latency is the key. Checking first and fetching after would be easier, but checking and fetching *at the same time*, in parallel, is faster. – Cort Ammon May 23 '18 at 16:53
  • @CortAmmon: The time required for the processor to discover with certainty the address from which data should be fetched is essentially the same as the time required to find out whether the access is legal, which is why I had been confused about the reason for this behavior. The situation where it makes a difference is when the processor doesn't know what the address will be, but may be able to guess (e.g. when switching page tables, if entries are marked as "possibly invalid" rather than flushed, an access to an address that was valid on the earlier page table... – supercat May 23 '18 at 17:27
  • ...might be satisfied using a cached value at a cached address in less time than it takes to fetch a page-table entry from main memory. A possible fix (which could also help performance in some cases) might be to say that speculative fetches may only be honored by things in L0 cache, since bus bandwidth usage to fetch an object that isn't in L0 cache would compete with operations to find out whether the object was actually needed. – supercat May 23 '18 at 17:30
  • @supercat How can determining the access is legal take the same time as discovering the address to fetch, when the latter is a prerequisite for starting the former? – Cort Ammon May 23 '18 at 18:34
  • @CortAmmon: The mapping of accessible logical addresses to physical addresses is controlled by the same page table that indicates which addresses are accessible. The key situation occurs when code asks for a particular logical address (e.g. 0x12345123) and the system hasn't fetched the page-table entry for that address since the last time a new page table was selected, but knows that the last time that logical address was used, it mapped to e.g. 0x98765123. Any operation involving that address must be speculative until the actual page-table entry is fetched. – supercat May 23 '18 at 18:57
  • @supercat Your logic was sound, so had to go look at the [spectre paper](https://spectreattack.com/spectre.pdf). "While simple delays could theoretically work, they would need to be very long since speculative execution routinely stretches nearly 200 instructions ahead of a cache miss, and much greater distances may occur." It looks like that's why they went down that path. – Cort Ammon May 24 '18 at 05:33

2 Answers

3

Consider the following code:

mprotect(arr3, sizeof(arr3), PROT_NONE);
...
mprotect(arr3, sizeof(arr3), PROT_READ);
z = arr2[arr3[x]*256];

At some point, arr3 becomes readable by the process. If the assignment (well, actually the dereference of arr3) has already been speculatively executed before the mprotect completes, what behavior would you expect? I would not expect a trap, since the access is valid at that point.

Basically, you don't know what the state of the memory at arr3 is until the instruction before it has been retired, at which point the pipeline will have executed well past that point.

I suppose you could "speculatively trap" and re-execute if the memory permissions change in the meantime, but apparently that's not what is done today. (Though I imagine it would be a much smaller performance hit than the KPTI software fixes.)

David
  • Nothing that happens while code is being speculatively executed is allowed to cause a trap if the code isn't "actually" executed. On the other hand, anything is allowed to cause a program to discard the results of speculatively executing a stretch of code and refraining from further speculative execution until all preceding branches are resolved. In your indicated example, I would expect that the processor should wait until it non-speculatively executes the second `mprotect` before it allows the execution of anything that might be affected thereby. – supercat Jan 05 '18 at 01:43
  • If a function to disable read access to a stretch of memory were within an `if` block, and the processor decided to speculatively execute a read which was after it, that would seem justifiable in cases where the speculative read succeeded, even if executing the function would cause the read to be forbidden. From what I understand, though, x86 processors are continuing speculative execution past forbidden reads. – supercat Jan 05 '18 at 01:57
  • As a point of clarification to my last point, quality languages should include a means of designating functions which, if called in any part of an `if`/`else` or `switch`, would block speculative execution of any code following the `if` or `switch`. Code like `enableMemoryUserShouldntAccess(); ... if (foo) disableMemoryUserShouldntAccess(); return *untrustworthyPointer;` might be exploitable if neither the programmer nor compiler adds such a barrier, and existing code using such constructs without a barrier may be problematic, but only those functions that would be allowed... – supercat Jan 05 '18 at 18:01
  • ...to access things their callers cannot would pose risks (as compared with all functions everywhere being able to access everything). – supercat Jan 05 '18 at 18:03
0

Obviously it's a bad idea to do anything with the result of a speculative load of a memory location that the code isn't allowed to read. But you can't just forbid using data from a memory location that you weren't allowed to read, because often you don't know (yet). You'd have to forbid using data from a memory location where you haven't figured out yet if you are allowed to read it or not.

Unfortunately, data from other processes can actually be in your L1 cache. So if you read from L1 cache (which is very fast), the processor doesn't know yet if it was allowed to access the data or not. So you'd have to wait on basically every memory access until permissions are checked, which would slow down things considerably. Well, at least with the way Intel designed their processors. Since AMD processors don't run at a snail's speed, obviously they are doing something different.

One method (requiring a substantial change in the processor) would be to make sure that although data from another process can be in your L1 cache, such data will always produce a cache miss (if you switch processes, then the same data would magically not produce cache misses anymore). That way, you could safely use data from L1 cache speculatively, and if it's not in L1 cache then you just might have enough time to check permissions before the data arrives.

Another method would be to stop speculative execution when something happens that could cause detectable changes. For example, after a speculative read you could continue until you encounter an operation that would change the cache in any way.

gnasher729
  • I guess I'm surprised that a CPU would allow an L1 cache to retain data from other processes, even speculatively, without including enough data in the tags to determine that the data is definitely accessible (occasional false negatives would not be a problem). A big part of the purpose of the L1 cache is not only to supply data sooner, but also to reduce bandwidth on higher-level memory resources, and the value of caching data would seem to be greatly diminished if permissions still had to be externally checked, even if latency wasn't an issue. – supercat Jan 08 '18 at 16:07
  • My VLSI design and architecture education was in 1994, and obviously things have changed a lot since then, but I would have thought that the resources necessary to allow data validity checks to be pipelined separately from the data fetches themselves would have been greater than those necessary to cache reliable validity information. – supercat Jan 08 '18 at 16:11
  • Is the idea that the processor speculatively retires a load based on L1 cache data while it waits for real confirmed data, compares actual fetched data with the data that was speculatively processed, and rewinds to the point of speculative fetch if they mismatch? So the scenario where data is inaccessible would just be a variation on the more general case of data not matching expectations? If so, perhaps it would be good for the answer to say that. – supercat Jan 08 '18 at 19:14