
Following Google's Project Zero blog entry on Spectre/Meltdown, there's this piece of code that exemplifies the attack:

struct array {
    unsigned long length;
    unsigned char data[];
};
struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
/* >0x400 (OUT OF BOUNDS!) */
unsigned long untrusted_offset_from_caller = ...;
if (untrusted_offset_from_caller < arr1->length) {
    unsigned char value = arr1->data[untrusted_offset_from_caller];
    unsigned long index2 = ((value&1)*0x100)+0x200;
    if (index2 < arr2->length) {
        unsigned char value2 = arr2->data[index2];
    }
}

It is explained that the speculative execution performed by the CPU will try to execute up to

arr2->data[index2];

before reaching the condition

if (untrusted_offset_from_caller < arr1->length) {

that would prevent the access to an out-of-bounds zone of memory.

My question is:

  • what would have prevented access to that zone of memory in a normal execution, if the code had tried to do so explicitly?

I suppose that somewhere the OS and/or CPU memory-access checks should stop that, and that the speculative execution simply skips over those checks (?).

It seems that patching that (not actually performed) check (if my previous guess is correct...) would not suffice, or is not the correct approach, as it has already been stated that two other measures are needed: flushing the branch predictor & taking the full address of the branch instruction into account (it seems that CPUs don't do that nowadays), but:

  • wouldn't those checks (if the answer to the first question is that additional checks are needed) or that 'flushing' impact performance? And if so, has it been estimated by how much? (for when those hypothetical CPUs would/will be made).
circulosmeos
  • The check `if(untrusted_offset_from_caller < arr1->length) {` prevents it because the code inside the `if` statement doesn't execute when the check is false. – user253751 Jan 10 '18 at 01:58
  • Just remove that line: what then? that's the question. – circulosmeos Jan 10 '18 at 17:29
  • Well, nothing. This version of the Spectre vulnerability only allows you to read data that the processor determines you have access to. (Without also involving Meltdown.) – user253751 Jan 10 '18 at 22:44
  • @immibis: why would this be an attack at all then? – circulosmeos Jan 11 '18 at 23:48
  • Because just because the processor doesn't know you're not meant to have access to something doesn't mean you're meant to have access to it. Spectre breaks many access control mechanisms other than the slow one provided by the processor (and Meltdown breaks that one too). For example, you can no longer write `if(program should have access to data) {fetch the data}` in a virtual machine. – user253751 Jan 12 '18 at 01:17
  • Thanks @immibis for your sharp comments. Your first sentence is quite a tongue twister, but I get what you mean :-) I have to study this deeper... You're right that this attack leaks memory from the same process - it's the other Spectre attack that leaks memory from other processes: thanks for pointing that out, I didn't see it at first. About your `VM code`: is this in the line of the last @CortAmmon answer? So that now any `if` means possible data exposure via side-channel attacks? – circulosmeos Jan 12 '18 at 19:26
  • @circulosmeos Well, it depends what's in the `if`, but essentially yes. When writing security-critical code (which is actually lots of code!) you cannot trust `if`. Obviously that is a huge problem because `if` is a very basic statement. – user253751 Jan 14 '18 at 09:10

6 Answers


Simplifying things a little, the problem is that something like

if (*p1) x = p2[256 * *p3];

may be processed as:

start loading *p1 into t1.
load *p3 into t2, and set t3 to 0 if it's a valid fetch, 1 otherwise.
load *(p2 + t2*256) into t4
wait for t1..t4 to be ready
if t1 was set, then...
  if t3 is set [access was invalid] then fire an invalid access trap.
  otherwise copy t4 into x.
discard t1..t4.

If reading *p1 yields a zero value, the fact that *p3 is invalid must not cause a trap (since code wouldn't actually ask to read *p3). For whatever reason the designers at Intel thought it was easier to delay checking the validity of the first memory read until after the fetched value was used to compute the address of another speculative read, than to have an invalid memory read immediately force the speculator to assume its prediction was wrong.

Note that the problem isn't that the processor speculatively fetches from *p3. The problem is that the processor makes use of that value without regard for whether it was legitimately acquired. While present attacks focus on using the fetched value to compute an address, and then using the cache to find out what address was fetched, the fundamental problem is that the data gets read and latched without regard for whether the access is legitimate. Any time a device physically fetches data that should be inaccessible, a potential for side-channel attacks is created. The best way to prevent such attacks is to avoid having the device acquire such data in the first place.

supercat
  • Thanks, @supercat : "...processor makes use of that value without regard for whether it was legitimately acquired". That's the part I'm interested in... what steps does the CPU take in a normal scenario to guarantee legitimate memory access? – circulosmeos Jan 06 '18 at 18:47

The key concept is that no chip is obliged to precisely execute code according to the instructions. The obligation is instead that it must execute "as if" the code was precisely executed according to the instructions. All modern processors take advantage of this freedom.

In theory, the processor is free to execute any instruction speculatively at any time, as long as the end result is "as if" the instruction was never executed at all. This is used to increase performance, using idle parts of the chip to do speculative work, in the hope that the results of these instructions will be needed.

In the Meltdown/Spectre bugs, it was revealed that such speculative execution is actually NOT "as if" it never happened. This speculation changed the state of the cache, loading data that might not otherwise be cached. This changes timings which Meltdown/Spectre leverage to read memory from places it was not supposed to read.

The key failure is that the chip is no longer operating "as if" it was following orders perfectly. It's marching slightly out of line. In a case like:

if (cursorIdx < cursors.size())
    y = buffer[cursors[cursorIdx]];

a programmer may expect that no memory from the buffer can be read. This is logical because the instructions written into the program demanded that the cursor array size check occur before the buffer read. As long as the chip operates "as if" it was following orders, you can prove that nobody should be able to observe a buffer read occurring before the size check. With this exploit, we see that an observer can indeed read from the buffer, and maybe even other memory outside of the buffer.

Thus, code which appeared to be safe, even against side-channel attacks, is suddenly very unsafe because it is no longer operating "as if" it was executed properly. The issue is not that the memory is being read speculatively. The processor was allowed to do that the whole time, and is still technically allowed to do so. The issue is that this exploit demonstrates that such speculative reads are not "as if" they were never read, because we can observe whether they occurred. They now leak information about the memory space which was not supposed to be leakable.

And yes, to answer your question, the fixes for this do indeed negatively impact performance. One of the reasons this exploit is such a big deal is that it is very difficult to implement fixes without performance impacts.

Cort Ammon
  • You're very firm in a pair of statements that may help me to clarify the question (together with @immibis' sharp comments): _"The key concept is that no chip is obliged to precisely execute code according to the instructions."_ & _"The issue is not that the memory is being read speculatively. The processor was allowed to do that the whole time, and is still technically allowed to do so."_ It is not that they were unknown to me, & it's true that CPUs operate that way, but seeing them written in quite a bold way is at first shocking :-) In fact the 2nd probably must be revised for future CPUs... – circulosmeos Jan 12 '18 at 22:47
  • @circulosmeos Future CPUs will still be permitted to read speculatively, but the requirements they must meet in order to retain the "as if" behavior have *certainly* been increased. Future CPUs that seek to do speculative reads are going to have to take additional steps to ensure nobody can observe the fact that they did the speculative read. For example, I'm certain there's an Intel designer looking right now at whether the next generation CPUs can un-evict cache lines if a speculative read falls through, to get them one step closer to "as if." – Cort Ammon Jan 12 '18 at 22:58

Starting in the late 1970s/early 1980s, CPUs started breaking down execution of instructions into a series of smaller steps, called "pipelined execution". For example, the CPU might be simultaneously reading one instruction from memory, executing a second, and storing the results from a third. Doing things this way makes for faster CPUs: by having multiple instructions in different stages of processing at the same time, a CPU can be many times faster for the same clock speed.

This runs into a problem when the instruction being executed is a branch (such as an "if" statement): which instruction is the "next" instruction that should be loaded from memory? The solution was branch prediction and speculative execution: the CPU makes an educated guess at which instruction is next, runs it, but doesn't make the results permanent until it knows if the guess was right or not. This speeds things up again, because the CPU doesn't need to wait for the results of a branch to be known unless it guesses wrong about which way the branch will go.

Modern CPUs are fast, memory access is slow, and pipelines are much longer than the simple three-stage pipe I described above. Looking at your code from a CPU's point of view while it's executing this line:

unsigned long untrusted_offset_from_caller = ...;

The CPU sees the two "if" statements coming up, and goes: "based on past experience, both these 'if' statements will turn out to be true. Therefore, I'll need to fetch arr1->data[untrusted_offset_from_caller] and arr2->data[index2] from memory."

if (untrusted_offset_from_caller < arr1->length) {
    unsigned char value = arr1->data[untrusted_offset_from_caller];
    unsigned long index2 = ((value&1)*0x100)+0x200;
    if (index2 < arr2->length) {
        unsigned char value2 = arr2->data[index2];
    }
}

It then proceeds to issue the memory requests and speculatively execute the code. Now, this time, it turns out that if (untrusted_offset_from_caller < arr1->length) was false, and the CPU discards the work it did.

However, it doesn't discard all of it. Memory, registers, and the instruction pointer all show what you'd expect from the statement being false, but the preemptive fetch of arr2->data[index2] is still in the CPU's data cache. A program can figure this out by the fact that reading that part of memory is faster than normal, and can deduce what the value of arr1->data[untrusted_offset_from_caller] was.

Mark
  • Thanks, @Mark . I understand the Spectre attack as such. What I'm asking is what would have prevented access to that memory address if the processor had tried to do it directly, in order to understand what is failing (if anything) in speculative vs. normal execution, or what can be patched in future architectures. – circulosmeos Jan 06 '18 at 00:40
  • The problem isn't that arr2->data[index2] is in the cache, because there is no way to access it. The problem is that the previous contents of the cache line have been ejected, so trying to read those previous contents again takes longer than it should; an attacker can determine which cache line was ejected, and this lets us figure out "index2" and therefore "value". – gnasher729 Jan 07 '18 at 00:53
  • @gnasher729, there are a number of different ways to see what's in the cache. What you describe is "evict+time", while what I describe is (approximately) "flush+reload". – Mark Jan 07 '18 at 01:34
  • @gnasher729: `index2` will hold the value 0x200 or 0x300, based upon the data at the untrusted offset. There is no reason both `arr2->data[0x200]` and `arr2->data[0x300]` could not be accessible to the attacker. – supercat Jan 09 '18 at 23:20

"What would have prevented access to that zone of memory in a normal execution"? Nothing would have prevented it. But normally the address used would have been based on data that was legally available to the running code. So the data would be read speculatively, a cache line would be ejected, and this would tell us which byte was read, but it would be a byte that we had the right to know anyway. So nothing would have been revealed that was secret.

Here's my proposal for handling the situation given by the sample code (in hardware):

One read operation is no problem. So a simple method would be to allow only one speculative read operation. A second speculative read would have to wait until the first one is not speculative anymore.

First improvement: A speculative read that doesn't modify the cache (or leaks information in some other way) is fine. So we always allow one speculative read, and then we allow more reads as long as they don't eject a cache line (or otherwise leak information).

Second improvement: Further reads are fine as long as they don't have an address dependent on the first read. So we keep track of which registers were the result of a speculative read, and how this propagates, and allow further reads as long as the address is not based on a speculative read. This allows "if (x > 0) z = a[0] + a[1] + a[2];" to proceed.

Third improvement: We modify L1 cache so that any attempt of reading data belonging to another process will produce a cache miss. Now we know that if a read operation hits L1 cache, then we were allowed to read the data. Therefore we ignore all reads that were L1 cache hits.

gnasher729
  • Thanks @gnasher729 : But, as it is presented, it seems that the problem arises from reading memory beyond "correct" data limits... This can be done in code directly by making untrusted_offset_from_caller > 0x400, as stated in the code's comment: what would have happened with that "legitimate", non-speculative code? Isn't that memory address the same? (or is the "...take the full address..." part of [the remedy](https://security.stackexchange.com/questions/176678/is-branch-predictor-flush-instruction-a-complete-spectre-fix) also part of the problem, and the memory location accessed is not the same?) – circulosmeos Jan 07 '18 at 01:45
  • @circulosmeos: If code "actually" tried to access an invalid address, that would result in a segmentation fault which would transfer control to the OS, which could examine the address and shut down the application if it was out of bounds. The problem is that if an invalid address is accessed speculatively, code can speculatively use the result of that access to speculatively compute an effective address without regard for whether the access was valid. – supercat Jan 07 '18 at 17:55
  • As an approach I would think would be simpler, how about simply not allowing a memory fetch to be retired unless or until either (1) it is valid, or (2) all previous operations have been retired? Issuing an invalid memory fetch and having it bump something from the cache would not be a security risk, since the code issuing the request would know what address it was using. The problem is that the invalid memory request can get retired before a preceding branch is resolved, thus allowing a subsequent memory fetch to be issued using data from the invalid one. If for some reason... – supercat Jan 07 '18 at 18:00
  • ...that would be too complicated, another approach would be to add logic so that reading data from a page without access rights would behave as though the speculatively-fetched data was all zeroes, but that could create potential bugs if the fetch was invalid when it was issued but became valid later (that case should be rare enough that stalling the pipeline in the event it does occur wouldn't harm performance and would yield better behavior than latching the fact that a fault had occurred, but fetching zeroes and then having a fault not occur could be bad). – supercat Jan 07 '18 at 18:05
  • @supercat: "...to access an invalid address, that would result in a segmentation fault which would transfer control to the OS": OK, that points to the heart of my concerns: why isn't speculation built in hardware so that memory accesses are subject to the same checks as normal CPU execution? This point seems to be taken as obvious in every approach to Spectre, as it is not even mentioned as a possibility - but why not? – circulosmeos Jan 08 '18 at 07:19

I understand what circulosmeos is asking; this question has been hanging in my mind, too: if reading unauthorized memory is not allowed in normal execution, why can it happen in speculative mode?

A logical expectation would be that it would fail during speculation too, and make the speculative branch abort or similar. Indeed the access error is registered, but it is only applied later, if the code becomes "real".

My educated guess is that it was a choice to keep the design simpler and faster: if all effects of the speculative branch are discarded and not seen from the rest of the code, why bother to make the logic more complex than needed? Every transistor and every nanosecond count when designing such a beast.

The problem is that, due to cache, the assumptions were not exact: not all effects are reverted, and engineers failed to notice this fact, or were not aware that side-channel attacks can detect such effects, or did not realize that those effects can actually reveal memory contents if used in elaborate ways.

  • To confirm your doubts, in normal execution a program is **immediately** stopped if you try to access memory outside your process' table, and this is done by hardware (MMU, page tables, etc). So the real question is why speculative mode does not obey normal rules. – m.alessandrini Jan 10 '18 at 15:45
  • thanks @m.alessandrini , your guesses seem plausible and I share them, that's the direction I'm pointing in my question: as for example **a diagram of a modern CPU rejecting a memory access based on tables, etc**... I'd love it :-) So then it would be clear that future CPUs can "easily" be redesigned to patch Spectre (& related) exploits. (Another question would be what actual patches for Spectre do to avoid all of this... how can the kernel avoid speculative execution for another process?) – circulosmeos Jan 10 '18 at 18:04
  • I found those hardware machineries well explained in books on operating systems, like e.g. Silberschatz, in chapters about memory management. I'm not expert at all, but I feel it should be easy to fix that in future hardware revisions. – m.alessandrini Jan 11 '18 at 10:51

After carefully studying all the very useful replies, and in order to organize all the info I've gathered, I'll make here a quick summary of the knowledge I've acquired, which has served me as a satisfactory answer.

First of all I'm very grateful to @cort-ammon comment:

"The key concept is that no chip is obliged to precisely execute code according to the instructions. The obligation is instead that it must execute "as if" the code was precisely executed according to the instructions. All modern processors take advantage of this freedom."

This concept is really necessary in order to understand why all these CPU features (as Branch Target Prediction, Speculative Execution and so on) have been implemented in the first place - and why they remain as good (and now we know: as dangerous!) as ever.

Now as for the question:

  • what would have prevented access to that zone of memory in a normal execution, if the code had tried to do so explicitly?

Nothing (as @immibis & @gnasher729 pointed out!): because this version of the Spectre vulnerability only allows reading data that the processor determines the process has access to.

There's no difference in the code from, for example, an out-of-bounds buffer read inside the same process' memory space (see for example this code).

That is: the code indicated in the question is extracted from Variant 1: Bounds check bypass, in which the generic Proof of Concept (PoC) is shown. This PoC is then used in various ways: in order to use the code in Variant 1 to attack the kernel, access to an eBPF bytecode interpreter is necessary, so that "Unprivileged userspace code can supply bytecode to the kernel". This is the characteristic (!) behaviour of eBPF ("extended Berkeley Packet Filter", a programmable feature of Linux kernels) that the attack makes full use of: this way the attack passes unaltered into the kernel, where it can read memory positions that it shouldn't (they'll be out of its supposed reserved memory, but in the same process space) without triggering any alarm - and it wouldn't even if the code were examined by the eBPF compiler (and it is not), because no direct out-of-bounds memory read is made at all.

But the PoC can also be used in Variant 2: Branch target injection, which, with other tricks, forces the CPU to make directed jumps in mis-speculated execution of other processes in order to gain read access to arbitrary virtual memory locations. This attack makes full use of the fact that the Branch Target Buffer (BTB) of some CPUs (at least the Intel Haswell Xeon) only uses part of the full memory address to store branch prediction information. This is part of what gives one process the ability to influence the branch target predictor of a completely different process, thus bypassing userspace/kernel (and other) protections.

And now I understand where this partial-address BTB fits into the answer I pointed to in my question, which stated that in order to protect a CPU from Spectre attacks:

"Branch predictor state must take the full address of the branch instruction into account (currently, to save space, only the low-order bits are used)." (@Mark)

As for the second question:

  • wouldn't those checks (if the answer to the first question is that additional checks are needed) or that 'flushing' impact performance? And if so, has it been estimated by how much? (for when those hypothetical CPUs would/will be made).

now that I'm aware of the internals of these attacks, and also of the internals of CPU optimizations (BTB, speculative execution...), the question seems less imperative to me: CPU designers will try to maintain the balance between performance and security... maybe security just wasn't part of this balance before these attacks were made public (even though very promising side-channel attacks were already known... but that's another story).

For example, as (again) @cort-ammon points:

"For example, I'm certain there's an Intel designer looking right now at whether the next generation CPUs can un-evict cache lines if a speculative read falls though to get them one step closer to "as if.""

Thanks again for all the answers.

circulosmeos