
Many sources claim that almost all Intel x86 CPUs back to Pentium Pro are vulnerable to the Meltdown attack. Pentium Pro was introduced to the market in 1995.

What was the state of the art knowledge on security of speculative evaluation, the basis for the Meltdown attack, at that time?

liori
  • So this question turned out to be out of date: Meltdown has since been tried on the Pentium Pro. On processors prior to the Pentium IV, speculative execution reads CPU registers rather than kernel memory; in particular, it is possible to read the physical-RAM location of the page table itself, the error trap addresses, and a few other things out of the CR* registers. I saw an attack on speculative execution in 1998 able to steal SSH or SSL keys by instruction timings alone against Hyperthreading, which is before the Pentium IV release. – Joshua Sep 14 '21 at 19:44

1 Answer


The state of the art was non-existent.

At the time of the Pentium Pro, the World Wide Web was four years old. Widespread use of shared hosting was about ten years in the future; if you suggested that people would want to run untrusted code provided by random third parties, they'd look at you like you'd grown a second head. Memory protection was about preventing one crashing program from taking down the whole system, not about letting programs hide data from one another. Speculative execution was not seen as having any security implications whatsoever -- it was simply a way of avoiding performance-killing pipeline stalls.

Mark
  • Could the speculative execution used by the Pentium Pro extend as far as using speculatively-fetched data in the computation of addresses for subsequent speculative fetches? And did it allow speculative execution to ignore memory permissions? It is the combination of those things which creates the vulnerabilities, and I doubt the first chip to have any kind of speculative execution would extend its reach that far. – supercat Jan 05 '18 at 03:00
  • Also, as far as I understand, the pipeline itself would have to be long enough that several instructions could reach the stage requiring the final speculative fetch before the branch is retired. The Pentium Pro's pipeline was short compared to later chips. However, Alex Ionescu [claims](https://twitter.com/aionescu/status/948579519492849665) that it indeed has the same problem, and I think I saw an article explicitly stating that Intel's speculative execution did not check permissions even back then; hence my question. – liori Jan 05 '18 at 04:08
  • @liori: I wouldn't doubt that the Pentium Pro performed speculative fetches without performing security checks first. I would find it very surprising, however, if the first x86 processor to use speculative execution was able to handle multiple outstanding conditions simultaneously. The complexity required to do that would seem much greater than the complexity required to handle one, and given 1995 memory speeds I don't see what the payoff would have been. – supercat Jan 05 '18 at 16:11
  • @supercat I checked some facts. The L2 miss penalty was 50 cycles. The PPro could retire 3 instructions per cycle, of which one could be a load/store. The out-of-order engine could handle up to 40 instructions. So there actually seems to be plenty of time, and if the result of one instruction could not be reused by another waiting instruction, 40 entries would be a waste given the limited number of registers in x86. Also, the retirement register file, where the results of instructions are stored, can indeed be used by subsequent speculative instructions. I'm quite convinced myself now. – liori Jan 05 '18 at 21:17
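    The back-of-envelope arithmetic in the comment above can be sketched as follows (figures are taken from the comment and treated as approximations; the variable names are illustrative):

    ```python
    # Rough check: does a single L2 miss leave room for speculative work
    # in the Pentium Pro's out-of-order window? Numbers from the comment.
    l2_miss_penalty_cycles = 50   # cycles a load waits on an L2 miss
    retire_width = 3              # instructions the PPro can retire per cycle
    rob_entries = 40              # size of the out-of-order window

    # Instruction slots that could pass while one load waits on memory:
    slots_during_miss = l2_miss_penalty_cycles * retire_width

    print(slots_during_miss)                 # 150
    print(slots_during_miss > rob_entries)   # True
    ```

    The 40-entry window fills long before the miss resolves, which is consistent with the comment's conclusion that there is ample time for dependent speculative instructions to execute behind a stalled load.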
  • @liori: The out-of-order engine could handle 40 definitely-executed instructions, but how many levels of speculation could it handle? Also, what advantage was there to allowing speculation to proceed past an invalid memory access? I would think that treating a seemingly-invalid access as though a prediction was found to be incorrect would be both simpler and better than having to keep track of a trap that will need to fire if the previous prediction is found to be correct. The only case where speculation could be helpful would be if a non-predicted path causes... – supercat Jan 05 '18 at 23:11
  • ...the previously-invalid access to become valid, in which case the fact that the earlier prediction was wrong would require the results of speculative processing to be discarded, but allow the cached data to remain. That seems like a rather tenuous benefit; I would think that more often the fetches by discarded instructions would compete for resources with other cores that could be doing something useful. – supercat Jan 05 '18 at 23:12
  • @supercat, the advantage of allowing speculation to proceed past an invalid memory access is that you don't need to check access validity until it's time to turn speculative execution into real execution, in the processor's retirement unit. This simplifies CPU design and speeds up execution. – Mark Jan 05 '18 at 23:17
  • @Mark: Before a system can begin a physical memory fetch, it must fetch the appropriate segment/page descriptors and compute the physical address. Could you explain on https://security.stackexchange.com/questions/176731/why-do-cpus-operate-speculatively-with-results-of-forbidden-memory-fetches?noredirect=1#comment340899_176731 why there would be any difficulty knowing whether an access was invalid before issuing the request (or, at worst, before receiving the response), or what design would more easily allow traps to be deferred than toss the results of speculative execution... – supercat Jan 06 '18 at 01:07
  • ...and trap when code tries to execute an invalid memory fetch "for real"? I would think the logic to defer traps would be much more complicated. – supercat Jan 06 '18 at 01:07
  • @supercat, a simplified version of why an invalid fetch doesn't trigger a trap immediately: the retirement unit (the part of the CPU that turns speculative and out-of-order execution into the illusion of in-order execution) must have circuitry to trigger a pipeline flush, since it's the only place where you can say a branch misprediction occurred. Since that circuitry is already there, you can make the CPU simpler and probably faster by re-using it for all other pipeline flushes (such as traps). The easiest way to do this is to have a failed instruction trigger its trap only when it reaches the retirement unit. – Mark Jan 06 '18 at 04:32
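    The deferred-trap design described in the comments above can be illustrated with a deliberately simplified toy model. This is not real microarchitecture — all the names and the sequential "execution" are illustrative assumptions — but it shows the key property: a faulting load executes speculatively, younger instructions run on its result, and the trap only becomes visible when the faulting instruction reaches in-order retirement.

    ```python
    # Toy model of deferred trapping at the retirement unit.
    # Hypothetical names; real hardware executes out of order and in parallel.

    class Instr:
        def __init__(self, name, faults=False):
            self.name = name
            self.faults = faults

    def run(rob):
        """Execute everything in the window, then retire in order."""
        transient = []
        for instr in rob:          # speculative execution: no permission check
            transient.append(instr.name)
        for instr in rob:          # in-order retirement: checks happen here
            if instr.faults:
                # only now does the fault become architecturally visible
                return transient, "trap at " + instr.name
        return transient, "no trap"

    rob = [Instr("load kernel byte", faults=True),
           Instr("use byte as index"),    # runs transiently anyway
           Instr("probe cache line")]     # leaves a cache footprint
    transient, outcome = run(rob)
    print(transient)   # all three executed before the trap
    print(outcome)     # trap fires at retirement of the faulting load
    ```

    In this model the dependent instructions leave side effects (in a real CPU, cache state) even though the trap eventually squashes their architectural results — which is precisely the window Meltdown exploits.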