
Several new hardware side-channel attacks, collectively called MDS (Microarchitectural Data Sampling) attacks, have been discovered. Like Meltdown, they allow reading arbitrary memory, and many existing mitigations are ineffective against them. The relevant CVEs are:

  • CVE-2018-12126 - Microarchitectural Store Buffer Data Sampling (MSBDS) 
  • CVE-2018-12130 - Microarchitectural Fill Buffer Data Sampling (MFBDS)
  • CVE-2018-12127 - Microarchitectural Load Port Data Sampling (MLPDS)
  • CVE-2019-11091 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM)

More information is given on CPUFail, in the Linux kernel documentation, and in a Red Hat blog post.


My current understanding is that the microcode update changes the behavior of the obsolete VERW instruction so that it flushes various internal processor buffers, and that the software update (in Linux at least) causes the OS to issue this instruction on every context switch (e.g. entering and exiting syscalls). CVE-2018-12130 (MFBDS), however, cannot be fully mitigated this way because the fill buffer is shared between logical cores (but not between physical cores), so it is necessary to disable SMT (Hyper-Threading).

CVE-2018-12130 (MFBDS) can only be partially mitigated by disabling SMT, according to an in-depth blog post. Some information can still be leaked through a context switch during syscalls. Are the microcode and software updates described above, in addition to disabling SMT, sufficient to completely avoid it?

Finally, is installing the latest microcode and operating system update and disabling SMT sufficient to completely mitigate all of these newly-discovered microarchitectural attacks, including ZombieLoad?

forest

2 Answers


My current understanding is that the microcode update changes the behavior of the obsolete VERW instruction so that it causes a flush of various internal processor buffers

The new behavior of the VERW instruction is described in this article. In particular:

  • The VERW instruction retains the same existing functionality, i.e., it checks whether the specified segment is writable from the current privilege level.
  • Only the memory-operand variant of the instruction is guaranteed to overwrite the buffers exploited by MDS. The register-operand variant may or may not perform the buffer overwriting functionality.
  • The buffer overwriting functionality occurs regardless of the result of the segment write permission check (including an exception).

The VERW instruction execution by itself does not prevent later instructions from being executed before all of the MDS-affected buffers are overwritten. Therefore, it's necessary to place a serializing instruction (which Intel calls a speculation barrier) after VERW. Consider the example from the same article:

Code region A (victim accessing secret data)
VERW m16
Code region B (victim accessing data that is not secret)
Speculation barrier (for example, LFENCE)
Code region C (the attacker can only see the data accessed in B)

Assume that these instructions are being executed on a processor with the MD_CLEAR microcode update (discussed below). The execution of A may leave some secret in-flight data on the same physical core. When VERW begins execution, B may execute before all the leaky buffers are overwritten. A barrier, such as LFENCE, needs to be placed after B to ensure that C cannot access the secret data.

The VERW instruction is not supported in real mode and virtual-8086 mode because segment access permissions are not available in these modes. Therefore, in these modes, a sequence of instructions, which depends on the microarchitecture, needs to be used instead.

The following characteristics of VERW explain why Intel chose to overload that instruction with the buffer overwriting functionality (instead of any other instruction or introducing a new MSR):

  • VERW is microcoded, which is probably necessary for a microcode update to work.
  • VERW is rarely used, so the resulting performance overhead is practically insignificant on existing software.
  • VERW can be executed at any privilege level. In particular, it can be used in cases where the security boundaries are in user mode (e.g., SGX and sandboxes).

VERW is not perfect, though. As noted above, it doesn't work in real mode and virtual-8086 mode, and it also modifies the ZF flag.

CVE-2018-12130 (MFBDS) can only be partially mitigated by disabling SMT, according to an in-depth blog post. Some information can still be leaked through a context switch during syscalls.

There are two cases that need to be considered separately:

  • The attacker and the victim never run on two threads of the same physical core at the same time. This can occur when HT is disabled or when the OS scheduler decides to run the threads on different physical cores at the same time (because, for example, the threads have different physical core affinities). Either way, the threads may still run on the same logical core at different points in time. An MDS exploit can still be successful. The only way for the attacker to run on the same logical core as the one the victim runs on is when the victim switches to kernel mode (e.g., system call or hardware interrupt), and the attacker gets scheduled to run next on the same logical core. Therefore, the kernel can fully prevent the attacker from exploiting the internal CPU buffers by executing the VERW instruction before returning to user mode (to run next whatever thread is scheduled on that logical core). This also ensures that the buffers contain no memory requests from the kernel when returning to user mode. Similarly, VERW needs to be executed when switching between two virtual machines on the same logical core.
  • The attacker and the victim may run concurrently on the same physical core. The Linux kernel documentation on MDS mentions that HT needs to be disabled for full protection to prevent this particular situation from occurring in the first place. The Intel article on MDS, however, proposes an alternative mitigation called group scheduling. The idea here is to ensure that two threads are scheduled to run on two sibling logical cores only if they mutually trust each other. The Hyper-V hypervisor already employs group scheduling (and it has been recently updated to use VERW when switching between virtual processors that belong to different VMs). During the execution of VERW (or the alternative software sequence), the sibling logical core must be quiesced (e.g., execute HLT or PAUSE) to ensure that all the buffers get overwritten.

The aforementioned mitigations (overwriting the MDS-affected buffers when returning from the kernel or when switching between VMs, disabling HT, and group scheduling) cannot protect sandboxed applications (in a web browser) and SGX enclaves, where there is no switching between privilege levels. One possible mitigation for sandboxed apps is using processes instead. SGX enclaves are protected by the microcode update itself.

The MD_CLEAR microcode update seems to include the following changes:

  • New functionality to the VERW instruction as discussed above. Only the buffers that are vulnerable on each particular processor are overwritten, so the impact of VERW on performance depends on the processor.
  • When entering or exiting an SGX enclave, the MDS-affected buffers are overwritten. However, on the entrance to the enclave, it must be ensured that no untrusted thread runs on the sibling logical core.
  • When exiting system management mode (using the RSM instruction), the MDS-affected buffers are overwritten. However, on the entrance to SMM mode, the SMM software must ensure that no untrusted thread runs on the sibling logical core.
  • The RIDL paper in Section IX mentions that "The updated microcode also flushes these buffers when flushing the L1 cache." I think this refers to the IA32_FLUSH_CMD MSR, where setting the bit at index 0 to 1 causes the processor to writeback and invalidate the whole L1D cache. This is referred to as the L1D_FLUSH command. It also overwrites all buffers that are vulnerable to MDS.

The following processors are not vulnerable to any MDS attack, but are vulnerable to TAA:

  • Whiskey Lake (steppings 12 and 13 only)1.
  • Coffee Lake Refresh (stepping 13 only).
  • 2nd Gen Xeon Scalable Processors (steppings 6 and 7 only).

Microcode updates that are similar to MD_CLEAR also apply to these processors to mitigate TAA. Therefore, VERW has a performance penalty on these processors as well (and it's buggy according to erratum CLX38).

For some processors, Intel has released multiple versions of the MD_CLEAR microcode update to fix bugs in earlier versions.

There are processors that are vulnerable to both MDS and TAA. These include Coffee Lake Refresh (steppings 10, 11, 12 only), Whiskey Lake (stepping 11 only), 2nd Gen Xeon Scalable Processors (stepping 5 only), and earlier down to and including Haswell. On these processors, the MDS mitigations also work for TAA. There are also processors that are only vulnerable to MDS and not TAA, which include some of those that don't support TSX.

Ice Lake, Goldmont, Goldmont Plus, and Tremont processors are the only modern Intel processors that are affected by neither MDS nor TAA and retain the legacy behavior of VERW.

In this Intel article, the performance impact of the microcode update and OS patch (to use the VERW instruction) appears to me to be significant (over 5%) for some benchmarks. There is also a list of FAQs at the end where Intel recommends against disabling HT, which makes sense.

Section E of the RIDL paper mentions that the authors were able to leak physical addresses from the page walking hardware of the MMU (page walks go through the LFBs). I've not seen any proposed mitigations for this attack.

Some recent processors include hardware mitigations for all of the four MDS attacks. This can be checked using the following sequence of commands:

sudo modprobe msr
sudo rdmsr -p 0 0x10A

The first command loads the msr kernel module and the second reads the value of the IA32_ARCH_CAPABILITIES MSR. If the sixth bit (the bit at index 5) is 1, the processor has hardware mitigations for all MDS attacks, so none of the mitigations discussed above are needed. This bit is called MDS_NO. Otherwise, the processor has no hardware mitigations for at least MSBDS, MLPDS, and MDSUM. Note that if the IA32_ARCH_CAPABILITIES MSR itself is not supported, the processor definitely has no hardware mitigations for any of the MDS attacks.

For discussion on how MFBDS, MLPDS, and MDSUM work, see: About the RIDL vulnerabilities and the “replaying” of loads. For discussion on how MSBDS works, see: What are the microarchitectural details behind MSBDS (Fallout)?.


Footnotes:

1 I'm not aware of any released Whiskey Lake processors with stepping 13. This could be an error in Intel's list, or it could be that Intel decided to not release these processors.

Hadi Brais
  • Interesting. So none of these mitigations can hide physical addresses? Is the impact simply that attacks such as rowhammer are easier to exploit, or does it also leak enough to fully break ASLR (not just KASLR)? – forest May 16 '19 at 03:56
  • @forest `VERW` can be used to overwrite the in-flight data of the MMU, but there is no way to know when the instruction should be executed because page walks can occur concurrently with execution and at anytime. A mitigation in hardware may require stalling the whole physical core until the page walk finishes, followed by overwriting the buffers, and then resuming execution. That said, I think exploiting the in-flight data of the MMU in practice is extremely difficult because... – Hadi Brais May 16 '19 at 04:12
  • ...there is just too much noise (activity from the core) and the page mapping may change before the attacker can make any use of whatever data it could leak from the MMU. It's extremely unlikely that such an attack can succeed in a real environment and leak secret data. – Hadi Brais May 16 '19 at 04:12
  • _Some recent processors include hardware mitigations for all of the four MDS attacks_ - How could this be if the MDS attacks were reported to Intel less than a year ago? I was under the impression that some anti-Spectre defenses actually made some of these attacks _worse_. – forest May 17 '19 at 18:18
  • @forest These include processors that were released within the past year and some low-power processors (the Intel Atom family), which are not designed for performance. – Hadi Brais May 17 '19 at 18:23
  • Interesting. The RIDL paper at least failed to mention this and implied that no hardware mitigations currently exist or are in development. – forest May 17 '19 at 18:24
  • @forest The RIDL paper must have been written like a year ago and the authors may not have updated it since then. There are many minor technical errors in the paper, but the description of how to carry out a successful MDS-style attack is certainly correct. – Hadi Brais May 17 '19 at 18:26
  • I see. I'll have to look more into `MDS_NO`. Does it also deal with leaking physical page information from the MMU? It's good to hear that Intel is rapidly developing hardware mitigations, though. A few more generations and they may have this down. – forest May 17 '19 at 18:27
  • @forest It does overwrite the buffers (which may contain sensitive stale data), but the problem is that the page walks can happen in the background and there is no way to know (from the software perspective) when the buffers should be flushed to prevent leaking physical addresses. The only way to mitigate this is really in the hardware (as indicated by the `MDS_NO` bit). – Hadi Brais May 17 '19 at 18:31
  • Sorry, I meant `MDS_NO` in `IA32_ARCH_CAPABILITIES`. – forest May 17 '19 at 18:32
  • @forest I think the MMU attack falls under MFBDS. So yes, if a processor includes a hardware mitigation for MFBDS, then it is not vulnerable to leaking physical addresses. `MDS_NO` is a fix in the hardware that makes the processor not to forward stale data from any of the affected buffers (store buffers, LFBs, and load ports), which is the root cause of MDS. That said, a software mitigation for MFBDS does not really solve the MMU problem completely. – Hadi Brais May 17 '19 at 18:37
  • So as a final summary to ensure I am fully understanding: it's safe to say that with software updates to use the deprecated instruction and microcode updates to change said instruction's behavior, and with SMT disabled, all MDS attacks other than the one leaking MMU information are mitigated. With newer hardware, all the MDS attacks, including the one leaking MMU information, are mitigated automatically. Is this correct? – forest May 17 '19 at 18:41
  • @forest Yes, assuming there are no bugs in the mitigations themselves, of course :) – Hadi Brais May 17 '19 at 18:42
  • A user-space sandbox could use these mitigations; I assume that's why Intel overloaded `verw` instead of adding another MSR. And `lfence` is unprivileged. So it would be more accurate to say that kernel mitigations can't protect sandboxes from their untrusted code, not that these mitigations won't work. (Of course the cost could be prohibitive for frequent switching between trusted and JITed code. So maybe we could say the mitigations are unusably slow for most JVM / Javascript VMs, unless they spend long periods of time without leaving JITed code.) – Peter Cordes May 19 '19 at 03:46
  • Is there any simple explanation posted anywhere for why cross-SMT matters for the store buffer? AFAIK, Intel CPUs *statically* partition the store buffer so each logical thread can only use half of it. Does triggering a memory-disambiguation mispredict sometimes forward from a still-live store buffer entry used by the other logical core, containing not-yet-committed data? Or is it just another possible source of "don't care" data rather than actual mis-speculation of real forwarding? (I haven't fully caught up on all the MDS exploits yet :/) – Peter Cordes May 19 '19 at 03:57
  • @PeterCordes I'll post an answer on https://stackoverflow.com/questions/56156437/what-are-the-microarchitectural-details-behind-msbds-fallout. But other than reading the MDS papers, the Intel article I linked to in the answer is a good read and includes a lot of information not found in the papers. I also have many unanswered questions :/ – Hadi Brais May 19 '19 at 04:01
  • Ah yes, I searched again in that article and found the answer. *When a thread enters a sleep state, its store buffer entries may become usable by the other active thread.* So yes, unlike many of the other cross-thread vulns, this one relies on the victim thread releasing its store buffer entries to the other logical core when it becomes the only active thread. `pause` and stalls in general don't do that, I think, only `hlt` or `mwait` sleep. Or user-space `umwait` in future CPUs... That Intel page addresses static partition vs. competitive sharing for the other vulns, too. – Peter Cordes May 19 '19 at 04:05
  • @PeterCordes Most secure sandboxes are not user space and rely on the kernel (e.g. seccomp). Any sandbox which holds both trusted and untrusted data in the same address space is going to be a bad sandbox, regardless of whether or not MDS is applicable. Sadly, this does include JS sandboxes for browsers that do not isolate tabs in their own processes... – forest May 19 '19 at 04:11

These are the descriptions of the CVEs:

CVE-2018-12126 - flaw that could lead to information disclosure from the processor store buffer.

CVE-2018-12127 - exploit of the microprocessor load operations that can provide data to an attacker about CPU registers and operations in the CPU pipeline.

CVE-2018-12130 - implementation error of the microprocessor fill buffers that can expose data within that buffer.

CVE-2019-11091 - flaw in the implementation of the "fill buffer," a mechanism used by modern CPUs when a cache miss occurs in the L1 CPU cache.

To fix the overall problem, one should make sure that trusted and untrusted code do not share physical cores.

Disabling HT helps prevent this in the SMT case, but in a VM environment you could still end up with trusted and untrusted code running on the same physical core, because there are two possible attack vectors at the hypervisor level:

Sequential-context attack vector (SCAV, Inter-VM): a malicious VM can potentially infer recently accessed data of a previous context (HV thread or other VM thread) on either logical processor of a processor core.

Concurrent-context attack vector (CCAV Inter-VM): a malicious VM can potentially infer recently accessed data of a concurrently executing context (HV thread or other VM thread) on the other logical processor of the HT-enabled processor core.

As you can see, one of the vectors (SCAV) does not require HT to be enabled, so disabling HT only addresses one of the two attack vectors (CCAV).

To fix the other, software-level patching is needed to make sure that a SCAV does not happen.

For SCAV, hypervisors must be patched with the Intel-provided microcode updates. In the case of VMware, those are provided in separate ESXi patches for most of the affected Intel platforms. For CCAV, VMware also offers a solution: the Side-Channel-Aware Scheduler (SCAS) can be enabled, which practically ensures that such an exploit cannot happen, though doing so could impact performance. The impact should still be lower than that of disabling HT, but note that SCAS operates at the hypervisor layer, not the virtual machine layer: the actual VMs are still vulnerable if unpatched.

In conclusion: for the second case (CCAV), both HVs and VMs must be patched or HT must be disabled; for the first case (SCAV), patching at the HV level based on Intel microcode updates is needed.

Overmind
  • _To fix the overall problem, one should make sure that trusted and untrusted code do not share physical cores._ - I'm not sure that your conclusion is entirely correct. Why couldn't `VERW` be called during VM exits and the like? Is there a reason why it would be insufficient and if so, could you explain it? This doesn't answer whether or not microcode+software+disabled SMT completely mitigates all these threats or not, either. – forest May 15 '19 at 06:15
  • Maybe I should have stated that part differently, since keeping things physical kind of defeats the purpose of virtualization. Xeon-based processors have a private L1 cache (each core has its own L1), so generally, something running on one physical core cannot 'steal' anything from another's L1 cache. Now when you come in with an additional logical layer, a HV should make sure that 'optimizing' such things does not have undesired effects. That said, the actual end of my answer ('As a conclusion') answers the question. Yes, you can mitigate all 4 of them, but the 2 situations and the how-to differ. – Overmind May 15 '19 at 06:42
  • All modern multicore processors have a private L1 cache, and even L2 cache. L3 is called the LLC (Last Level Cache, which is shared by all cores) for a reason. – forest May 15 '19 at 06:44
  • Since in the case of CCAV some OS'es may be late or unreliable on patching, the play-it-safe way would be to disable HT. So doing that and patching the HV for both problems is complete mitigation for all 4 CVEs. – Overmind May 15 '19 at 06:45
  • I assume by HV you meant HV and kernel/microcode (in the case where cross-process memory reads, not cross-VM, is the threat model)? While I know this affects hypervisors, that's not my primary concern at all. – forest May 15 '19 at 06:46
  • Yes, and that should be all doable by this path: step 1 - intel providing the necessary microcode updates, step 2 - virtualization solutions make their patches. – Overmind May 15 '19 at 06:55