My current understanding is that the microcode update changes the
behavior of the obsolete VERW instruction so that it causes a flush of
various internal processor buffers.
The new behavior of the VERW instruction is described in this article. In particular:
- The VERW instruction retains its existing functionality, i.e., it checks whether the specified segment is writable from the current privilege level.
- Only the memory-operand variant of the instruction is guaranteed to overwrite the buffers exploited by MDS. The register-operand variant may or may not perform the buffer overwriting functionality.
- The buffer overwriting functionality occurs regardless of the result of the segment write permission check (including an exception).
Executing the VERW instruction does not by itself prevent later instructions from being executed before all of the MDS-affected buffers are overwritten. Therefore, it's necessary to place a serializing instruction (which Intel calls a speculation barrier) after VERW. Consider the example from the same article:
Code region A (victim accessing secret data)
VERW m16
Code region B (victim accessing data that is not secret)
Speculation barrier (for example, LFENCE)
Code region C (the attacker can only see the data accessed in B)
Assume that these instructions are being executed on a processor with the MD_CLEAR microcode update (discussed below). The execution of A may leave some secret in-flight data on the same physical core. When VERW begins execution, B may execute before all the leaky buffers are overwritten. A barrier, such as LFENCE, needs to be placed after B to ensure that C cannot access the secret data.
The VERW instruction is not supported in real mode and virtual-8086 mode because segment access permissions are not available in these modes. Therefore, in these modes, a sequence of instructions, which depends on the microarchitecture, needs to be used instead.
The following characteristics of VERW explain why Intel chose to overload that instruction with the buffer overwriting functionality (instead of any other instruction or introducing a new MSR):
- VERW is microcoded, which is probably necessary for a microcode update to work.
- VERW is rarely used, so the resulting performance overhead is practically insignificant on existing software.
- VERW can be executed at any privilege level. In particular, it can be used in cases where the security boundaries are in user mode (e.g., SGX and sandboxes).
VERW is not perfect though. As already said above, it doesn't work in real mode and virtual-8086 mode. It also modifies the ZF flag.
CVE-2018-12130 (MFBDS) can only be partially mitigated by disabling
SMT, according to an in-depth blog post. Some information can still be
leaked through a context switch during syscalls.
There are two cases that need to be considered separately:
- The attacker and the victim never run on two threads of the same physical core at the same time. This can occur when HT is disabled or when the OS scheduler decides to run the threads on different physical cores at the same time (because, for example, the threads have different physical core affinities). Either way, the threads may still run on the same logical core at different points in time. An MDS exploit can still be successful. The only way for the attacker to run on the same logical core as the one the victim runs on is when the victim switches to kernel mode (e.g., system call or hardware interrupt), and the attacker gets scheduled to run next on the same logical core. Therefore, the kernel can fully prevent the attacker from exploiting the internal CPU buffers by executing the VERW instruction before returning to user mode (to run next whatever thread is scheduled on that logical core). This also ensures that the buffers contain no memory requests from the kernel when returning to user mode. Similarly, VERW needs to be executed when switching between two virtual machines on the same logical core.
- The attacker and the victim may run concurrently on the same physical core. The Linux kernel documentation on MDS mentions that HT needs to be disabled for full protection to prevent this particular situation from occurring in the first place. The Intel article on MDS, however, proposes an alternative mitigation called group scheduling. The idea here is to ensure that two threads are scheduled to run on two sibling logical cores only if they mutually trust each other. The Hyper-V hypervisor already employs group scheduling (and it has been recently updated to use VERW when switching between virtual processors that belong to different VMs). During the execution of VERW (or the alternative software sequence), the sibling logical core must be quiesced (e.g., execute HLT or PAUSE) to ensure that all the buffers get overwritten.
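On Linux, the outcome of the two cases above can be inspected from user space. The following is only a sketch: it assumes a kernel that carries the MDS patches and exposes the sysfs file described in the Linux kernel documentation on MDS, and the function names are made up for illustration. The part of the status line before the semicolon describes the buffer clearing (the first case), and the part after it describes the SMT state (the second case).

```python
from pathlib import Path

# sysfs file assumed to exist on kernels that include the MDS patches.
MDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(text):
    """Split a status line such as
    'Mitigation: Clear CPU buffers; SMT vulnerable'
    into the buffer-clearing state and the optional SMT state."""
    parts = [p.strip() for p in text.strip().split(";")]
    state = parts[0]
    smt = parts[1] if len(parts) > 1 else None
    return {
        "state": state,
        # True only when the kernel reports an active mitigation.
        "buffers_cleared": state.startswith("Mitigation"),
        "smt": smt,
    }


def read_mds_status(path=MDS_SYSFS):
    """Return the parsed status, or None if the file cannot be read."""
    try:
        return parse_mds_status(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_mds_status())
```

A result with "buffers_cleared" set but an SMT state of "SMT vulnerable" would correspond to a system where the kernel overwrites the buffers on the way back to user mode but sibling logical cores may still run mutually untrusted threads.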
The aforementioned mitigations (overwriting the MDS-affected buffers when returning from the kernel or when switching between VMs, disabling HT, and group scheduling) cannot protect sandboxed applications (in a web browser) and SGX enclaves, where there is no switching between privilege levels. One possible mitigation for sandboxed apps is using processes instead. SGX enclaves are protected by the microcode update itself.
The MD_CLEAR microcode update seems to include the following changes:
- New functionality to the VERW instruction as discussed above. Only the buffers that are vulnerable on each particular processor are overwritten, so the impact of VERW on performance depends on the processor.
- When entering or exiting an SGX enclave, the MDS-affected buffers are overwritten. However, on the entrance to the enclave, it must be ensured that no untrusted thread runs on the sibling logical core.
- When exiting system management mode (using the RSM instruction), the MDS-affected buffers are overwritten. However, on the entrance to SMM, the SMM software must ensure that no untrusted thread runs on the sibling logical core.
- The RIDL paper in Section IX mentions that "The updated microcode also flushes these buffers when flushing the L1 cache." I think this refers to the IA32_FLUSH_CMD MSR, where setting the bit at index 0 to 1 causes the processor to write back and invalidate the whole L1D cache. This is referred to as the L1D_FLUSH command. It also overwrites all buffers that are vulnerable to MDS.
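Whether the new VERW behavior is enumerated on a given machine can be checked on Linux, which reports it as the md_clear flag in /proc/cpuinfo. The sketch below is illustrative only: the helper name is hypothetical, and the flag only tells you what the kernel sees in CPUID, not whether the kernel actually executes VERW.

```python
def has_md_clear(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists md_clear."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the 'flags' field, and match whole flag names.
        if sep and key.strip() == "flags" and "md_clear" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("md_clear enumerated:", has_md_clear(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```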
The following processors are not vulnerable to any MDS attack, but are vulnerable to TAA:
- Whiskey Lake (steppings 12 and 13 only) [1].
- Coffee Lake Refresh (stepping 13 only).
- 2nd Gen Xeon Scalable Processors (steppings 6 and 7 only).
Microcode updates that are similar to MD_CLEAR also apply to these processors to mitigate TAA. Therefore, VERW has a performance penalty on these processors as well (and it's buggy according to erratum CLX38).
For some processors, Intel has released multiple versions of the MD_CLEAR microcode update to fix bugs in earlier versions.
There are processors that are vulnerable to both MDS and TAA. These include Coffee Lake Refresh (steppings 10, 11, and 12 only), Whiskey Lake (stepping 11 only), 2nd Gen Xeon Scalable Processors (stepping 5 only), and earlier processors down to and including Haswell. On these processors, the MDS mitigations also work for TAA. There are also processors that are vulnerable only to MDS and not to TAA, which include some of those that don't support TSX.
Ice Lake, Goldmont, Goldmont Plus, and Tremont processors are the only modern Intel processors that are affected by neither MDS nor TAA and retain the legacy behavior of VERW.
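The stepping lists above are easy to misread, so here they are as a small lookup table. This is just a sketch that encodes the lists from this answer: the family names are informal labels rather than CPUID values, the function name is hypothetical, and processors that the lists don't spell out by stepping (Haswell through the earlier parts, and the MDS-only processors) are deliberately left out.

```python
# Steppings copied from the lists in this answer; keys are informal names.
TAA_ONLY = {
    "Whiskey Lake": {12, 13},
    "Coffee Lake Refresh": {13},
    "2nd Gen Xeon Scalable": {6, 7},
}
MDS_AND_TAA = {
    "Whiskey Lake": {11},
    "Coffee Lake Refresh": {10, 11, 12},
    "2nd Gen Xeon Scalable": {5},
}
NEITHER = {"Ice Lake", "Goldmont", "Goldmont Plus", "Tremont"}


def classify(family, stepping=None):
    """Return (vulnerable_to_mds, vulnerable_to_taa), or None if not covered here."""
    if family in NEITHER:
        return (False, False)
    if stepping in TAA_ONLY.get(family, ()):
        return (False, True)
    if stepping in MDS_AND_TAA.get(family, ()):
        return (True, True)
    return None


if __name__ == "__main__":
    print(classify("Coffee Lake Refresh", 13))
    print(classify("Coffee Lake Refresh", 12))
```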
In this Intel article, the performance impact of the microcode update and OS patch (to use the VERW instruction) appears to me to be significant (over 5%) for some benchmarks. There is also a list of FAQs at the end where Intel recommends against disabling HT, which makes sense.
Section E of the RIDL paper mentions that the authors were able to leak physical addresses from the page walking hardware of the MMU (page walks go through the LFBs). I've not seen any proposed mitigations for this attack.
Some recent processors include hardware mitigations for all four MDS attacks. This can be checked using the following sequence of commands:
sudo modprobe msr
sudo rdmsr -p 0 0x10A
The first command loads the msr kernel module and the second command reads the value in the IA32_ARCH_CAPABILITIES MSR. If the sixth bit (bit at index 5) is 1, the processor has hardware mitigations for all MDS attacks, and so none of the mitigations discussed above are needed. This bit is called MDS_NO. Otherwise, the processor has no hardware mitigations for at least MSBDS, MLPDS, and MDSUM. Note that if the IA32_ARCH_CAPABILITIES MSR itself is not supported, then the processor definitely has no hardware mitigations for any MDS attack.
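Reading bit 5 out of the printed value by eye is error-prone, so a small helper can do it. This sketch assumes that rdmsr prints the register in hexadecimal (its default output format); the function name is made up for illustration.

```python
MDS_NO_BIT = 5  # bit index of MDS_NO in IA32_ARCH_CAPABILITIES (MSR 0x10A)


def mds_no(rdmsr_output):
    """Parse the hexadecimal string printed by rdmsr and test bit 5."""
    value = int(rdmsr_output.strip(), 16)
    return bool((value >> MDS_NO_BIT) & 1)


if __name__ == "__main__":
    # Example values only, not readings from a real processor.
    print(mds_no("2b"))  # 0x2b = 0b101011, bit 5 is set
    print(mds_no("b"))   # 0xb  = 0b001011, bit 5 is clear
```

If rdmsr fails because the MSR is not supported, there is nothing to parse, and per the note above the processor should be treated as having no hardware mitigations.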
For discussion on how MFBDS, MLPDS, and MDSUM work, see: About the RIDL vulnerabilities and the “replaying” of loads. For discussion on how MSBDS works, see: What are the microarchitectural details behind MSBDS (Fallout)?.
Footnotes:
[1] I'm not aware of any released Whiskey Lake processors with stepping 13. This could be an error in Intel's list, or it could be that Intel decided not to release these processors.