
If you're already familiar with PCI behavior and Linux's handling of DMA buffers, skip to the third section for my actual question. Otherwise, read on for a short summary of how PCI devices perform memory accesses and how the kernel communicates with devices using DMA. I've included this both in the hope of giving people asking the same question some useful information, and to give others the chance to correct me in case my understanding is off.

(My understanding of) PCI, IOMMU, and DMA

PCI and PCIe devices have in their configuration space a two-byte command register containing a bitmask that enables or disables several hardware features. Bit 2 is the bus master enable bit which, when set, allows the device to initiate DMA requests. This bit, like the rest of the command register, is set by software running in supervisor mode (typically by kernel drivers) and, despite being physically stored on the PCI device, cannot be changed by the device itself (actually, this may be wrong. Is it that a PCI bridge won't pass through the DMA request unless it has bus master set as well?). On hardware without an IOMMU, the device can request reads and writes to any legal memory address. This is often called a DMA attack or evil bus mastering, and it is an issue on any unprotected system with malicious PCI devices. The IOMMU is intended to be the solution, improving both security and performance. For reference, I am specifically asking about Intel's implementation, VT-d (more precisely, the modern VT-d2).
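As an aside, the command register is easy to inspect from userspace through sysfs. Here is a minimal sketch, assuming a device at the example address 0000:00:02.0 (lspci -vv reports the same bit as BusMaster+ in its Control line):

#include <stdio.h>
#include <stdint.h>

/* Read the 16-bit PCI command register (config space offset 0x04)
 * and report the state of bit 2, the Bus Master Enable bit. */
int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:02.0/config"; /* example device */
    uint8_t cmd[2];
    FILE *f = fopen(path, "rb");

    if (!f || fseek(f, 0x04, SEEK_SET) != 0 || fread(cmd, 1, 2, f) != 2) {
        perror(path);
        return 1;
    }
    fclose(f);

    uint16_t command = (uint16_t)cmd[0] | ((uint16_t)cmd[1] << 8); /* little-endian */
    printf("command = 0x%04x, bus master %s\n",
           command, (command & 0x4) ? "enabled" : "disabled");
    return 0;
}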

Most systems can be configured for DMA remapping, or DMAR. The ACPI tables provided by the BIOS often include a DMAR table, which describes the remapping hardware and lists the memory regions to which various PCI devices' accesses will be confined. This is all described in section 2.5.1.1 of Intel's VT-d specification. A graphic from the document summarizes how this works:

[Figure: DMA remapping, from Intel's VT-d specification]
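If you want to look at the raw table on your own machine, the firmware exports it through sysfs. Here is a minimal sketch that dumps the ACPI header fields, assuming the usual export path /sys/firmware/acpi/tables/DMAR (reading it requires root):

#include <stdio.h>
#include <stdint.h>

/* Dump the header of the ACPI DMAR table exported through sysfs.
 * Byte 36 holds the host address width, encoded as (width - 1). */
int main(void)
{
    const char *path = "/sys/firmware/acpi/tables/DMAR"; /* needs root to read */
    uint8_t hdr[38];
    FILE *f = fopen(path, "rb");

    if (!f || fread(hdr, 1, sizeof(hdr), f) != sizeof(hdr)) {
        perror(path); /* absent: no VT-d, or DMAR not exposed by firmware */
        return 1;
    }
    fclose(f);

    uint32_t len = hdr[4] | hdr[5] << 8 | hdr[6] << 16 | (uint32_t)hdr[7] << 24;
    printf("signature: %.4s, length: %u bytes, host address width: %u bits\n",
           (char *)hdr, len, hdr[36] + 1);
    return 0;
}

The remapping structures themselves, including the reserved memory regions that show up in the kernel log below, follow this header.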

The Linux kernel DMA API

The DMAR tables are hardcoded by the BIOS. A given PCI device (or rather, a given IOMMU group) is allowed to access only a predetermined memory range. The kernel is told where that memory is and is expected not to allocate anything there that it does not want readable or writable over DMA. The remapping values are reported in the kernel log buffer:

DMAR: Setting identity map for device 0000:00:02.0 [0xad000000 - 0xaf1fffff]
DMAR: Setting identity map for device 0000:00:14.0 [0xa95dc000 - 0xa95e8fff]
DMAR: Setting identity map for device 0000:00:1a.0 [0xa95dc000 - 0xa95e8fff]
DMAR: Setting identity map for device 0000:00:1d.0 [0xa95dc000 - 0xa95e8fff]
DMAR: Prepare 0-16MiB unity mapping for LPC
DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
DMAR: Intel(R) Virtualization Technology for Directed I/O
iommu: Adding device 0000:00:00.0 to group 0
iommu: Adding device 0000:00:01.0 to group 1
iommu: Adding device 0000:00:02.0 to group 2
iommu: Adding device 0000:00:14.0 to group 3
iommu: Adding device 0000:00:16.0 to group 4
iommu: Adding device 0000:00:1a.0 to group 5
iommu: Adding device 0000:00:1b.0 to group 6
iommu: Adding device 0000:00:1c.0 to group 7
iommu: Adding device 0000:00:1c.2 to group 8
iommu: Adding device 0000:00:1c.3 to group 9
iommu: Adding device 0000:00:1c.4 to group 10
iommu: Adding device 0000:00:1d.0 to group 11
iommu: Adding device 0000:00:1f.0 to group 12
iommu: Adding device 0000:00:1f.2 to group 12
iommu: Adding device 0000:00:1f.3 to group 12
iommu: Adding device 0000:01:00.0 to group 1
iommu: Adding device 0000:03:00.0 to group 13
iommu: Adding device 0000:04:00.0 to group 14
iommu: Adding device 0000:05:00.0 to group 15

From the log lines for device 0000:00:1d.0, we see that group 11 contains only that device, which is able to freely access 13 pages of memory in the range 0xa95dc000 - 0xa95e8fff (0xa95e8fff - 0xa95dc000 + 1 = 0xd000 bytes, or 13 pages of 4 KiB). Devices in group 11 can read and write only within that range, preventing them from modifying the contents of other DMA buffers or unrelated OS code. This way, even if the device has its bus master bit set, it does not need to keep track of where it is writing, and it cannot (accidentally or maliciously) write anywhere it is not supposed to.
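Incidentally, the grouping can also be enumerated on a running system without digging through the boot log. A minimal sketch that walks the standard sysfs location /sys/kernel/iommu_groups:

#include <dirent.h>
#include <stdio.h>

/* Walk /sys/kernel/iommu_groups/<N>/devices/ and print each group's members. */
int main(void)
{
    const char *root = "/sys/kernel/iommu_groups";
    DIR *groups = opendir(root);
    struct dirent *g;

    if (!groups) {
        perror(root); /* no groups: IOMMU disabled or unsupported */
        return 1;
    }
    while ((g = readdir(groups)) != NULL) {
        char path[512];
        DIR *devs;
        struct dirent *d;

        if (g->d_name[0] == '.')
            continue;
        snprintf(path, sizeof(path), "%s/%s/devices", root, g->d_name);
        devs = opendir(path);
        if (!devs)
            continue;
        while ((d = readdir(devs)) != NULL)
            if (d->d_name[0] != '.')
                printf("group %s: %s\n", g->d_name, d->d_name);
        closedir(devs);
    }
    closedir(groups);
    return 0;
}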

When a kernel driver wants to interact with a device over DMA, it allocates memory specifically for this purpose, for example with void *addr = kmalloc(len, GFP_KERNEL | GFP_DMA) (modern PCI drivers more commonly use dma_alloc_coherent(), which returns both the kernel virtual address and the bus address to program into the device). Either way, the driver ends up with a virtual address pointing to a physically contiguous region, len bytes in size, which is suitable for DMA use. This is all described in more detail in the Linux DMA API documentation. The driver is then free to communicate with the PCI device through this shared memory region. The series of events, simplified, may look something like this (a sketch of the allocation step follows the list):

  • OpenCL driver allocates memory, shared with the GPU PCI device, for DMA use.
  • Driver writes some vector data to the DMA address, and goes off to do something else.
  • GPU reads the data over the PCIe bus, and begins the slow task of processing it.
  • When finished, the GPU writes the results back to the buffer and fires off an interrupt.
  • Driver stops what it is doing due to the interrupt and reads the finished results from memory.
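For the first step, here is a minimal sketch of what the allocation might look like in a driver's probe path; example_setup_dma and the 64-bit DMA mask are illustrative, not taken from any real driver:

#include <linux/dma-mapping.h>
#include <linux/pci.h>

static void      *buf_cpu;  /* kernel virtual address, used by the driver */
static dma_addr_t buf_dma;  /* bus address, programmed into the device */

static int example_setup_dma(struct pci_dev *pdev, size_t len)
{
    int err;

    /* Declare which addresses the device is capable of generating. */
    err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (err)
        return err;

    /* Coherent allocation: CPU and device see a consistent view. */
    buf_cpu = dma_alloc_coherent(&pdev->dev, len, &buf_dma, GFP_KERNEL);
    if (!buf_cpu)
        return -ENOMEM;

    return 0;
}

The dma_addr_t value is what the device uses on the bus; with an IOMMU active it is an I/O virtual address that the IOMMU translates, which is exactly where the remapping protection comes in.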

Does the kernel distrust DMA buffers and handle them securely?

Does the kernel implicitly trust these DMA buffers? Can a malicious or compromised PCI device, writing nowhere other than the designated buffers (the IOMMU prevents it from doing otherwise), compromise the kernel by exploiting the data structures it shares with the driver? The obvious answer is possibly, because any sharing and parsing of complex data structures in a memory-unsafe language carries with it the risk of exploitation. But the kernel developers may assume these buffers are trusted and put no effort into securing the kernel from malicious activity in them (unlike, say, the data shared between unprivileged userland and the kernel via copy_from_user() and similar functions). I am starting to think that the answer to whether a malicious PCI device can compromise the host despite the IOMMU's restrictions is probably.

Exploitation of such a vulnerability would work something like this, where buf is in the device-controlled and DMA-writable address space, and dest is elsewhere in kernel memory:

  • Device writes data as struct { size_t len; char data[32]; char foo[32]; } buf.
  • Driver intends to copy the data into struct { char data[32]; bool summon_demons; } dest.
  • Device maliciously sets buf.len = sizeof(buf.data) + 1 and buf.foo[0] = 1.
  • Driver copies data insecurely, using memcpy(dest.data, buf.data, buf.len).
  • PCI device gains control over the kernel and your immortal soul in a classic buffer overflow.
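Spelled out in (purely illustrative) code, using the structures from the list above; note that a careful driver must also read the device-controlled length exactly once, since the device can rewrite the buffer at any time:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Layout of the shared, device-writable DMA buffer. */
struct dma_buf {
    size_t len;
    char   data[32];
    char   foo[32];
};

struct kernel_state {
    char data[32];
    bool summon_demons;
};

/* Vulnerable: trusts the device-controlled length field outright. */
void copy_untrusted_bad(struct kernel_state *dest, volatile struct dma_buf *buf)
{
    memcpy(dest->data, (const void *)buf->data, buf->len); /* overflows if len > 32 */
}

/* Hardened: snapshot the length once, then clamp it to the destination. */
void copy_untrusted_good(struct kernel_state *dest, volatile struct dma_buf *buf)
{
    size_t len = buf->len;            /* single read: no TOCTOU on len */
    if (len > sizeof(dest->data))
        len = sizeof(dest->data);
    memcpy(dest->data, (const void *)buf->data, len);
}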

Obviously this is a contrived example, and such an obvious bug would most likely never make its way into the kernel in the first place, but it illustrates my point and brings me to my primary question:

Are there any examples of vulnerabilities from improper handling of data structures shared over DMA, or of any specific drivers treating the input from PCI devices as trusted?

Limitations of VT-d as an IOMMU

I am aware of VT-d's limitations and don't want an answer that tries to explain how a device could work around the IOMMU directly or use some other loophole to gain control of the system. I already know:

  • It cannot adequately protect a system unless x2APIC and Interrupt Remapping are supported.
  • Address Translation Services (ATS) can bypass the IOMMU.
  • Modified PCI expansion ROMs can attack the system on reboot.
  • All devices in a given IOMMU group have access to the same memory (assuming no ACS).
  • Some BIOSes come with a broken DMAR table, resulting in the IOMMU being disabled.
  • The CSME ("Intel ME") may be able to disable VT-d via PSF and PAVP.
  • Yet unknown attacks may be capable of disabling or bypassing the IOMMU.

*DMA means Direct Memory Access. It is a hardware feature whereby certain interfaces (like PCIe) are able to request direct access to system memory, without going through the CPU.

forest
  • This seems like it'd require an exhaustive review of the kernel's PCI-handling code, or for someone who helped write it to answer here. That sounds a bit broad to me. Unless you're specifically asking about known vulns? – Nic Jun 28 '19 at 15:34
  • @NicHartley Naturally I don't need an exhaustive list. An answer providing a single example of low-hanging fruit would suffice, although a more detailed analysis would be nice too. – forest Jul 02 '19 at 21:06

1 Answer


Recently I have been reading the papers published at NDSS 2019, and this paper, presented in February, I think answers the question completely. At the time the question was asked, the answer to whether such vulnerabilities existed seems to have been yes, but they have been fixed in the Linux kernel as of 5.0. Slides belonging to the paper presented at the Network and Distributed System Security Symposium 2019 can be found here.

The question is still relevant because Thunderbolt interfaces also support DMA. The patch notes in Linux can be found here.

LTPCGO
  • Wow, I'm not sure how I missed that paper, especially since I remember the vulnerabilities. I must have completely misunderstood what it was about from the abstract and skipped it. – forest Sep 03 '19 at 02:08
  • It's only about halfway through, from section VII, that the paper shows where Linux is exploitable, but that doesn't make it any less serious; the fix was only submitted as of v5. I've added in the slides too, which I had downloaded, but I found the URL again and added it to the post :) – LTPCGO Sep 03 '19 at 02:22
  • Added the patch notes too from https://lore.kernel.org/lkml/20190730045229.3826-9-baolu.lu@linux.intel.com/T/ – LTPCGO Sep 03 '19 at 02:32
  • The idea that DMA buffers may be allocated that don't align to the 4 KiB boundary is pretty horrifying. I never would have expected that any driver would allocate a DMA buffer smaller than the page size. – forest Sep 03 '19 at 02:33
  • It would be nice if you could add to your answer a summary of the different vulnerabilities discovered that are relevant to Linux, e.g. exposed writable function pointers, windows smaller than 4 KiB allowing DMA to access portions outside of the intended window (optionally with the window held open), etc. – forest Sep 03 '19 at 02:47
  • I hope the answer might promote some more interest in the question again and someone who specialises a bit more in this can summarise better - it's outside my expertise and so I doubt I can summarise it succinctly. If nobody has in a couple of days though I will come back to it! Further reading that others might find interesting is this paper from some French researchers here (with some nice explanations) https://link.springer.com/article/10.1186/s13173-017-0066-7 and the Intel doc: https://firmware.intel.com/sites/default/files/Intel_WhitePaper_Using_IOMMU_for_DMA_Protection_in_UEFI.pdf – LTPCGO Sep 03 '19 at 02:55
  • The [VT-d specs](https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf) from Intel are also useful reading. Also [this paper](https://www.cs.utexas.edu/users/witchel/pubs/zhu17gpgpu-security.pdf) on IOMMU gotchas. – forest Sep 03 '19 at 02:56