pci-stub or vfio-pci being overridden by Nvidia driver

1

I'm having difficulty setting up VGA passthrough. I'm running an E5-1650v2 on an Asus X99-E WS, with an Nvidia GTX 970 as the host display and a Quadro K4000 that I want to pass through to a VM. This is Ubuntu 16.04. VT-d is on.

I have tried binding the K4000 to both pci-stub and vfio-pci, and neither works. On the kernel command line I have used either

intel_iommu=on pci-stub.ids=10de:11fa,10de:0e0b

or

intel_iommu=on vfio-pci.ids=10de:11fa,10de:0e0b

and I have made the matching entry in /etc/initramfs-tools/modules. Both techniques end up with the audio device bound to pci-stub or vfio-pci, respectively, but the Nvidia driver always grabs the actual display device.
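For anyone reproducing this, one way to see which driver ended up on each function is to read the device's driver symlink in sysfs. This is a minimal sketch, not something from my original setup: the function name bound_driver is made up, the address 0000:06:00.0 in the comment is a placeholder (take the real one from lspci -D), and the second argument exists only so the function can be pointed at a mock tree.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
#   $1: full PCI address, e.g. 0000:06:00.0 (placeholder, check lspci -D)
#   $2: sysfs root, defaults to /sys (assumption: only overridden for testing)
bound_driver() {
    dev="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/nvidia
        basename "$(readlink "$link")"
    else
        # No driver symlink means nothing is bound to this function
        echo none
    fi
}
```

Calling bound_driver for the video function and again for the audio function should show the split described above. lspci -nnk reports the same information under "Kernel driver in use".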

I've tried setting

nvidia id=10de:13c2,10de:0fbb

(which is the 970 card) in /etc/initramfs-tools/modules to see if that would work, but it made no difference.

I've also tried unbinding the card from the command line by echoing the device id to /sys/bus/pci/drivers/nvidia/unbind. That removes it from the ...drivers/nvidia/ directory, but also locks up bash (which goes to 100% of a core and is unkillable).
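For reference, the kernel also exposes a per-device driver_override attribute, which restricts a single device to a named driver instead of matching by vendor:device ID. The sketch below shows the usual sequence (set the override, unbind, re-probe). It is untested on this machine, the function name rebind_to_vfio and the address in the comment are placeholders, and it makes no attempt to avoid the hang described above; it assumes it runs before the nvidia driver is actively using the card.

```shell
#!/bin/sh
# Rebind one PCI function to vfio-pci via driver_override.
#   $1: full PCI address, e.g. 0000:06:00.0 (placeholder)
#   $2: sysfs root, defaults to /sys (assumption: only overridden for testing)
rebind_to_vfio() {
    dev="$1"
    sysfs="${2:-/sys}"
    devdir="$sysfs/bus/pci/devices/$dev"

    # From now on, only vfio-pci may claim this device.
    echo vfio-pci > "$devdir/driver_override" || return 1

    # Release whatever driver currently holds the device, if any.
    if [ -e "$devdir/driver/unbind" ]; then
        echo "$dev" > "$devdir/driver/unbind" || return 1
    fi

    # Ask the PCI core to probe the device again; the override picks vfio-pci.
    echo "$dev" > "$sysfs/bus/pci/drivers_probe"
}
```

This needs root and the vfio-pci module loaded. The audio function of the same card would need the same treatment with its own address.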

Is there a way to tell the Nvidia driver to only bind to the one card?

Edit:

To see if the behavior would differ, I tried binding the 970 to vfio instead. The nvidia driver still grabs the video device, but this time a vfio group at least appears in /dev/vfio, which (as I failed to note above) was not happening before.
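The numbered entries in /dev/vfio correspond to IOMMU groups, so it can help to see which devices share a group with each card. A minimal sketch that walks /sys/kernel/iommu_groups; the function name list_iommu_groups is made up and the optional argument is only there so it can be run against a mock tree.

```shell
#!/bin/sh
# Print one line per device: "group <n>: <pci address>".
#   $1: sysfs root, defaults to /sys (assumption: only overridden for testing)
list_iommu_groups() {
    sysfs="${1:-/sys}"
    for devpath in "$sysfs"/kernel/iommu_groups/*/devices/*; do
        # If the glob matched nothing, the IOMMU is off or unsupported.
        [ -e "$devpath" ] || continue
        group="${devpath%/devices/*}"   # strip the trailing /devices/<addr>
        group="${group##*/}"            # keep only the group number
        echo "group $group: ${devpath##*/}"
    done
}
```

If the function prints nothing, intel_iommu=on did not take effect. Every device in a group generally has to be bound to vfio-pci (or left without a driver) before the group can be handed to a VM.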

I wonder if PCI ID order is somehow involved; the K4000 is 06 and the 970 is 09, and the pre-boot and boot output is displayed on the K4000. I don't see any way to tell the BIOS which card to make 'primary', and wonder whether, since the BIOS preferred that card, the kernel will refuse to bind it to vfio/stub. That would imply I need to tear the machine down and reorder the cards.

duplicate_id

Posted 2016-11-27T07:57:21.323

Reputation: 61

Answers

0

This remains a work in progress, but what ended up sort of working was to unbind the one card early. I added a systemd unit file to run:

virsh nodedev-detach pci_0000_08_00_0

This runs before the lightdm unit. vfio-pci is then assigned, and I can pass the card through normally. I have no idea what differs between detaching with virsh and writing to /sys/bus/pci/drivers/.../unbind, but virsh doesn't lock up a core.
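I did not keep the exact unit file, so the following is a reconstruction rather than what is on the machine. It is a small helper that writes a oneshot unit ordered before lightdm.service. The unit name gpu-detach.service, the helper name write_detach_unit, the virsh path /usr/bin/virsh and the libvirt service name are all assumptions (on 16.04 the service may be called libvirt-bin.service, so it is left as a parameter).

```shell
#!/bin/sh
# Write a oneshot systemd unit that detaches a PCI device through libvirt
# before the display manager starts.
#   $1: libvirt node device name, e.g. pci_0000_08_00_0
#   $2: directory for the unit file, defaults to /etc/systemd/system
#   $3: libvirt daemon unit (assumption; may differ per release)
write_detach_unit() {
    nodedev="$1"
    unit_dir="${2:-/etc/systemd/system}"
    libvirt_unit="${3:-libvirtd.service}"
    cat > "$unit_dir/gpu-detach.service" <<EOF
[Unit]
Description=Detach $nodedev from its host driver before lightdm starts
Requires=$libvirt_unit
After=$libvirt_unit
Before=lightdm.service

[Service]
Type=oneshot
ExecStart=/usr/bin/virsh nodedev-detach $nodedev

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the file as root, systemctl daemon-reload and systemctl enable gpu-detach.service would activate it on the next boot. The After= line is there because virsh has to reach a running libvirt daemon.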

This is (a) passing the 970 through and (b) using the Nouveau driver; I can't get it to work with the K4000 at all, and haven't tried the Nvidia blob again for want of time. The only reason I can think of for that is that it is a lower PCI id and is used by the BIOS. Tearing the machine down to test that theory is going to have to wait for a bit.

duplicate_id

Posted 2016-11-27T07:57:21.323

Reputation: 61