How to disable a plugged-in PCIe graphics card at the OS level?

4

2

I run a server that I want to be able to access with a screen connected via VGA (very rarely; most of the time it is accessed via SSH). For that reason a PCIe graphics card is plugged in, with the VGA cable removed. The card is passively cooled, and if I open the case and touch the heatsink I can feel noticeable warmth, from which I conclude that it is consuming energy (there are no other components close to it that could transfer heat to it).

If I unplug the card (as suggested in Should I Disable an unused graphic card?) I have to reinstall it every time I want to connect a screen. I would like to avoid that as well as the energy consumption.

The deactivation and reactivation need to take place at the OS level (e.g. via SSH), because otherwise I'd need a screen to configure the UEFI (or do that blindly, which is not an option) and would run into a chicken-and-egg problem.

I'm using Ubuntu 15.04 with Linux 4.0.2. The graphics card is labeled XFX HD 5450 850M and has a VGA, HDMI and D-SUB connector. The mainboard is an ASRock X99-Extreme without integrated graphics.

EDIT: After blacklisting the modules in use as listed by sudo lspci -v (following @WhimsicalWombat's promising answer below; in my case I had to use the modprobe.blacklist=module_to_blacklist kernel parameter for radeon and snd_hda_intel, see https://askubuntu.com/questions/110341/how-to-blacklist-kernel-modules for more details), the PCIe graphics card still heats up (the passive cooler is above 60 degrees and there is no heat source close by) and no modules are listed in lspci -v for the devices. The output of lspci -v for the devices is now:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series] (prog-if 00 [VGA controller])
        Subsystem: XFX Pine Group Inc. Device 303e
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at fbe20000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at e000 [size=256]
        Expansion ROM at fbe00000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
        Subsystem: XFX Pine Group Inc. Device aa68
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at fbe40000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting

EDIT 2: https://askubuntu.com/a/138953/173287 suggests running echo 0 > /sys/bus/pci/slots/$N/power. I do have a /sys/bus/pci/slots/$N, where $N is the number of the slot listed in lspci -v, but it contains no power file.
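To see at a glance which slots expose a power file at all, something like the following could be used. It is only a sketch: the function name slot_power_status and the SYSFS_ROOT override (there so the loop can be tried against a copy of the tree) are made up for illustration, and as far as I understand the power file only appears when a hotplug driver has claimed the slot.

```shell
#!/bin/sh
# List every PCI slot known to sysfs, its bus address, and whether it has a
# "power" control file. SYSFS_ROOT is a hypothetical override for dry runs;
# on a real system it defaults to /sys.
slot_power_status() {
    root=${SYSFS_ROOT:-/sys}
    for slot in "$root"/bus/pci/slots/*; do
        [ -d "$slot" ] || continue            # skip the unexpanded glob
        if [ -e "$slot/power" ]; then
            state="power=$(cat "$slot/power")"
        else
            state="no power file"
        fi
        printf '%s %s %s\n' "${slot##*/}" "$(cat "$slot/address" 2>/dev/null)" "$state"
    done
}

# Usage on the real system:
#   slot_power_status
```

A slot reported as "no power file" cannot be switched with echo 0 > .../power, which matches what I see for slot 4.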

EDIT 3: Adding the modaliases from /sys/bus/pci/devices/[device]/modalias to the modprobe.blacklist= kernel parameter has no effect on Ubuntu mainline kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/. A custom 4.0.5 build, made with make localmodconfig and with all PCI options enabled, leaves the "Loading initramfs [version]" screen visible forever while the graphics card still heats up.

Karl Richter

Posted 2015-05-14T15:19:27.450

Reputation: 1 641

Question was closed 2015-05-23T21:59:21.963

Do you use the graphics card's power at any point? GPU intensive programs or VM's? – Ctrl-alt-dlt – 2015-05-14T15:24:35.517

How can I monitor this? I didn't install any additional drivers. Is Linux providing such functionality out-of-the-box? – Karl Richter – 2015-05-14T15:38:30.337

Perhaps also see Can I fully disable my PCIe Video/Graphics Card per BIOS/Software?

– Ƭᴇcʜιᴇ007 – 2015-05-14T15:38:48.723

@Ƭᴇcʜιᴇ007 I edited to match that idea in the question. That explains why the mark as duplicate is invalid imo (the (unconfirmed) answer (which hardly is one btw) refers to manipulations in the BIOS/EFI). – Karl Richter – 2015-05-14T15:43:16.247

1

I'm not sure which GPU you have installed but you might be interested in ZeroCore Power savings

– MonkeyZeus – 2015-05-14T15:50:42.903

Just because the existing answers aren't satisfactory to you doesn't make it a different question. Also, the ones quoted specify controlling it with software (as you are requesting). So to me (at least), it's the same question, and if you'd like newer/better answers to existing questions, consider placing a bounty on them.

– Ƭᴇcʜιᴇ007 – 2015-05-14T16:52:19.080

1

Since unloading doesn't cut it, are you missing the power file in /sys/bus/* even with kernel drivers loaded? You may want to check it with both the open-source radeon and AMD's fglrx drivers. If no luck, try AMD's forum for more accurate info on the card's power management. Worst-case scenario is yanking it out or upgrading to something like a 7730 (which supports ZeroCore) or NV's equivalent. Or, if you really want to dig into this, the PCI-e specs are available at PCI-SIG if you want to try to force the card into a low-power state. Fun stuff if you have the time.

– WhimsicalWombat – 2015-06-14T07:41:02.503

Answers

4

The easiest way is to blacklist and unload its kernel module. To see the current module, first find the bus number with sudo lspci | egrep -i '(vga|video)' and note the first field, a number like 01:00.0. Then sudo lspci -vs 01:00 | grep -i kernel displays the driver in use and the available modules. For HDMI-enabled devices there's usually a subdevice like 01:00.1, which is the HDMI audio device. Blacklist that too.
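As a rough sketch, the module names can be pulled out of the lspci output with a small filter. The function name pci_modules is made up here, and the filter assumes the "Kernel driver in use:" and "Kernel modules:" lines that lspci -k prints.

```shell
#!/bin/sh
# Print the driver and candidate modules for a PCI device, one per line.
# Reads `lspci -k`-style text on stdin, so it also works on saved output.
pci_modules() {
    sed -n \
        -e 's/^[[:space:]]*Kernel driver in use: *//p' \
        -e 's/^[[:space:]]*Kernel modules: *//p' |
    tr ',' '\n' |        # "Kernel modules:" may hold a comma-separated list
    sed 's/^ *//' |      # drop the space left after each comma
    sort -u              # a driver in use is usually listed twice
}

# Usage (01:00 is the bus address from the question):
#   sudo lspci -k -s 01:00 | pci_modules
```

Every name it prints is a candidate for the blacklist; an empty result means no driver is bound and no module claims the device.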

Add both to a file in /etc/modprobe.d/ (blacklist or blacklist.conf), each preceded by the blacklist keyword, for example "blacklist radeon" and "blacklist snd-hda-intel".
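A minimal sketch of writing those entries follows. The helper name blacklist_lines and the file name gpu-off.conf are arbitrary choices for illustration, not anything the system requires.

```shell
#!/bin/sh
# Emit one "blacklist <module>" line per argument, in modprobe.d syntax.
blacklist_lines() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Usage (gpu-off.conf is a hypothetical file name):
#   blacklist_lines radeon snd_hda_intel | sudo tee /etc/modprobe.d/gpu-off.conf
#   sudo update-initramfs -u   # on Ubuntu, so the initramfs picks it up as well
```

Keep in mind that snd_hda_intel is also the usual driver for onboard audio, so blacklisting it will probably silence that too.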

If you want to use the device you can just sudo modprobe [module name] to enable it.

If that's not enough, or if the GPU you actually use (integrated or a second adapter) relies on the same module, bind the one you want to disable to the pci-stub driver. The best way to do this is from the kernel command line at boot: add pci-stub.ids= followed by the vendorID:deviceID codes, which you can find with lspci -nns (your bus number from above), for example pci-stub.ids=1002:6718,1002:aa80.
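To avoid typing the IDs by hand, the parameter could be assembled from the lspci output along these lines. This is a sketch: pci_stub_param is a made-up name, and it assumes the vendor:device pair is the last [xxxx:xxxx] group on each lspci -nn line.

```shell
#!/bin/sh
# Build a pci-stub.ids=... kernel parameter from `lspci -nn`-style lines on stdin.
pci_stub_param() {
    # The greedy leading .* makes the match land on the last [vvvv:dddd] group.
    ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -s -d, -)
    [ -n "$ids" ] && printf 'pci-stub.ids=%s\n' "$ids"
}

# Usage (01:00 is the bus address from the question):
#   lspci -nn -s 01:00 | pci_stub_param
```

The printed string is what goes on the kernel command line, for instance through GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by sudo update-grub.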

If you want to use the device again after binding it to pci-stub, you can unbind it via sysfs and rebind it to the driver of your choice, or (simpler) reboot without the pci-stub.ids parameter on the command line.
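The sysfs route looks roughly like this. Treat it as a sketch: rebind_pci and the SYSFS_ROOT override are hypothetical names used for illustration, the target driver has to be loaded already so that its directory exists, and the device address needs the full domain prefix (0000:01:00.0).

```shell
#!/bin/sh
# Move a PCI device from one driver to another through sysfs.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
rebind_pci() {
    dev=$1      # full address, e.g. 0000:01:00.0
    from=$2     # driver currently bound, e.g. pci-stub
    to=$3       # driver to bind next, e.g. radeon
    root=${SYSFS_ROOT:-/sys}
    # Release the device from its current driver first ...
    printf '%s' "$dev" > "$root/bus/pci/drivers/$from/unbind" || return 1
    # ... then hand it to the new one.
    printf '%s' "$dev" > "$root/bus/pci/drivers/$to/bind"
}

# Usage (as root, after "modprobe radeon"):
#   rebind_pci 0000:01:00.0 pci-stub radeon
```

If the bind step fails, the device may already have been claimed when the driver was loaded, so check lspci -k before retrying.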

I'd first check whether just disabling the module autoload is enough, if that's available.

EDIT:

It's possible the kernel will try the next available driver if you blacklist one. If so, you can blacklist that too unless you need it. It'll soon run out of compatible drivers and leave the device without a driver, and it should then be powered down (or at least in a low-power state).

If it's heating up even without a kernel driver in use, please update the post. I'd be very interested to hear that.

WhimsicalWombat

Posted 2015-05-14T15:19:27.450

Reputation: 181

I tried your solution. I learned a lot and it looks promising, thanks. However, the device now still heats up even with no drivers in use. – Karl Richter – 2015-06-13T11:57:03.850

@KarlRichter Thanks for the update. I don't know how to avoid the power draw (or whether it can be avoided at all). It's possible that the older cards don't have lower power modes at all. AMD's ZeroCore (which lowers idle power consumption to a minimum) was introduced with Southern Islands, but I hoped that leaving the card without a driver would achieve something similar. – WhimsicalWombat – 2015-06-14T07:14:43.203

The ZeroCore driver looks like a good workaround, but it still consumes ~3 W, and one has to figure out the driver installation, which then only works for a specific set of AMD graphics cards (possibly new or expensive ones only). Initially I thought it would be rather trivial to turn a PCIe slot, and thus a large set of power consumers, on and off at the OS level. – Karl Richter – 2015-06-14T09:39:44.860

@KarlRichter ZeroCore is usable only on GCN 1.0 (or 1.1) GPUs, i.e. the 7000 series onwards. The 7730 was 50ish bucks new a couple of years ago, so it wouldn't be expensive now, especially used. But still, I'd try messing around with the radeon and fglrx drivers first to see if you can coax your current device into a deeper suspend state. – WhimsicalWombat – 2015-06-14T12:45:17.927

I was only able to blacklist radeon after running sudo update-initramfs -u to rebuild the initramfs (as detailed here: https://askubuntu.com/a/938663/115620). Un-blacklisting did not require another rebuild.

– Chris Gregg – 2018-08-29T01:02:38.663