
I have some tasks I only want to run on machines that have NVIDIA GPUs. Is there a good way in Puppet to determine whether a specific agent has an NVIDIA GPU? I can do it in bash by checking whether /usr/bin/nvidia-smi exists, but I'm not sure how to do this in Puppet. Also, if there's a better way to do it in bash, please let me know.

AndreasKralj

2 Answers


You should create a custom fact that either checks the existence of /usr/bin/nvidia-smi (if that's sufficient), with something like:

Facter.add(:nvidia_gpu) do
  confine :kernel => 'Linux'
  setcode do
    FileTest.executable?('/usr/bin/nvidia-smi')
  end
end

or, to be more thorough, checks whether a particular PCI device exists (if the GPU shows up as one), using either the output of lspci or by walking the /sys/bus/pci directory.
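
As a rough sketch of the /sys/bus/pci approach, the fact below looks for a device whose vendor ID is 0x10de (NVIDIA's PCI vendor ID) and whose class code starts with 0x03 (display controller). The helper name nvidia_gpu_present? and its sysfs_root parameter are made up for this illustration, and the defined?(Facter) guard is only there so the helper can be exercised outside of Facter. Treat it as a starting point under those assumptions rather than a tested implementation.

```ruby
# NVIDIA's PCI vendor ID as it appears in /sys/bus/pci/devices/*/vendor.
NVIDIA_VENDOR_ID = '0x10de'

# Hypothetical helper: true if any PCI device under sysfs_root is an
# NVIDIA display-class device (class code starting with 0x03).
# sysfs_root is a parameter only so the check can be run against a fake tree.
def nvidia_gpu_present?(sysfs_root = '/sys/bus/pci/devices')
  Dir.glob(File.join(sysfs_root, '*')).any? do |dev|
    vendor_file = File.join(dev, 'vendor')
    class_file  = File.join(dev, 'class')
    next false unless File.readable?(vendor_file) && File.readable?(class_file)

    File.read(vendor_file).strip.downcase == NVIDIA_VENDOR_ID &&
      File.read(class_file).strip.downcase.start_with?('0x03')
  end
end

# Register the fact only when running inside Facter.
if defined?(Facter)
  Facter.add(:nvidia_gpu) do
    confine :kernel => 'Linux'
    setcode { nvidia_gpu_present? }
  end
end
```

Matching on the class code as well as the vendor ID keeps the HDMI audio function of an NVIDIA card from counting as a GPU on its own.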

In your Puppet manifests, you can then use the value of $facts['nvidia_gpu'] to control what you do.

bodgit
  • Gotcha. So if I went with the lspci method, could I grep for NVIDIA and, if it matches, conclude that the machine has an NVIDIA GPU? If so, how can I do this in Puppet instead of bash, using Facter if possible? – AndreasKralj Feb 07 '19 at 18:10
  • You would need to replace the `FileTest...` line in the above example with whatever logic you want to use, which would need to be written in Ruby. See the docs I linked to, which show how to write and test your custom fact. – bodgit Feb 07 '19 at 18:27
  • @AndreasKralj It depends on why you are doing it. Looking for `/usr/bin/nvidia-smi` tells you if the NVIDIA tools are installed, but it doesn't tell you whether the machine has an NVIDIA GPU. On Linux it is possible to install the drivers and tools on a machine that doesn't even have an NVIDIA GPU. If you need to decide which machines need to have the drivers installed, then you should look at the PCI IDs of the installed hardware as bodgit mentioned. – Michael Hampton Feb 07 '19 at 20:43
  • Yeah I ended up doing `system("lspci | grep 'NVIDIA' > /dev/null")` instead of `FileTest` to verify if any NVIDIA devices existed, it worked for what I needed it for. – AndreasKralj Feb 07 '19 at 21:23
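
For reference, the lspci check from the last comment could be written without shelling out to grep, roughly as follows. The helper name nvidia_in_lspci? is hypothetical, and the sketch assumes lspci is on the PATH and prints its default one-line-per-device format. It relies on Facter::Core::Execution.which and Facter::Core::Execution.execute to locate and run lspci. Unlike a plain grep for NVIDIA, it only counts VGA, 3D and display controllers, so an NVIDIA audio function is not reported as a GPU; drop the class part of the pattern if you want the looser match.

```ruby
# Hypothetical helper: true if the given lspci output lists an NVIDIA
# VGA, 3D or display controller.
def nvidia_in_lspci?(lspci_output)
  pattern = /(VGA compatible|3D|Display) controller.*NVIDIA/i
  lspci_output.each_line.any? { |line| line.match?(pattern) }
end

# Register the fact only when running inside Facter.
if defined?(Facter)
  Facter.add(:nvidia_gpu) do
    confine :kernel => 'Linux'
    setcode do
      lspci = Facter::Core::Execution.which('lspci')
      # Without lspci we cannot tell; report false rather than failing.
      lspci ? nvidia_in_lspci?(Facter::Core::Execution.execute(lspci)) : false
    end
  end
end
```

Returning a real boolean from setcode, rather than relying on an exit status, lets manifests test the fact directly.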

One can modify the pci_devices fact to detect whether a GPU is installed in the computer. It uses lspci instead of looking for the toolkits, so it can be used to decide where to install the toolkits with Puppet.

# Copyright: Pieter Lexis <pieter@kumina.nl>

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# There are no dependencies needed for this script, except for lspci.
# This script is only tested on Debian (Lenny and Squeeze), if you
# have any improvements, send a pull request, ticket or email.
# The latest version of this script is available on github at
# https://github.com/kumina/fact-pci_devices

def add_fact(fact, code)
  Facter.add(fact) { setcode { code } }
end

case Facter.value(:operatingsystem)
  when /Debian|Ubuntu/i
    lspci = "/usr/bin/lspci"
  when /RedHat|CentOS|Fedora|Scientific|SLES/i
    lspci = "/sbin/lspci"
  else
    lspci = ""
end

# We can't do this if we don't know the location of lspci
# File.exist? is used because the exists? aliases were removed in Ruby 3.2
if !lspci.empty? && File.exist?(lspci)
  # Create a hash of ALL PCI devices, the key is the PCI slot ID.
  # { SLOT_ID => { ATTRIBUTE => VALUE }, ... }
  slot=""
  # after the following loop, devices will contain ALL PCI devices and the info returned from lspci
  devices = {}
  %x{#{lspci} -v -mm -k}.each_line do |line|
    if not line =~ /^$/ # We don't need to parse empty lines
      splitted = line.split(/\t/)
      # lspci has a nice syntax:
      # ATTRIBUTE:\tVALUE
      # We use this to fill our hash
      if splitted[0] =~ /^Slot:$/
        slot=splitted[1].chomp
        devices[slot] = {}
      else
        # The chop is needed to strip the ':' from the string
        devices[slot][splitted[0].chop] = splitted[1].chomp
      end
    end
  end

  # To create your own facts, edit the following code:
  raid_counter = 0
  raidcontrollers = []
  gpus = {}
  scsicontrollers = {}
  devices.each_key do |a|
    case devices[a].fetch("Class")
    when /^RAID/
      # ignore AHCI "fake" RAID, because we don't use it
      # Driver might not be defined, so fall back to an empty string
      if devices[a].fetch('Driver', '') != "ahci"
        add_fact("raidcontroller_#{raid_counter}_vendor", "#{devices[a].fetch('Vendor')}")
        add_fact("raidcontroller_#{raid_counter}_model", "#{devices[a].fetch('SDevice')}")
        raid_counter += 1
        raidcontrollers.insert(-1,"#{devices[a].fetch('Driver', '')}")
      end
    when /^3D/
       if gpus.key?("#{devices[a].fetch('Device')}")
         gpus["#{devices[a].fetch('Device')}"]['count'] += 1
       else
         gpus["#{devices[a].fetch('Device')}"] = {
           'count' => 1, 
           'vendor' => "#{devices[a].fetch('Vendor')}",
         }
         # Driver might not be defined
         if devices[a].key?('Driver')
           gpus["#{devices[a].fetch('Device')}"]['driver'] = "#{devices[a].fetch('Driver')}"
         end
       end
    when /.*SCSI controller.*/
       if scsicontrollers.key?("#{devices[a].fetch('Device')}")
         scsicontrollers["#{devices[a].fetch('Device')}"]['count'] += 1
       else
         scsicontrollers["#{devices[a].fetch('Device')}"] = {
           'count' => 1, 
           'vendor' => "#{devices[a].fetch('Vendor')}",
           'driver' => "#{devices[a].fetch('Driver', '')}"
         }
       end
    end
  end
  add_fact("raidcontrollers", raidcontrollers.join(","))
  add_fact("gpus", gpus)
  add_fact("scsicontrollers", scsicontrollers)
end
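
To get from the gpus hash to a simple yes/no answer, a small derived fact could look roughly like this. The helper name nvidia_in_gpus? and the fact name nvidia_gpu are assumptions made for the example, and it expects the hash layout built above (device name mapped to count, vendor and optionally driver). Note that the script above only fills gpus from devices whose class starts with 3D, so a card that lspci reports as a VGA compatible controller would not appear.

```ruby
# Hypothetical helper: true if any entry in a gpus-style hash
# ({ device_name => { 'count' => n, 'vendor' => '...' } }) is from NVIDIA.
def nvidia_in_gpus?(gpus)
  gpus.values.any? { |info| info['vendor'].to_s.match?(/nvidia/i) }
end

# Register the derived fact only when running inside Facter.
if defined?(Facter)
  Facter.add(:nvidia_gpu) do
    setcode do
      gpus = Facter.value(:gpus)
      # The gpus fact is absent when lspci was not found.
      gpus.is_a?(Hash) ? nvidia_in_gpus?(gpus) : false
    end
  end
end
```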
lickdragon