8

Greetings,

I'm working with RHEL 5.5 guest VMs under VMware ESX 4. When I configure the virtual disks in the VM hardware settings, each disk has a SCSI address in the format "N:M". For example, "1:3" would mean SCSI host number 1 and SCSI target ID 3.

When I look at the disk info from the VM's BIOS or a Windows OS, the detected SCSI address info matches up with the virtual hardware settings. But under Linux, the SCSI address components don't match up, at least not completely or consistently.

I've tried the three supported virtual SCSI and SAS drivers and they all seem to be "broken", but in different ways. Here's a list of the virtual hardware addresses vs what was detected under Linux with each of the drivers:

Driver    vHW Addr  Linux Addr
--------  --------  ----------
LSI SAS   0:0       0:0
LSI SAS   0:3       0:1
LSI SAS   0:6       0:2
LSI SCSI  1:1       2:1
LSI SCSI  1:4       2:4
LSI SCSI  1:7       2:7
pvSCSI    2:2       1:2
pvSCSI    2:5       1:5
pvSCSI    2:8       1:8

My main question is why does this happen under Linux? The next question is: how do I get it fixed or fix it myself?

If I was going to guess, I'd say it's an issue with how the kernel is handing out the SCSI host number and how the Linux SCSI driver (included with VMware tools) is detecting the SCSI target number. Perhaps the order the drivers are loaded also has something to do with the issue. I'm guessing this would not involve udev, but I could be wrong.

Any thoughts would be appreciated. Thanks!

PS. My environment is VMware, but I don't need an answer for these drivers specifically. I imagine this might be a problem with any SCSI driver under Linux.

Chris Sears
  • 1
    It's not clear what you're trying to fix here. It's true that Linux and the BIOS often enumerate devices differently, but why is this a problem to you? It's not "broken", this is just how things work. – larsks Jan 11 '11 at 01:26
  • 2
    Like other hardware identifiers, the SCSI address is not open to interpretation. If Linux manages to correctly detect MAC addresses in NICs and the UUID from BIOS, why can't it get the SCSI address right? Being able to map from the OS's view of a disk to the external hardware is critical to safely automating the removal of a disk. If you had two disks of the exact same size, how (other than SCSI addresses) would you know which was which? – Chris Sears Jan 11 '11 at 06:01
  • By the serial number, UUID, or by blinking the HD lights. I can see how the SAS seems quite broken, but the others seem fine. – Zoredache Jan 11 '11 at 08:16
  • You may find the devlinks under `/dev/disk/by-{id,label,path,uuid}` helpful for your purpose! – MikeyB Jan 25 '11 at 15:44

3 Answers

4

Linux is actually being consistent and correct, just not necessarily in the way you expect.

LSI SAS: SAS addresses are WWNs and get assigned SCSI-like IDs corresponding to the order in which they're seen. (This is a simplification, but will do. Why do you have gaps anyways?)

LSI SCSI & pvSCSI: The SCSI host number relates ONLY to the order in which the host adapter drivers are loaded by the kernel and does not relate to your VMware-assigned numbers. If you would rather see them in the other order, switch the driver load order. Most likely that means swapping their numbering in /etc/modprobe.conf and rebooting.
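To check which driver ended up with which host number, you can read it back from sysfs. This is only a minimal sketch: it assumes the kernel exposes `/sys/class/scsi_host/hostN/proc_name` (verify that on your guest), and `list_scsi_hosts` and `SYSFS_ROOT` are names made up for illustration.

```shell
#!/bin/sh
# Print "<host number> <driver name>" for every SCSI host the kernel registered.
# SYSFS_ROOT is a hypothetical override used for testing; it defaults to /sys.
list_scsi_hosts() {
    root="${SYSFS_ROOT:-/sys}"
    for dir in "$root"/class/scsi_host/host*; do
        # Skip the unexpanded glob and hosts without a proc_name attribute.
        [ -r "$dir/proc_name" ] || continue
        num="${dir##*/host}"
        printf '%s %s\n' "$num" "$(cat "$dir/proc_name")"
    done | sort -n
}
```

Calling `list_scsi_hosts` before and after changing the load order should show whether the host numbers actually moved.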

MikeyB
  • Great info. Thanks. Do you know why Linux would choose to throw away/ignore the SCSI host number and target ID being provided by the virtual hardware and generate new ones based on load order? I don't follow the logic. – Chris Sears Jan 24 '11 at 18:19
  • 1
    The virtual hardware does *not* provide the SCSI host number. Linux is not throwing away the target ID - the only case where it differs is SAS and that's because SAS addresses are long and mapped to an *emulated* (by SCSI emulation layer) target ID. Run: `sg_logs --page=18h /dev/sg0` (where sg0 is the generic device of one of your SAS targets) and you'll see. – MikeyB Jan 25 '11 at 15:41
3

I remove the proper HD by mapping serial numbers to tray caddies. We have enclosures with poor LED abilities. A new disk goes in; say it shows up as /dev/sda:

udevadm info -q all -n /dev/sda|grep SERIAL

Then we write down the serial number. If a disk goes bad, we look up the serial number (in our case we label the physical caddy) and pull the appropriate disk.

But that doesn't really help you in vmware.

Then again, you could write a script that does the same thing. Add a new disk, record its uuid in the guest, then consult that look up table when you want to automatically remove the disk later.
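A rough sketch of such a lookup table, assuming `udevadm` is available in the guest and reports an `ID_SERIAL` property for the disk. The function names and the tab-separated table format are made up for illustration, not anything VMware or udev prescribes.

```shell
#!/bin/sh
# Extract the ID_SERIAL value from `udevadm info -q all` output on stdin.
serial_from_udev() {
    sed -n 's/^E: ID_SERIAL=//p' | head -n 1
}

# Append "serial<TAB>label" for a device to a lookup table file.
# The label is whatever you want to map back to, e.g. the vHW address "1:4".
record_disk() {
    dev="$1"; label="$2"; table="$3"
    serial="$(udevadm info -q all -n "$dev" | serial_from_udev)"
    [ -n "$serial" ] || return 1
    printf '%s\t%s\n' "$serial" "$label" >> "$table"
}

# Print the label that was recorded for a given serial number.
lookup_label() {
    awk -F '\t' -v s="$1" '$1 == s { print $2 }' "$2"
}
```

For example, `record_disk /dev/sdb 1:4 /root/disks.tsv` right after adding the disk, then `lookup_label` with the serial when it is time to remove it.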

I haven't really paid attention, but I think my VMware disks have always powered on in the same order. So you might be able to trust that the SCSI address won't change if you keep the address the same.

Steven
  • Thanks, Steven. Your suggestion about using a script to correlate the vmware/hardware data and OS's data on a new disk is the workaround I plan to implement. It's not the solution I was hoping for, but it will work. – Chris Sears Jan 24 '11 at 18:13
1

Modern Linux rebuilds the /dev directory on bootup, and it scans SCSI hosts in the order they appear on the PCI bus. In VMware that would be the order in which you add them to the VM.

If you add a disk with scsi 0:1 first, then one with 2:2, in linux they would appear like: 0:1 and 1:2. If you add scsi 1:3 after that, after boot-up it would appear as 2:3.

No need to edit anything in Linux; you can change the order of the SCSI hosts in the vmx file:

$ grep pciSlotNumber vm.vmx
scsi0.pciSlotNumber = "16"
scsi2.pciSlotNumber = "34"
scsi1.pciSlotNumber = "35"

The order in which the lines appear in the vmx file doesn't matter, only the pciSlotNumber values.
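If hosts really are numbered in PCI slot order, as described above, the mapping can be predicted from the vmx file. A small sketch under that assumption; `predict_linux_hosts` is a made-up name, and it only reads the file.

```shell
#!/bin/sh
# Predict Linux SCSI host numbers from a .vmx file, ASSUMING hosts are
# numbered in ascending pciSlotNumber order (the claim made in this answer).
predict_linux_hosts() {
    # Emit "<slot> <controller index>" per controller, sort by slot,
    # then number the rows from 0.
    sed -n 's/^scsi\([0-9][0-9]*\)\.pciSlotNumber *= *"\([0-9][0-9]*\)".*/\2 \1/p' "$1" |
        sort -n |
        awk '{ printf "scsi%s -> host%d\n", $2, NR - 1 }'
}
```

With the three lines above as input, this prints `scsi0 -> host0`, `scsi2 -> host1`, `scsi1 -> host2`.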

Edit the vmx and rearrange the slot numbers so scsi0 gets the lowest number, scsi1 the next lowest, and so on. (Reuse the same numbers; it's safer. Back up your vmx too!)

scsi0.pciSlotNumber = "16"
scsi2.pciSlotNumber = "35"
scsi1.pciSlotNumber = "34"

After bootup they will appear in the correct order.
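The rearranging can be scripted. A minimal sketch that reuses the existing slot numbers and only prints the corrected lines, so nothing is written to the vmx; `reorder_scsi_slots` is a made-up name, and you would still paste the result in by hand with the VM powered off.

```shell
#!/bin/sh
# Print corrected scsiN.pciSlotNumber lines for a .vmx file: the existing slot
# numbers are reused, but handed out in ascending order to scsi0, scsi1, ...
reorder_scsi_slots() {
    vmx="$1"
    # Controller indices, ascending.
    ctrls="$(sed -n 's/^scsi\([0-9][0-9]*\)\.pciSlotNumber *= *"[0-9][0-9]*".*/\1/p' "$vmx" | sort -n)"
    # Existing slot numbers, ascending.
    slots="$(sed -n 's/^scsi[0-9][0-9]*\.pciSlotNumber *= *"\([0-9][0-9]*\)".*/\1/p' "$vmx" | sort -n)"
    # Load the sorted slots into the positional parameters, then pair the
    # Nth controller with the Nth slot.
    set -- $slots
    for c in $ctrls; do
        printf 'scsi%s.pciSlotNumber = "%s"\n' "$c" "$1"
        shift
    done
}
```

Usage would be something like `reorder_scsi_slots vm.vmx`, then comparing the output against `grep pciSlotNumber vm.vmx` before changing anything.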

So remember to add your scsi-hosts to the vm in the correct order! Also remember if you delete the last disk on a scsi-host, the scsi-host itself will also be gone by the next reboot. So if you have scsi-host 0,1,2 and 3, and you delete 2, in linux you will end up with only scsi-hosts 0,1 and 2.

mgorven