There are 3 LUNs on an FC SAN that I want to access through 2 HBAs (with two paths each). After boot everything seems fine, but after a while the sd* devices from the second HBA disappeared, and I have no idea why, or how to get them back without rebooting. Scanning the SCSI bus still finds all devices, but the kernel does not create block devices for them. The system is Red Hat 6.6 with the latest updates.
The same LUNs are available on 4 paths on another system but only on 2 on this one.
Does anyone have a clue what I could be missing?
# lspci|grep Fibre
08:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
08:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
# lsscsi
...
[1:0:0:1] disk DataCore Virtual Disk DCS /dev/sdb
[1:0:0:2] disk DataCore Virtual Disk DCS /dev/sdc
[1:0:0:3] disk DataCore Virtual Disk DCS /dev/sdd
[1:0:1:1] disk DataCore Virtual Disk DCS /dev/sde
[1:0:1:2] disk DataCore Virtual Disk DCS /dev/sdf
[1:0:1:3] disk DataCore Virtual Disk DCS /dev/sdg
[2:0:0:1] disk DataCore Virtual Disk DCS -
[2:0:0:2] disk DataCore Virtual Disk DCS -
[2:0:0:3] disk DataCore Virtual Disk DCS -
[2:0:1:1] disk DataCore Virtual Disk DCS -
[2:0:1:2] disk DataCore Virtual Disk DCS -
[2:0:1:3] disk DataCore Virtual Disk DCS -
...
# rescan-scsi-bus.sh
...
0 new or changed device(s) found.
0 remapped or resized device(s) found.
0 device(s) removed.
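To narrow down which SCSI devices are visible but have no block node, and what state the kernel has them in, something like the following could be used. This is only a sketch: the function names `missing_nodes` and `scsi_state` and the `SYSFS_ROOT` override are made up for illustration, and it assumes the `state` attribute is exposed under `/sys/class/scsi_device/H:C:T:L/device/`.

```shell
#!/bin/sh
# List SCSI addresses (H:C:T:L) that lsscsi shows without a block device node.
# Reads lsscsi output on stdin; a trailing "-" means no /dev/sd* node exists.
missing_nodes() {
    awk '$1 ~ /^\[/ && $NF == "-" { print substr($1, 2, length($1) - 2) }'
}

# Print the kernel's state for one SCSI device, e.g. "running" or "offline".
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
scsi_state() {
    cat "${SYSFS_ROOT:-/sys}/class/scsi_device/$1/device/state" 2>/dev/null
}
```

It could be run as `lsscsi | missing_nodes | while read -r hctl; do echo "$hctl: $(scsi_state "$hctl")"; done`, which should print one line per device that has no sd* node.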
This was logged when it happened:
May 24 12:08:57 hostname kernel: sd 1:0:0:1: Parameters changed
May 24 12:08:57 hostname kernel: sd 1:0:1:3: Parameters changed
May 24 12:09:01 hostname kernel: sd 1:0:1:2: Parameters changed
May 24 12:09:24 hostname kernel: sd 1:0:1:1: Parameters changed
May 24 12:09:24 hostname kernel: sd 2:0:0:1: rejecting I/O to offline device
May 24 12:09:25 hostname multipathd: checker failed path 8:112 in map lun0
May 24 12:09:25 hostname multipathd: ora_data2: remaining active paths: 3
May 24 12:09:25 hostname multipathd: checker failed path 8:128 in map lun1
May 24 12:09:25 hostname multipathd: ora_acfs1: remaining active paths: 3
May 24 12:09:25 hostname multipathd: checker failed path 8:144 in map lun2
May 24 12:09:25 hostname multipathd: ora_acfs2: remaining active paths: 3
May 24 12:09:25 hostname multipathd: checker failed path 8:160 in map lun0
May 24 12:09:25 hostname multipathd: ora_data2: remaining active paths: 2
May 24 12:09:25 hostname multipathd: checker failed path 8:176 in map lun1
May 24 12:09:25 hostname multipathd: ora_acfs1: remaining active paths: 2
May 24 12:09:25 hostname multipathd: checker failed path 8:192 in map lun2
May 24 12:09:25 hostname multipathd: ora_acfs2: remaining active paths: 2
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:112.
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:128.
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:144.
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:160.
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:176.
May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:192.
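The failed paths are logged as major:minor pairs rather than device names. A small sketch for translating them, assuming the standard Linux numbering where major 8 covers sda to sdp with 16 minors per disk (the function name `sd_name` is just for illustration):

```shell
#!/bin/sh
# Translate a "major:minor" pair from the multipathd log into an sd name.
# Assumes major 8 (disks sda..sdp) with 16 minors reserved per disk.
sd_name() {
    major=${1%%:*}
    minor=${1##*:}
    [ "$major" -eq 8 ] || { echo "unsupported major $major" >&2; return 1; }
    index=$((minor / 16))            # 0 = sda, 1 = sdb, ...
    letters=abcdefghijklmnop
    printf 'sd%s\n' "$(printf '%s' "$letters" | cut -c "$((index + 1))")"
}
```

With this, `sd_name 8:112` gives `sdh` and `sd_name 8:192` gives `sdm`, so the six failed paths would be sdh to sdm. That fits the six host 2 devices following sdb to sdg in the lsscsi output.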
Unfortunately, I have no access to the SAN device, but I'm told nothing was touched.
I've just seen that the devices were in fact gone, but came back 2 hours later:
May 24 14:06:35 hostname kernel: scsi 2:0:1:1: Attached scsi generic sg9 type 0
May 24 14:06:35 hostname kernel: scsi 2:0:1:2: Attached scsi generic sg10 type 0
May 24 14:06:35 hostname kernel: scsi 2:0:1:3: Attached scsi generic sg11 type 0
May 24 14:06:37 hostname kernel: scsi 2:0:0:1: Attached scsi generic sg12 type 0
May 24 14:06:37 hostname kernel: scsi 2:0:0:2: Attached scsi generic sg13 type 0
May 24 14:06:37 hostname kernel: scsi 2:0:0:3: Attached scsi generic sg14 type 0
It is possible that the FC switch in between was powered off during that time. When the system booted earlier and the sd devices were created as usual, the log line differed slightly:
May 24 11:14:15 hostname kernel: sd 2:0:1:3: Attached scsi generic sg14 type 0
vs.
May 24 14:06:35 hostname kernel: scsi 2:0:1:1: Attached scsi generic sg9 type 0
It says "scsi" instead of "sd".
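As far as I understand, the prefix in these kernel messages is the name of the driver bound to the device at the time of the message, falling back to the bus name ("scsi") when no driver is bound. If that is right, the difference would mean the sd driver was not attached to these devices when they reappeared. A sketch for checking the current binding, assuming the usual `driver` symlink under `/sys/class/scsi_device/H:C:T:L/device/` (the function name `bound_driver` and the `SYSFS_ROOT` override are hypothetical):

```shell
#!/bin/sh
# Print the driver bound to a SCSI device (e.g. "sd"), or "none" if unbound.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
bound_driver() {
    link="${SYSFS_ROOT:-/sys}/class/scsi_device/$1/device/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory; its last component is the name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

For example, `bound_driver 2:0:1:1` could be compared with `bound_driver 1:0:0:1` from the working HBA.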