
In our environment, we have several storage cabinets connected to RH Linux servers. Depending on the storage cabinet connected to the host, some LUNs are seen with SCSI protocol version 2 (version=0x02 [SCSI-2]), others with protocol version 4 (version=0x04 [SPC-2]).

Where is this protocol version configured? Is it on the operating system side or on the storage side? We installed the RH servers in exactly the same way. We opened a case with RHEL support and with our storage vendor; of course, RHEL says it is the storage and the storage vendor tells us it is the OS.

This has an impact on LUN discovery: the LUN IDs are not in sequence, so you have to tell scsi-rescan manually which range of LUN IDs to scan. As a result, it is not possible to see all the LUNs after a reboot without manual intervention.

We don't know where to look any more. Does anyone have an idea where to look? Below is the output of sg_inq on 3 different servers.

[qualification:root@xxxxxxxx:/root]$ sg_inq /dev/sda
standard INQUIRY:
PQual=0 Device_type=0 RMB=0 version=0x02 **[SCSI-2]**
[AERC=0] [TrmTsk=0] NormACA=0 HiSUP=1 Resp_data_format=2
SCCS=0 ACC=0 TPGS=0 3PC=0 Protect=0 BQue=0
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=1
[RelAdr=0] WBus16=0 Sync=0 Linked=0 [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=184 (0xb8) Peripheral device type: disk
Vendor identification: HITACHI
Product identification: DF600F
Product revision level: 0000
Unit serial number: 850531780000


[root@ccccccccccc ~]# sg_inq /dev/sda
standard INQUIRY:
PQual=0 Device_type=0 RMB=0 version=0x04 **[SPC-2]**
[AERC=0] [TrmTsk=0] NormACA=0 HiSUP=1 Resp_data_format=2
SCCS=0 ACC=0 TPGS=0 3PC=0 Protect=0 BQue=0
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=1
[RelAdr=0] WBus16=0 Sync=0 Linked=0 [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=184 (0xb8) Peripheral device type: disk
Vendor identification: HITACHI
Product identification: DF600F
Product revision level: 0000
Unit serial number: 8505035001DA

[pre-prod:root@vvvvvvvvv:/home/a143524]$ sg_inq /dev/sda
standard INQUIRY:
PQual=0 Device_type=0 RMB=0 version=0x04 **[SPC-2]**
[AERC=0] [TrmTsk=0] NormACA=0 HiSUP=1 Resp_data_format=2
SCCS=0 ACC=0 TPGS=0 3PC=0 Protect=0 BQue=0
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=1
[RelAdr=0] WBus16=0 Sync=0 Linked=0 [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=184 (0xb8) Peripheral device type: disk
Vendor identification: HITACHI
Product identification: DF600F
Product revision level: 0000
Unit serial number: 850503500032
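
To compare more than one LUN per host, the version field can be pulled out of the sg_inq output for every disk. This is only a minimal sketch: the helper name inq_version is hypothetical, and it assumes nothing more than the `version=0x..` token visible in the output above.

```shell
# inq_version (hypothetical helper): read sg_inq output on stdin and
# print the first "version=0x.." value it finds, e.g. 0x02 or 0x04.
inq_version() {
    sed -n 's/.*version=\(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# Possible usage on a host (needs sg_inq from sg3_utils and root):
#   for d in /dev/sd*[a-z]; do
#       printf '%s %s\n' "$d" "$(sg_inq "$d" | inq_version)"
#   done
```

Running the loop on each server and diffing the results should show whether the version differs per storage cabinet or per host.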

The driver is the default qla module that comes with RHEL. We do not change many parameters:

options qla2xxx qlport_down_retry=1 ql2xplogiabsentdevice=1 ql2xmaxqdepth=16

which results in:

[qualification:root@xxxxxxxx]$ for i in  /sys/module/qla2xxx/parameters/*;
do
echo $i;cat $i;
done

/sys/module/qla2xxx/parameters/ql2xallocfwdump
1

/sys/module/qla2xxx/parameters/ql2xdbwr
1

/sys/module/qla2xxx/parameters/ql2xdevdiscgoldfw
0

/sys/module/qla2xxx/parameters/ql2xdontresethba
0

/sys/module/qla2xxx/parameters/ql2xenablemsix
1

/sys/module/qla2xxx/parameters/ql2xetsenable
0

/sys/module/qla2xxx/parameters/ql2xextended_error_logging
1

/sys/module/qla2xxx/parameters/ql2xfdmienable
0

/sys/module/qla2xxx/parameters/ql2xfwloadbin
0

/sys/module/qla2xxx/parameters/ql2xloginretrycount
30

/sys/module/qla2xxx/parameters/ql2xlogintimeout
20

/sys/module/qla2xxx/parameters/ql2xmaxqdepth
16

/sys/module/qla2xxx/parameters/ql2xplogiabsentdevice
1

/sys/module/qla2xxx/parameters/ql2xqfullrampup
120

/sys/module/qla2xxx/parameters/ql2xqfulltracking
1

/sys/module/qla2xxx/parameters/ql2xshiftctondsd
6

/sys/module/qla2xxx/parameters/ql2xtargetreset
1

/sys/module/qla2xxx/parameters/qlport_down_retry
1
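
To check whether the module parameters really are identical on every machine, the same listing can be written as one `name=value` line per parameter, which is easier to diff. A minimal sketch follows; dump_params is a hypothetical helper name, and the default path is the one used in the loop above.

```shell
# dump_params (hypothetical helper): print every parameter file in a
# directory as name=value, one per line, in glob (sorted) order.
dump_params() {
    dp_dir=${1:-/sys/module/qla2xxx/parameters}
    for dp_file in "$dp_dir"/*; do
        if [ -f "$dp_file" ]; then
            printf '%s=%s\n' "$(basename "$dp_file")" "$(cat "$dp_file")"
        fi
    done
}

# Possible usage: run on each host, then compare the files with diff.
#   dump_params > /tmp/qla2xxx.$(hostname).txt
```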

Another thing that makes me think of a Linux problem: the following output is different on every host, with the SCSI revision giving different results.

cat /proc/scsi/scsi:
...

Host: scsi1 Channel: 00 Id: 04 Lun: 99
  Vendor: HITACHI  Model: DF600F           Rev: 0000
  Type:   Direct-Access                    ANSI SCSI revision: 02

But with SCLI I always get the same output, SBC-2:

LUN 99
---------------------------------------
Product Vendor                 : HITACHI 
Product ID                     : DF600F          
Product Revision               : 0000
LUN                            : 99
Size                           : 100.00 GB
Type                           : SBC-2 Direct access block device
                       (e.g., magnetic disk)
WWULN                          : 48-49-54-41-43-48-49-20-38-35-30-35-32-38-39-30
                       30-30-39-39
OS LUN Name                    : /dev/sdiz;/dev/sg259;
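
To see at a glance which LUNs report which revision on a given host, the /proc/scsi/scsi text can be reduced to one line per device. This is a rough sketch that assumes the three-line record layout shown above; scsi_revisions is a hypothetical helper name.

```shell
# scsi_revisions (hypothetical helper): read /proc/scsi/scsi-style text
# on stdin and print "host channel id lun rev=NN" for each device.
scsi_revisions() {
    awk '
        /^Host:/               { host = $2; chan = $4; id = $6; lun = $8 }
        /ANSI +SCSI revision:/ { print host, chan, id, lun, "rev=" $NF }
    '
}

# Possible usage: count devices per reported revision on this host.
#   scsi_revisions < /proc/scsi/scsi | awk '{ print $NF }' | sort | uniq -c
```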

Does this give anyone an idea? Regards, Mike

27/10/2011 UPDATE:

Hi, we recently ran two interesting tests:

  • presenting a LUN from the same storage to other hosts (this test was obvious since we had the same issue on the 3 members of a RAC cluster)

--> The SCSI revision was OK

  • presenting a LUN from another storage to the problematic hosts

--> The SCSI revision was OK on this host

We have noticed that these 3 RAC nodes have a lot of disks on different storage... Since one storage has to be decommissioned, we will clean that up first before going further...

We also decided to implement a scsi-rescan in the boot sequence to be able to reboot the machine without issue (I hate that kind of workaround).
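
For reference, a boot-time rescan of known, non-sequential LUN IDs could look roughly like the sketch below. It writes "channel target lun" triples to the kernel's sysfs scan file instead of calling scsi-rescan; scan_luns and the DRY_RUN switch are hypothetical names, and where to hook it into the boot sequence (for example rc.local) is an assumption.

```shell
# scan_luns HOST CHANNEL TARGET LUN... (hypothetical helper)
# Ask the kernel to probe each listed LUN by writing "channel target lun"
# to /sys/class/scsi_host/HOST/scan. With DRY_RUN=1 it only prints the
# commands it would run.
scan_luns() {
    sl_host=$1; sl_chan=$2; sl_tgt=$3
    shift 3
    for sl_lun in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "echo '$sl_chan $sl_tgt $sl_lun' > /sys/class/scsi_host/$sl_host/scan"
        else
            echo "$sl_chan $sl_tgt $sl_lun" > "/sys/class/scsi_host/$sl_host/scan"
        fi
    done
}

# Possible usage (as root), matching the scsi1 / Id 04 / Lun 99 entry above:
#   scan_luns host1 0 4 99
```

The list of LUN IDs still has to be maintained by hand, so this remains a workaround rather than a fix.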

I'll keep your other proposals for the future ;)

I'll keep you posted on this. Regards,

Mike
  • I'd think this is at least influenced by the driver implementation. What drivers are you using? Same ones across all machines? What are the module's startup parameters? – the-wabbit Oct 09 '11 at 21:07
  • I edited my post to add the driver information – Mike Oct 10 '11 at 16:39

2 Answers


The protocol version is a property of the drive, or of the emulation layer sitting between the drive(s) and the host. If you have a cabinet that performs RAID functions and presents a single LUN representing multiple devices, or some configurable slice, then it is the RAID layer that defines which SCSI protocol version it speaks.

psusi
  • I really like that answer because it sounds logical to me, but our SAN administrator agrees with his vendors, who say that it is not... so how to prove it? – Mike Oct 11 '11 at 11:37
  • @Mike, The proof is in the pudding. If you put the same drives in another enclosure connected to the same controller, in the same server, with the same OS, and it claims to speak a different SCSI version, then it can be nothing else but the enclosure. – psusi Oct 11 '11 at 15:31

Hmm, I've seen that sg_inq does not list the entire version information when called without parameters. If you need a listing of all standards the device claims to comply with, you should use sg_inq -d /dev/sda - chances are that you would get identical output on all your hosts.

On the other hand, whatever the device is claiming, it is not necessarily what you are using - the negotiated protocol properties might differ.

Since your LUNs are discovered out-of-order, you could try looking hard at the Fast!Util configuration options for possible differences among your configurations. It might also be worth asking QLogic support (or your hardware manufacturer's, if you have OEMed HBAs) about possible causes of the issue.

Edit: Your problem seems tricky - some shots in the dark might help for no apparent reason or push you a few steps towards resolution.

  • try different drivers - you are probably using QLA 23xx/24xx FC HBAs with older qla2xxx drivers - try to replace them with qla2300 / qla2400 specific modules and see if this makes any difference
  • try factory-resetting your HBA configuration on a problematic machine and on a machine working well and see if it makes any difference
  • if you have this option, use another FC HBA (maybe just acquire an oldish Emulex adapter on eBay for $50) for testing and seeing if it will change anything
  • start the system with a different OS version - for example a live sysrescuecd version to see if the problem is reproducible with different kernel/module versions
the-wabbit
  • sg_inq -d /dev/sda No version descriptors available – Mike Oct 11 '11 at 09:09
  • First, thanks a lot for your answer. The only thing -d gave me is: sg_inq -d /dev/sda --> "No version descriptors available". What do you mean by out-of-order? The vendors (HP/HDS/RHEL) don't help, as said in the initial question... – Mike Oct 11 '11 at 11:34
  • Edited the answer with some additional thoughts – the-wabbit Oct 11 '11 at 12:22
  • I updated the original post... – Mike Oct 27 '11 at 13:01