
This question was also asked on the Citrix Forums: http://forums.citrix.com/thread.jspa?threadID=332289

I have an MD3200i that is currently working fine with my Xen 5.6 pool, but I cannot get a connection from the new 6.2 pool to work. I previously had a problem with a 6.0 upgrade (which is why the old pool is still on 5.6), but I rolled back rather than fix it, as it wasn't urgent at the time.

This install is on new machines. I tried 6.1 first (which had the same problems); 6.2 was released the second day after installation, so I switched to that.

I've not installed anything from the Dell resource DVD at this point - I can't find anything saying I should, and everything I have read suggests it shouldn't be necessary.

I can ping all 8 IP addresses from both servers in the pool, iscsiadm -m discovery works fine, I can log in to the nodes, and iscsiadm reports the sessions as active.

I've added the required sections to multipath.conf, but multipath -ll reports "DM multipath kernel driver not loaded" immediately after boot.

The following is a log of a test session immediately after boot.

[root@xen3 ~]# iscsiadm -m node --loginall=all
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.101,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.101,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.104,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.102,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.103,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.104,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.102,3260]
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.103,3260]
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.101,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.101,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.104,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.102,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.103,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.104,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.130.102,3260]: successful
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91, portal: 192.168.131.103,3260]: successful                                                                                                                                               

[root@xen3 ~]# iscsiadm -m session                                                                                                                
tcp: [1] 192.168.130.101:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [2] 192.168.131.101:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [3] 192.168.131.104:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [4] 192.168.131.102:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [5] 192.168.130.103:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [6] 192.168.130.104:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [7] 192.168.130.102:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          
tcp: [8] 192.168.131.103:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91                                          

[root@xen3 ~]# service multipathd restart
ok                                                                                                                                                
Stopping multipathd daemon:                                [  OK  ]                                                                               
Starting multipathd daemon:                                [  OK  ]                                                                               

[root@xen3 ~]# multipath                                                                                                                          
Jul 04 09:58:47 | DM multipath kernel driver not loaded                                                                                           
Jul 04 09:58:47 | DM multipath kernel driver not loaded                                                                                           
[root@xen3 ~]# multipath -ll
Jul 04 09:59:03 | DM multipath kernel driver not loaded                                                                                           
Jul 04 09:59:03 | DM multipath kernel driver not loaded                                                                                           
[root@xen3 ~]# modprobe dm_multipath

[root@xen3 ~]# multipath
Jul 04 10:19:50 | 36b8ca3a0e7024800194a0bd11891cd14: ignoring map                                                                                 
create: 1Dell_Internal_Dual_SD_0123456789AB undef Dell,Internal Dual SD
size=1.9G features='0' hwhandler='0' wp=undef
`-+- policy='round-robin 0' prio=1 status=undef
  `- 7:0:0:0  sdb 8:16  undef ready  running

[root@xen3 ~]# multipath -ll
1Dell_Internal_Dual_SD_0123456789AB dm-1 Dell,Internal Dual SD
size=1.9G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:0:0  sdb 8:16  active ready  running

[root@xen3 ~]# iscsiadm -m session
tcp: [1] 192.168.130.101:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [2] 192.168.131.101:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [3] 192.168.131.104:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [4] 192.168.131.102:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [5] 192.168.130.103:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [6] 192.168.130.104:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [7] 192.168.130.102:3260,2 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
tcp: [8] 192.168.131.103:3260,1 iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91

[root@xen3 ~]# dmesg | tail -n 50
[ 1161.881010] sd 8:0:0:0: [sdf] Unhandled error code
[ 1161.881013] sd 8:0:0:0: [sdf] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1161.881017] sd 8:0:0:0: [sdf] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1161.881024] end_request: I/O error, dev sdf, sector 0
[ 1161.881031] Buffer I/O error on device sdf, logical block 0
[ 1161.881045] sd 15:0:0:0: [sdi] Unhandled error code
[ 1161.881048] sd 15:0:0:0: [sdi] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1161.881052] sd 15:0:0:0: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1161.881058] end_request: I/O error, dev sdi, sector 0
[ 1161.881065] Buffer I/O error on device sdi, logical block 0
[ 1161.881122] sd 9:0:0:0: [sdg] Unhandled error code
[ 1161.881124] sd 9:0:0:0: [sdg] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1161.881126] sd 9:0:0:0: [sdg] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1161.881132] end_request: I/O error, dev sdg, sector 0
[ 1161.881140] Buffer I/O error on device sdg, logical block 0
[ 1168.220951]  connection6:0: ping timeout of 15 secs expired, recv timeout 10, last rx 84060, last ping 85060, now 86560
[ 1168.220957]  connection7:0: ping timeout of 15 secs expired, recv timeout 10, last rx 84060, last ping 85060, now 86560
[ 1168.220967]  connection7:0: detected conn error (1011)
[ 1168.220969]  connection4:0: ping timeout of 15 secs expired, recv timeout 10, last rx 84060, last ping 85060, now 86560
[ 1168.220973]  connection4:0: detected conn error (1011)
[ 1168.220975]  connection3:0: ping timeout of 15 secs expired, recv timeout 10, last rx 84060, last ping 85060, now 86560
[ 1168.220978]  connection3:0: detected conn error (1011)
[ 1168.220985]  connection6:0: detected conn error (1011)
[ 1168.480994] sd 14:0:0:0: [sde] Unhandled error code
[ 1168.480998] sd 14:0:0:0: [sde] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1168.481001] sd 14:0:0:0: [sde] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1168.481009] end_request: I/O error, dev sde, sector 0
[ 1168.481015] Buffer I/O error on device sde, logical block 0
[ 1168.481076] sd 11:0:0:0: [sdc] Unhandled error code
[ 1168.481078] sd 11:0:0:0: [sdc] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1168.481080] sd 11:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1168.481087] end_request: I/O error, dev sdc, sector 0
[ 1168.481092] Buffer I/O error on device sdc, logical block 0
[ 1168.481144] sd 10:0:0:0: [sdd] Unhandled error code
[ 1168.481147] sd 10:0:0:0: [sdd] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1168.481150] sd 10:0:0:0: [sdd] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1168.481156] end_request: I/O error, dev sdd, sector 0
[ 1168.481163] Buffer I/O error on device sdd, logical block 0
[ 1168.481168] sd 13:0:0:0: [sdj] Unhandled error code
[ 1168.481170] sd 13:0:0:0: [sdj] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[ 1168.481172] sd 13:0:0:0: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[ 1168.481178] end_request: I/O error, dev sdj, sector 0
[ 1168.481184] Buffer I/O error on device sdj, logical block 0
[ 1457.105996] device-mapper: multipath round-robin: version 1.0.0 loaded
[ 1457.106155] device-mapper: multipath: Cannot access device path 8:0: -16
[ 1457.106164] device-mapper: table: 252:1: multipath: error getting device
[ 1457.106172] device-mapper: ioctl: error adding target to table
[ 1457.171292] device-mapper: multipath: Cannot access device path 8:0: -16
[ 1457.171299] device-mapper: table: 252:1: multipath: error getting device
[ 1457.171304] device-mapper: ioctl: error adding target to table

[root@xen3 ~]# fdisk -l

Disk /dev/sda: 299.4 GB, 299439751168 bytes
255 heads, 63 sectors/track, 36404 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1           5       40131   de  Dell Utility
/dev/sda2   *           6         528     4194304   83  Linux
Partition 2 does not end on cylinder boundary.
/dev/sda3             528        1050     4194304   83  Linux
/dev/sda4            1050       36404   283986359+  8e  Linux LVM

Disk /dev/sdb: 2040 MB, 2040528896 bytes
255 heads, 63 sectors/track, 248 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         248     1992028+  83  Linux

Disk /dev/dm-1: 2040 MB, 2040528896 bytes
255 heads, 63 sectors/track, 248 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
/dev/dm-1p1               1         248     1992028+  83  Linux

[root@xen3 ~]# xe sr-probe type=lvmoiscsi device-config:target=192.168.130.101 device-config:targetIQN=iqn.1984-05.com.dell:powervault.md3200i.6782bcb0006bd850000000004ed88b91
Error code: SR_BACKEND_FAILURE_107
Error parameters: , The SCSIid parameter is missing or incorrect, <?xml version="1.0" ?>
<iscsi-target/>

Note: the XML ends there, correctly, on the last line; it never returns a list of LUNs (and there is one in the group on the SAN for those servers).

Tom Sparrow

1 Answer

Looking around various posts elsewhere, I found a suggestion to turn off flow control on the switch (the clue apparently being large pause counts in the Ethernet stats for those ports). That didn't help, but it did get me looking at jumbo frame support.

ping 192.168.130.101 -s 6000 -M do (large packets, do not fragment) failed silently, whereas -s 9500 -M do reported an ICMP error (as I'd expect). Combined with the timeout messages in the log, this looked to be the problem.
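
For anyone wanting to repeat that probe at the exact limit, here is a minimal sketch. It assumes IPv4 with no IP options (so 28 bytes of IP and ICMP headers) and the Linux iputils ping; the function names icmp_payload_for_mtu and probe_mtu are my own, not part of any tool.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (assuming no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: send three pings with the don't-fragment bit set,
# sized to exactly fill the given MTU.
# Usage: probe_mtu <host> <mtu>
probe_mtu() {
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

Running probe_mtu 192.168.130.101 9000 sends 8972-byte payloads, and probe_mtu 192.168.130.101 1500 sends 1472-byte payloads. If the small probe gets replies and the large one fails silently, something on the path is probably not passing jumbo frames.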

The switch settings all seemed fine, so I checked the XenCenter config again and noticed that the VLAN network for iSCSI had MTU=9000, but the underlying NIC was still set to 1500. Apparently this not only breaks jumbo frames (which is reasonable) but also produces no ICMP errors (which seemed a little wrong to me), so once packets exceed 1500 bytes the traffic never reaches the SAN, and no errors or replies are received.

Lesson learnt: make sure the top-level network (VLAN or bond; the same applies, I assume) never has a higher MTU than the networks it runs over.
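
A rough way to check this from the host's shell is sketched below. It assumes a Linux host that exposes /sys/class/net/<iface>/mtu; the helper names and the SYSFS_NET override are my own inventions, and the interface names you pass in depend on how your host names its VLAN and physical devices.

```shell
#!/bin/sh
# Read an interface's MTU from sysfs. SYSFS_NET is a hypothetical override
# (useful for testing); by default the real /sys/class/net is used.
iface_mtu() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

# Return 0 if the upper interface's MTU is no higher than the lower one's,
# 1 (with a warning on stderr) if it is higher, 2 if either MTU cannot be read.
# Usage: mtu_fits <upper-iface> <lower-iface>
mtu_fits() {
    upper=$(iface_mtu "$1") || return 2
    lower=$(iface_mtu "$2") || return 2
    if [ "$upper" -gt "$lower" ]; then
        echo "MTU mismatch: $1=$upper exceeds $2=$lower" >&2
        return 1
    fi
    return 0
}
```

With placeholder names, mtu_fits eth1.130 eth1 would warn in the situation I hit (VLAN at 9000 on top of a NIC at 1500). This only reads the values; the fix itself still has to be made in the network configuration.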

Tom Sparrow