
I have an iSCSI HP P2000 loaded with 12 x 300GB DP SAS drives. It is connected to two DL385 servers with 64GB RAM each, running XenServer. The SAN network is multipathed via two gigabit switches (4 x copper links from the P2000 to each switch, then each XenServer has a copper link to each switch).
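For reference, this is roughly how the multipath setup can be verified from the XenServer hosts (a minimal sketch; the exact session and device names will differ on your setup):

    # List active iSCSI sessions to the P2000 (one session per path is expected)
    iscsiadm -m session

    # Show the device-mapper multipath topology; each LUN should list one path per link
    multipath -ll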

Recently I re-evaluated my install-time decision to create 2 vdisks: a RAID-5 and a RAID-10. I decided the RAID-5 was pointless and destroyed it, then used those disks to extend the RAID-10 into one big array, with 2 hot-spare disks.

Since that was done (just over a week ago), disk performance has been quite horrible; top on both XenServer hosts shows ~15% I/O wait, and hdparm inside a VM shows around 1.12MB/s reads from the SAN.
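These are the quick checks those numbers come from (a rough sketch; /dev/xvda is just the usual Xen PV disk name inside a VM and may be different in yours):

    # On the XenServer hosts: watch the %wa column
    top

    # Per-device utilisation and wait times, refreshed every 5 seconds
    iostat -x 5

    # Inside a VM: timed buffered reads from the SAN-backed disk
    hdparm -t /dev/xvda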

There are two other Win2008R2 machines connected to the SAN as well. I have previous performance tests showing they are also seeing a significant drop, so I don't believe it is a problem on the XenServer hosts (a repeatable version of these tests is sketched after the list below):

  • 8k random reads previously ~24.75MB/s, now 1.67MB/s
  • 64k random reads previously ~170MB/s, now 3.61MB/s
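Numbers like these can be reproduced with fio; this is only a sketch (the target device, size, runtime and queue depth are placeholders, not my actual test parameters):

    # 8k random reads, direct I/O to bypass the page cache
    fio --name=randread-8k --rw=randread --bs=8k --direct=1 \
        --ioengine=libaio --iodepth=16 --size=1G --runtime=60 \
        --filename=/dev/xvdb

    # 64k random reads against the same device
    fio --name=randread-64k --rw=randread --bs=64k --direct=1 \
        --ioengine=libaio --iodepth=16 --size=1G --runtime=60 \
        --filename=/dev/xvdb

On the Win2008R2 boxes the equivalent would use --ioengine=windowsaio and a Windows path instead of a Linux block device.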

The RAID extension has completed and everything is "OK" in the SMU. The XenServers are quite "empty" (7 VMs on one, 4 VMs on the other), but it is a production environment.


1 Answer


After some more investigation, it turns out the issue was someone (cough) enabling flow control on the SAN switches, on the advice of an internet article about this very hardware that explicitly encouraged flow control to be used.

After disabling flow control, load average and I/O wait dropped immediately, and within 24 hours, once some struggling maintenance tasks had caught up, everything was completely back to normal (0.0%wa and load average < 0.10).
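If anyone else hits this, the host-side pause (flow control) settings can be checked and turned off with ethtool (a sketch; eth0/eth1 stand in for whichever NICs face your SAN switches, and the switch-side commands depend on the vendor):

    # Show current pause (flow control) settings on a SAN-facing NIC
    ethtool -a eth0

    # Disable RX/TX pause frames on both SAN interfaces
    ethtool -A eth0 autoneg off rx off tx off
    ethtool -A eth1 autoneg off rx off tx off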
