
I have an HP ProLiant DL380 G7 with an HP Smart Array P812 with 1 GB BBWC. It is connected to a D2600 storage enclosure with one mini-SAS cable. All firmware versions are the latest (including the disks). The internal backplane is also plugged into the internal SAS port.

There is one RAID 5 array (across 3 × 4 TB SATA disks) and three RAID 1 arrays across 1 TB SATA disks. Additionally, there are internal 2.5-inch SAS disks connected to the internal port of the controller: 3 × 300 GB in RAID 5 and 2 × 300 GB in RAID 1. This problem seems to affect both the "internal" disks and the disks in the D2600 enclosure.

I am having some very weird performance issues on this system that I cannot track down.

The server is running ESXi 6 from an internal HP Enterprise USB storage device.

With low disk load, there is no problem. The issues start under sustained load. If I copy a benchmark file from one disk array to another, it initially runs at 250 MB/s for a random amount of time (between 10 and 45 seconds). After this, disk I/O drops considerably and becomes very erratic (see screenshot).

[Screenshot: HD Tune graph of the file transfer]

If the IO load continues, eventually the transfer drops to 0, and the array stops responding entirely.

At the same time, the ESXi host logs the following:

Device naa.bla performance has deteriorated. I/O latency increased from average value of 5134 microseconds to 434632 microseconds.
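To put that log line in perspective, here is a minimal Python sketch that pulls the two latency values out of the message and converts them. The function name `latency_jump` and the regular expression are my own illustration, not part of any VMware tooling, and the pattern is only matched against the wording quoted above.

```python
import re

# Pattern for the ESXi "performance has deteriorated" message quoted above.
LATENCY_RE = re.compile(
    r"I/O latency increased from average value of (\d+) microseconds "
    r"to (\d+) microseconds"
)

def latency_jump(log_line):
    """Return (baseline_ms, current_ms, factor), or None if the line does not match."""
    match = LATENCY_RE.search(log_line)
    if match is None:
        return None
    baseline_us, current_us = (int(group) for group in match.groups())
    return baseline_us / 1000, current_us / 1000, current_us / baseline_us

line = ("Device naa.bla performance has deteriorated. I/O latency increased "
        "from average value of 5134 microseconds to 434632 microseconds.")
baseline_ms, current_ms, factor = latency_jump(line)
print(f"{baseline_ms:.1f} ms -> {current_ms:.1f} ms ({factor:.0f}x)")
```

For the message above this works out to roughly 5 ms rising to about 435 ms, an increase of around 85 times.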

A Linux box on the same server shows the following results:

[Screenshot: results from the Linux box]

Notable is the 1800 ms latency!

If the array stops responding entirely, the only way to recover is to restart the host. This occurs across all arrays; it doesn't matter whether they are internal or external. I have tried a second D2600 and a different SAS cable, with no change. Disabling the Windows write cache or the disk cache on the drives themselves makes no difference.

I am completely stuck at this stage and tearing my hair out. Any help would be much appreciated!

ewwhite
  • Are the disks in the D2600 actual HP disks? SATA is a poor choice for this type of expander backplane. So is RAID 5 with so few disks and that type of capacity. – ewwhite Aug 22 '16 at 11:18
  • No, they are not HP disks. The array is purely a slow large storage array, hence the choice of Raid 5, is there a better option? – Alexander Fitu Aug 22 '16 at 12:12
  • Using 3rd-party SATA disks in HP equipment [isn't always a good idea](http://serverfault.com/questions/500173/hp-4tb-sata-midline-in-d2600-what-disk-make-is-hp-using-why-wd-re4-dont-wor). SATA disks, in particular, will exhibit weird behavior in the setup you've described. In terms of RAID5, it's not that desirable for disks larger than 2TB nowadays. – ewwhite Aug 22 '16 at 12:19
  • @ewwhite True, I do know of issues, there are some SAS disks in this array with the same issues. This is a dev/test environment so reliability is not at the top of the list (but it does have to work :) ) – Alexander Fitu Aug 22 '16 at 14:03
  • You should update your OS. – ewwhite Aug 22 '16 at 16:44

1 Answer


You're running an HP DL380 G7, which should have an internal Smart Array P410 array controller.

  • Can you post the VMware ESXi build number? The driver and HPSA version matter. An update may be necessary.
  • I'd suggest using the P410 for your internal disks and keeping the P812 for your external enclosure.
  • You should also be using SAS disks and dual-domain cabling for the D2600 (2 cables/multipath).
  • The P812 has a SAS expander embedded in it. The D2600 has a SAS expander embedded in it. SATA disks won't run well in that setup. Speeds may also have downshifted to 3Gbps.
  • Make sure your P812 cache bias is set to 75% write or greater.
  • If this is a standalone ESXi host with no SAN, ESXi should NOT be running on USB or SDHC.
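As a rough illustration of what a downshift to 3Gbps means for the link itself, the following Python sketch computes the theoretical payload ceiling per SAS lane and per 4-lane mini-SAS cable. It assumes the 8b/10b line encoding used by 3 Gbps and 6 Gbps SAS; the function names are hypothetical, and real-world throughput will be lower because of protocol overhead.

```python
# 3 Gbps and 6 Gbps SAS use 8b/10b encoding, so only 8 of every
# 10 bits on the wire carry payload data.
def lane_mb_per_s(line_rate_gbps):
    """Theoretical payload rate of one SAS lane in MB/s."""
    payload_mbit_per_s = line_rate_gbps * 1000 * 8 / 10
    return payload_mbit_per_s / 8  # bits -> bytes

def wide_port_mb_per_s(line_rate_gbps, lanes=4):
    """Theoretical payload rate of a wide port; a mini-SAS cable carries 4 lanes."""
    return lane_mb_per_s(line_rate_gbps) * lanes

for rate in (3, 6):
    print(f"{rate} Gbps: {lane_mb_per_s(rate):.0f} MB/s per lane, "
          f"{wide_port_mb_per_s(rate):.0f} MB/s per x4 cable")
```

These are ceilings for the link only. They say nothing about how SATA disks behave behind two cascaded expanders.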
ewwhite
  • Yes, ESXi build 2494585, HPSA 5.5.0.106-1OEM. Where do I find the driver version number? Noted regarding the P410; I will move them tonight. I am using dual-domain at the moment, but the symptoms and performance behavior were exactly the same as with a single-domain connection. P812 cache bias is set to 75% write; it made no difference. This host is a standalone host but is part of a vCenter system, with logs being written to the vCenter Server. Thanks for your help. – Alexander Fitu Aug 22 '16 at 12:38
  • @AlexanderFitu Your VMware is running an exceptionally old version of the software. Build 2494585 is from 2015-03-12. This means you've never applied any bug fixes or updates to your system. **The current build of ESXi 6 is 4192238.** - I'd strongly suggest an ESXi update. [Here's a summary of changes](https://esxi-patches.v-front.de/ESXi-6.0.0.html), some of which probably address your issue. – ewwhite Aug 22 '16 at 12:45
  • Thanks. If I update via the standard channel, will the HP customizations be lost? – Alexander Fitu Aug 22 '16 at 12:52
  • @AlexanderFitu the easiest path for you if you're not in a cluster with a shared-storage solution is to download [the current ESXi 6 HP custom CD](https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI60U2-HPE&productId=491) and just run a manual update. That will bring you to build #3620759 - better than what you have now. – ewwhite Aug 22 '16 at 12:57
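For anyone checking their own host against the build numbers mentioned in these comments, here is a small Python sketch. It assumes the version string has the `build-NNNNNNN` form that `vmware -vl` prints on an ESXi shell; the sample string, the constant names, and the helper functions are illustrative only.

```python
import re

# Build numbers quoted in the comments above.
HP_CUSTOM_BUILD = 3620759  # build reached via the HP custom ESXi 6 CD
CURRENT_BUILD = 4192238    # current ESXi 6 build at the time of the comments

def parse_build(version_text):
    """Extract the integer build number from text containing 'build-NNNNNNN'."""
    match = re.search(r"build-(\d+)", version_text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no build number found in {version_text!r}")
    return int(match.group(1))

def is_older_than(installed_build, target_build):
    """ESXi build numbers increase over time, so a lower number is an older build."""
    return installed_build < target_build

# Illustrative sample; on a real host this would be the output of `vmware -vl`.
sample = "VMware ESXi 6.0.0 build-2494585"
installed = parse_build(sample)
print(installed, "older than HP custom image:", is_older_than(installed, HP_CUSTOM_BUILD))
print(installed, "older than current build:", is_older_than(installed, CURRENT_BUILD))
```

The comparison only orders builds within the same ESXi 6 line discussed here; it does not say which fixes a given build contains.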