
I've got two HP DL380 G7 servers, each with a P812 controller and a D2700 enclosure. They're database servers with 144 GB RAM. P812 firmware is 6.40; D2700 firmware is 0147.

They both worked great with 18 OWC Mercury Extreme SSDs (SATA). After I added 6 more SSDs in both D2700 enclosures to make 24 SSDs in each enclosure, one of the servers is exhibiting very poor disk performance compared to how it was before the upgrade and compared to the other server.

So I suspect that one of the 6 SSDs that was added to the server with poor performance is faulty. But which one? HP Array Configuration Utility (ACU) doesn't show any issues, and no issues appear at POST. Even the long ACU report doesn't show anything.

So I'd like to see the S.M.A.R.T. attributes for these drives to see if I can pick out the one failing. Is there a Windows tool that will allow me to view S.M.A.R.T. attributes in this configuration?

In a very similar question, "3rd party SSD drives in HP Proliant server - monitoring drive health", it is suggested to use smartctl from smartmontools. Unfortunately, I'm not having any luck seeing the SSDs behind the P812+D2700. How can I make smartctl work?

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc,0 -T permissive -s on
smartctl 6.3 2014-06-23 r3922 [x86_64-w64-mingw32-2012r2] (cf-20140623)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              LOGICAL VOLUME
Revision:             6.40
User Capacity:        5,760,841,244,672 bytes [5.76 TB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Logical Unit id:      0x600508b1001cf0ebb14e9131d7XXXXXX
Serial number:        PAGXQ0ARXXXXXX
Device type:          disk
Local Time is:        Fri Dec 12 18:42:32 2014 EST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
unable to fetch IEC (SMART) mode page [Input/output error]

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported

Device does not support Self Test logging

Here is the output for the command suggested by the very similar question (I changed /dev/sda to /dev/sdc because that's the device of the first volume on the P812):

C:\Program Files\smartmontools\bin>smartctl -a -l ssd /dev/sdc -d sat+cciss,1
smartctl 6.3 2014-06-23 r3922 [x86_64-w64-mingw32-2012r2] (cf-20140623)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdc: Type 'sat+...': Unknown device type 'cciss,1'
=======> VALID ARGUMENTS ARE: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, areca,N[/E], auto, test <=======

Use smartctl -h to get a usage summary
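
The error above already lists every `-d` type this Windows build accepts, so it can be checked mechanically. Below is a minimal sketch, assuming the error text keeps the `VALID ARGUMENTS ARE:` layout shown above; the function name `supported_device_types` and the `SAMPLE` constant are hypothetical helpers, not part of smartmontools.

```python
import re

# Error text copied from the smartctl run above.
SAMPLE = (
    "/dev/sdc: Type 'sat+...': Unknown device type 'cciss,1'\n"
    "=======> VALID ARGUMENTS ARE: ata, scsi, sat[,auto][,N][+TYPE], "
    "usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, areca,N[/E], "
    "auto, test <=======\n"
)


def supported_device_types(smartctl_output: str) -> list:
    """Return the base -d TYPE names listed in smartctl's error message."""
    match = re.search(r"VALID ARGUMENTS ARE:\s*(.*?)\s*<=+", smartctl_output)
    if not match:
        return []
    names = []
    # Entries are separated by ", "; commas inside an entry have no space.
    for entry in match.group(1).split(", "):
        # Keep only the leading name, e.g. "sat" from "sat[,auto][,N][+TYPE]".
        names.append(re.match(r"[a-z0-9]+", entry).group(0))
    return names


types = supported_device_types(SAMPLE)
print("cciss supported by this build:", "cciss" in types)
```

If `cciss` is missing from the list, no variation of `-d sat+cciss,N` will work with that binary, regardless of the device path.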

sevzas
  • Did you try variations on `smartctl -a -l ssd /dev/sda -d sat+cciss,1` from the linked question? There are options other than cciss, depending on the controller. From your output, you're hitting the LV instead of any drive behind it. – MikeyB Dec 13 '14 at 00:22
  • @MikeyB added output of command from the linked question. I'm concerned that the D2700 is not S.M.A.R.T.-aware and will not pass SMART commands. – sevzas Dec 13 '14 at 01:03
  • The D2700 is S.M.A.R.T. aware, in addition to being able to report on [SCSI Enclosure Services](http://en.wikipedia.org/wiki/SCSI_Enclosure_Services) (SES) details, but understand that the use case here is narrow. I suggested that SATA timeouts could [cause issues with performance](http://serverfault.com/questions/331499/how-can-a-single-disk-in-a-hardware-sata-raid-10-array-bring-the-entire-array-to/331504#331504) on a shared expander backplane, like that in the D2700. However, that's more likely to be a spinning media issue; not a problem with an SSD. – ewwhite Dec 13 '14 at 01:35
  • *Please provide numbers detailing your expected and actual performance figures.* – ewwhite Dec 13 '14 at 16:27
  • @MikeyB - based on the SmartmonTools wiki at http://www.smartmontools.org/wiki/Supported_RAID-Controllers it appears that there is no Windows support for cciss. Also, the smartctl help on Windows does not show cciss as an option. It says "-d TYPE, --device=TYPE : Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, areca,N[/E], auto, test" – sevzas Dec 16 '14 at 20:18

2 Answers


Please provide numbers detailing your expected and actual performance figures.

Also, what is the SAS topology? How many SFF-8088 cables are in place between the host and the D2700 JBOD?

As I mentioned earlier, the HP StorageWorks D2700 is S.M.A.R.T. aware and reports on SCSI Enclosure Services (SES) details... But your use case here is narrow. That's a lot of SATA drives on an expander. We know that SATA timeouts can cause issues with performance on a shared expander backplane, like the one in the D2700. However, that's more likely to be a spinning media issue; not a problem with an SSD.

In my experience, SSDs either work or they don't. There isn't much in-between (unless you've hit a write endurance limit). So the things I'd look at are:

  • You expanded the array because you were out of space. Exactly how out of space were you prior to expansion? I'd hope that you hadn't exhausted space. Think about SSDs and the lack of TRIM support on that controller.

  • I would have recommended under-provisioning these drives or limiting the Logical Drive size to account for the missing TRIM functionality.

  • Update your firmware. You're on an unsupported release of the D2700 enclosure firmware (it was recalled), and your RAID controller is also behind. As of this writing, 0149 is the right D2700 firmware, and your controller should be on version 6.60. Upgrade the hosts as well.

  • It may be time to step up your game. 24 x consumer SATA SSDs on oversubscribed buses (RAID controller and JBOD backplane), where the 6Gbps SATA drives are downshifted to 3Gbps, means that you've also reached the upper-bounds of the hardware. The Smart Array P812 controller has diminishing returns on SSD IOPS at ~6 disks.
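
The oversubscription in the last point can be put into rough numbers. This is a back-of-the-envelope sketch only: the drive count and the 3Gbps drive links come from this thread, and the two SFF-8088 cables are the figure the asker reports in the comments. The 6Gbps lane rate and the idea that both cables carry traffic to the same set of drives are assumptions, as is the usual 8b/10b rule of thumb of about 100 MB/s of payload per 1 Gbps of line rate.

```python
# Rule of thumb for 8b/10b-encoded SAS/SATA links (assumption).
MB_PER_GBPS = 100


def payload_mb_s(line_rate_gbps: float, links: int) -> float:
    """Nominal aggregate payload bandwidth for a number of identical links."""
    return line_rate_gbps * MB_PER_GBPS * links


drives = 24              # SSDs per enclosure (from the question)
drive_link_gbps = 3.0    # drives are downshifted to 3Gbps (from the answer)
cables = 2               # SFF-8088 cables per host (from the comments)
lanes_per_cable = 4      # an SFF-8088 connector carries four lanes
lane_gbps = 6.0          # ASSUMPTION: each lane negotiates 6Gbps

drive_side = payload_mb_s(drive_link_gbps, drives)
host_side = payload_mb_s(lane_gbps, cables * lanes_per_cable)

print(f"drive side: {drive_side:.0f} MB/s")
print(f"host side:  {host_side:.0f} MB/s")
print(f"oversubscription ratio: {drive_side / host_side:.2f}")
```

These are nominal ceilings, not measurements; the controller's own processing limits are not modelled here.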

ewwhite
  • Keep in mind that the same configuration I'm trying is working perfectly on an identically-configured machine/D2700 sitting adjacent to the first machine. The old configuration worked great on both machines. – sevzas Dec 13 '14 at 15:19
  • Prior to expanding, I had about 10% free. There are two SFF-8088 cables connecting each machine to each enclosure. – sevzas Dec 13 '14 at 15:21
  • The configuration I'm attempting is "known to be good", so I feel that upgrading drivers to try to fix the problem may cause additional unexpected side-effects/issues that could derail the troubleshooting. – sevzas Dec 13 '14 at 15:35
  • @sevzas On the contrary, a firmware upgrade is almost certainly a good first step for hardware issues. It's certainly what the majority of vendor support channels will ask you to do first. – Dan Dec 13 '14 at 16:14
  • @sevzas You're so far beyond *acceptable use* on this platform that you really don't have any other recourse but to upgrade firmware. If you recall, this fixed your issues early on. Either way, this configuration is an edge case, and at a certain point, you should have refactored. This began with 4 SSDs a year ago. Getting to 90% utilization on an array comprised of *18* SSDs without accounting for TRIM by under-provisioning the logical drives could result in this type of behavior. I recommended OWC disks in *small* deployments; definitely not at this scale. – ewwhite Dec 13 '14 at 16:20
  • What's the proper amount to under-provision by? – sevzas Dec 13 '14 at 23:28
  • @ewwhite - after further testing we determined that both machines had the performance problem after the upgrade (not just one). I upgraded the Firmware and that did not help. I even rolled one of the arrays back to 18 drives by rebuilding it from scratch and the performance issue remains. The problem manifests itself with 100% Active Time and Disk Queue Size of >10, response times for writes >100ms with I/O loads of just 50 MB/sec. – sevzas Dec 16 '14 at 15:39
  • What are your controller cache settings? Cache ratio? – ewwhite Dec 16 '14 at 15:40
  • @ewwhite: cache enabled, 80% write / 20% read. I've been playing with HP Smart Storage Administrator CLI. Using the SSA CLI I noticed that the recent OWC drives shipped with a new firmware version (600ABBF0). One of those drives is reporting "PHY Transfer Rate: 1.5Gbps". I will be replacing it soon. Perhaps it's possible to read the drives' S.M.A.R.T. parameters using SSA CLI ? – sevzas Dec 16 '14 at 20:27
  • Please try with cache disabled. *If BBWC or FBWC is installed, then for the current generation of SSDs and Smart Array, it is recommended that you do NOT enable the array accelerator for the SSDs' logical volume.* - See if that changes results. – ewwhite Dec 16 '14 at 20:33
  • @ewwhite - cache enabled didn't change things. The following improved things drastically: make sure that each drive pair is on the same firmware version. With this change, the 24-drive array is behaving "acceptably" but measurably worse than the 18-drive array on side-by-side tests. I think at this point it's time to look for alternative expansion strategies. – sevzas Dec 18 '14 at 19:34

Here is the answer to the original question, which asked for a Windows tool that will allow me to view S.M.A.R.T. parameters on SSDs that sit behind an HP SmartArray P812 on a D2700 chassis:

I've edited the answer as of Aug 29, 2017. Originally I concluded that there was no Windows-based tool that allows me to query the S.M.A.R.T. parameters on a SATA drive in a D2700 enclosure using a P812 controller. I now see this is not completely accurate. While the HP Array Configuration Utility (ACU) does not allow me to query the S.M.A.R.T. parameters, it does notify me when a drive is predicted to fail soon, and this notification also appears in the Array Diagnostics Report.

As of the time of the original answer, I considered these three candidates but none of them did the job at the time. The comments below might not be accurate any longer:

  • SmartmonTools/smartctl - querying S.M.A.R.T. behind an HP controller appears to be supported on Linux according to "3rd party SSD drives in HP Proliant server - monitoring drive health", but the Windows version of smartctl does not appear to support the cciss driver, which is needed for HP SmartArray controllers according to this
  • HP SSA CLI - has extensive support for HP controllers, but no support for S.M.A.R.T. - HP seems to favor their own SmartSSD Wear Gauge technology. The command I used is: "controller slot=1 ssdphysicaldrive all show detail" another useful command is "controller slot=1 show ssdinfo"
  • HD Sentinel - advertises support for HP Controllers here, but when you read the fine print here it says it can't peer behind HP SmartArray controllers
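
Although the SSA CLI exposes no S.M.A.R.T. attributes, its per-drive detail was what revealed the mismatched firmware and the drive reporting "PHY Transfer Rate: 1.5Gbps" in the comments above. Here is a minimal sketch of flagging such outliers automatically. It assumes the report has been saved to text and uses `physicaldrive <id>` headers followed by `Key: Value` lines. The `SAMPLE` text is illustrative rather than real output: the drive IDs, the `Firmware Revision` field name, and the `OLDER-FW` value are placeholders.

```python
import re
from collections import Counter

# Illustrative text only; not captured from a real controller.
SAMPLE = """\
physicaldrive 1E:1:1
   Firmware Revision: 600ABBF0
   PHY Transfer Rate: 3.0Gbps
physicaldrive 1E:1:2
   Firmware Revision: 600ABBF0
   PHY Transfer Rate: 1.5Gbps
physicaldrive 1E:1:3
   Firmware Revision: OLDER-FW
   PHY Transfer Rate: 3.0Gbps
physicaldrive 1E:1:4
   Firmware Revision: 600ABBF0
   PHY Transfer Rate: 3.0Gbps
"""


def parse_drives(report: str) -> dict:
    """Group 'Key: Value' lines under the preceding physicaldrive header."""
    drives = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = re.match(r"physicaldrive\s+(\S+)$", line)
        if header:
            current = drives.setdefault(header.group(1), {})
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return drives


def flag_outliers(drives, fields=("Firmware Revision", "PHY Transfer Rate")):
    """Report drives whose value differs from the most common one per field."""
    flagged = {}
    for field in fields:
        counts = Counter(d[field] for d in drives.values() if field in d)
        if not counts:
            continue
        usual = counts.most_common(1)[0][0]
        for drive_id, d in drives.items():
            if field in d and d[field] != usual:
                note = f"{field}: {d[field]} (most drives: {usual})"
                flagged.setdefault(drive_id, []).append(note)
    return flagged


flagged = flag_outliers(parse_drives(SAMPLE))
for drive_id, notes in flagged.items():
    print(drive_id, "->", "; ".join(notes))
```

A drive flagged this way is only a candidate for closer inspection; the comparison says nothing about wear or pending failure.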
sevzas
  • @Downvoter - please share your rationale for downvoting my answer. Remember that my original question was about finding a diagnosis tool but unfortunately it morphed into a diagnosis of issues with my hardware. Rest assured, I will post the resolution to the hardware issue once I've confirmed it works. – sevzas Dec 18 '14 at 14:19