2

We are running an SSD array in a SAN, and the performance is great. But we need to monitor the write endurance SMART attributes so that we can determine when the drives are close to wearing out. We tested these drives and confirmed that we could get that data, but didn't try it in the SAN. Now we have discovered that our SAN (a Dell PowerVault) doesn't have any way to query SMART data. All it does is query a few attributes itself and generate a report. That report doesn't contain the attributes we need.

Is there any way to access our drives' SMART data without taking them out of the array and putting them in another machine to read the data?

Josh Yeager

4 Answers

4

Generally speaking, this is managed by the firmware of your storage. It's also not your problem, because if an SSD fails, it'll get replaced by the vendor regardless of how you use it.

Basil
  • But I would much rather replace an SSD in a planned effort on a slow day than whenever it decides to die. So these stats would be good to have so we can see the end of life coming. – Josh Yeager Mar 27 '14 at 20:22
  • Your vendor, if they're doing their job, is watching for that same reason :) NetApp (for example) will ship drives to replace drives in a pre-fail state. – Bill Weiss Mar 28 '14 at 00:25
  • You'll have plenty of warning so you can start watching for a window, assuming you're with a real vendor. – Basil Mar 28 '14 at 14:32
  • And sometimes disks just die. S.M.A.R.T. is *not* the last word in disk/storage monitoring. – ewwhite Mar 28 '14 at 21:54
3

You're out of luck for querying them directly. Your SAN device will need to serve up that data in some way (SNMP or some proprietary monitoring interface).

Bill Weiss
  • Page 238 of the owner's manual (ftp://ftp.dell.com/Manuals/Common/powervault-md3600f_Owner's%20Manual_en-us.pdf) says "The RAID controller monitors all attached drives and notifies users when a predicted failure is reported by a physical disk." – Bill Weiss Mar 28 '14 at 00:26
3

I agree with the sentiment about letting the storage array handle this. There are so many misconceptions about how to maintain and manage SSD storage...

Treat them like disks in this case.

  • RAID them.
  • Have spares.
  • Keep your support contract active.
  • Profit.

There's no need to preemptively replace your SSDs. If your workload is truly write-heavy, then you should also be using SSDs that are optimized for that workload. They exist!

An example of what an enterprise SAS SSD can report through a RAID controller or SAN:

  physicaldrive 1I:1:4
     Port: 1I
     Box: 1
     Bay: 4
     Status: OK
     Drive Type: Data Drive
     Interface Type: Solid State SAS
     Size: 400 GB
     Firmware Revision: HPD9
     Serial Number: 00197356
     Model: HP      MO0400FBRWC     
     Current Temperature (C): 29
     Maximum Temperature (C): 43
     Usage remaining: 99.51%
     Power On Hours: 11672
     Estimated Life Remaining based on workload to date: 98765 days
     SSD Smart Trip Wearout: False
     PHY Count: 2
     PHY Transfer Rate: 6.0Gbps, Unknown
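
The "Estimated Life Remaining" figure above appears consistent with a simple linear extrapolation from "Usage remaining" and "Power On Hours". The sketch below assumes that relationship (it is an observation about these numbers, not the vendor's documented formula), and the function names `parse_drive_report` and `estimated_days_remaining` are made up for illustration:

```python
def parse_drive_report(text):
    """Collect 'Key: value' lines from a drive report into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def estimated_days_remaining(usage_remaining_pct, power_on_hours):
    """Linear extrapolation: assume wear keeps accruing at the average rate so far."""
    used_pct = 100.0 - usage_remaining_pct
    if used_pct <= 0:
        return None  # no measurable wear yet, so nothing to extrapolate from
    hours_left = usage_remaining_pct / used_pct * power_on_hours
    return hours_left / 24.0


# A few lines copied from the report above.
SAMPLE = """\
     Usage remaining: 99.51%
     Power On Hours: 11672
     SSD Smart Trip Wearout: False
"""

fields = parse_drive_report(SAMPLE)
remaining = float(fields["Usage remaining"].rstrip("%"))
hours = int(fields["Power On Hours"])
print(round(estimated_days_remaining(remaining, hours)))  # prints 98765
```

A projection like this is only as good as the assumption that the future write load resembles the past one, so treat it as a planning aid rather than a failure prediction.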
ewwhite
1

The MD3620f does not supply this information via Modular Disk Storage Manager or SMcli, nor does it include SMART data in any of the support bundle files.

Your only hope of getting SMART data without removing drives would be via SNMP polling, as this tends to grab any data that can possibly be pulled from every component in the enclosure. If SNMP does not give you the data, then the controller firmware likely does not even pull SMART data from SATA disks at all.
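
If you want to check what SNMP actually exposes, one rough approach is to walk the whole tree with net-snmp's `snmpwalk` and search the result for anything that looks wear-related. The Python sketch below assumes net-snmp is installed and that the array answers SNMP v2c queries at all; the host, community string, keywords, and the `EXAMPLE-MIB` sample lines are all hypothetical and are not taken from a real MD3 array:

```python
import subprocess


def run_snmpwalk(host, community="public", root_oid=".1"):
    """Walk the agent from root_oid using net-snmp's snmpwalk (SNMP v2c)."""
    cmd = ["snmpwalk", "-v", "2c", "-c", community, host, root_oid]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_snmpwalk(text):
    """Split 'OID = TYPE: value' lines into (oid, type, value) tuples."""
    rows = []
    for line in text.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue  # continuation lines or noise without an OID
        vtype, sep2, value = rest.partition(": ")
        if not sep2:
            vtype, value = "", rest  # e.g. an empty string or an error message
        rows.append((oid.strip(), vtype.strip(), value.strip()))
    return rows


def find_candidates(rows, keywords=("smart", "wear", "endurance")):
    """Keep rows whose OID name mentions one of the keywords (case-insensitive)."""
    return [r for r in rows if any(k in r[0].lower() for k in keywords)]


# Made-up output for illustration only; real OID names depend on the vendor MIB.
SAMPLE = """\
SNMPv2-MIB::sysDescr.0 = STRING: Example storage array
EXAMPLE-MIB::driveStatus.3 = INTEGER: 1
EXAMPLE-MIB::driveWearRemaining.3 = INTEGER: 97
"""

for row in find_candidates(parse_snmpwalk(SAMPLE)):
    print(row)
```

Matching on OID names only works when the vendor MIB is loaded so that names are resolved; with numeric OIDs you would have to compare against the MIB file by hand. Against a live device you would pass `run_snmpwalk("array.example.com")` into `parse_snmpwalk` instead of the sample text.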

Source: firsthand experience as a Senior Engineer supporting MD3-series arrays for Dell

Edit: ewwhite also makes an excellent point - there isn't a real need to proactively monitor the SSDs in this array unless you continue to use it past its end-of-life or without warranty coverage (in which case this data would at least be "handy"). If you're using the SSDs for caching, then there's no need to worry. An SSD failure may have a slight impact on performance, but after warranty replacement things are good-as-new. If you're using the SSDs as part of a disk group (not using the newer SSD caching or disk pool features), then consider using RAID6 so you don't have any high-level risk to be concerned with.

JimNim