4

I'm trying to find out how to measure the total bytes written (or a percentage of the maximum expected, either is fine) for a few RAID arrays behind LSI controllers. The controllers are all LSI MegaRAID SAS 9271-8i. I've tried MegaRAID Storage Manager and MegaCLI, but neither seems to show the information I need. I've found a couple of solutions online, but they only seem to be for Linux, where you can patch the kernel or use smartctl in unconventional ways. That won't work for me on Windows.

I'd really like to avoid pulling the drives out, putting them in another machine, testing with SMART, and then putting them back. Would be a real pain in the neck. If it's important, each controller has two virtual drive groups of 4 disks each, in RAID10, with SAS SSDs forming the groups.

jski

3 Answers

3

I wouldn't bother with watching the SSD wearout behind a hardware RAID controller. You're using RAID for a reason, so let the controller handle it.

It's a bonus that you're running with enterprise SAS drives. If the SSDs are well-mated to the workload (write-heavy/read-optimized/etc.), there shouldn't be a need to delve deeper.

In this case, your LSI 9271 controller has their SSD Guard™ technology (triggered by the S.M.A.R.T. figures you seek), which can leverage a hot-spare SSD if you're concerned about rapid wearout or some premature failure condition.

ewwhite
2

I am using megacli and smartctl on Ubuntu Linux.

First get the Device Id of one of the SSD drives:

megacli -pdlist -aALL -NoLog | grep -E 'Raw Size|Inquiry Data|Device Id'

Suppose the Device Id is 5. Then run:

smartctl -x -d megaraid,5 /dev/sda

This shows an extensive S.M.A.R.T. report for the SSD attached to the Broadcom / Avago / LSI MegaRAID controller.
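If there are many drives, the two steps above can be chained in a script. The following is only a rough sketch: it assumes the pdlist output contains lines of the form "Device Id: N" (as the grep filter above implies), and the function names and the /dev/sda default are made up for illustration. It only builds the smartctl command lines; actually running them is left to the caller.

```python
import re


def parse_device_ids(pdlist_output):
    """Return every Device Id found in `megacli -pdlist` output.

    Assumes each physical drive is reported on a line like "Device Id: 5".
    """
    pattern = r"^\s*Device Id\s*:\s*(\d+)"
    return [int(m) for m in re.findall(pattern, pdlist_output, flags=re.MULTILINE)]


def smartctl_command(device_id, block_device="/dev/sda"):
    """Build the smartctl invocation for one drive behind the controller."""
    return ["smartctl", "-x", "-d", f"megaraid,{device_id}", block_device]


if __name__ == "__main__":
    # Illustrative sample only, not real controller output.
    sample = "Device Id: 5\nRaw Size: (elided)\nInquiry Data: (elided)\nDevice Id: 6\n"
    for dev in parse_device_ids(sample):
        print(" ".join(smartctl_command(dev)))
```

Each returned list can be passed to subprocess.run on the host that has the controller.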

0

On CentOS I certainly monitor SSDs with smartctl. For read-mostly random archives I run a stack of Dell gen 12, 13 and 14 servers with non-Dell Samsung EVO 840, 850 and 860 drives. Don't choose the Samsung PRO: although more expensive, they flap randomly for quite a few people, as reported on the Dell forums, and ruin the whole volume. The EVOs lasted us 3 years, and even RAID 5 survived. About 3 died suddenly out of a growing batch of about 66 disks.

Every x hours, a Python script connects over SSH to each Dell R720/730/740xd (LSI-based PERC RAID), cycles through device IDs 0 to 23, and runs a command like the one below. A custom parser extracts the important values from the output, and a DB stores date + value so that I can track deviations:

smartctl -a -d sat+megaraid,0 /dev/sda
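The polling cycle described above could be sketched roughly as follows. This is not the actual script: the table layout, function names and host label are hypothetical, SQLite stands in for whatever DB is really used, and the SSH transport and output parser are left out. It shows the command list for device IDs 0 to 23 and how a stored date + value pair lets you compute the change between the two most recent readings.

```python
import sqlite3


def smartctl_commands(first=0, last=23, block_device="/dev/sda"):
    """One smartctl command line per device ID in the cycle."""
    return [["smartctl", "-a", "-d", f"sat+megaraid,{i}", block_device]
            for i in range(first, last + 1)]


def open_db(path=":memory:"):
    """Open (or create) the readings store; schema is a made-up example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "taken_at TEXT, host TEXT, device_id INTEGER, "
        "attribute TEXT, raw_value INTEGER)"
    )
    return conn


def record(conn, taken_at, host, device_id, attribute, raw_value):
    """Store one reading; taken_at is an ISO 8601 string so it sorts by time."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
                 (taken_at, host, device_id, attribute, raw_value))
    conn.commit()


def latest_delta(conn, host, device_id, attribute):
    """Difference between the two newest readings, or None if fewer than two."""
    rows = conn.execute(
        "SELECT raw_value FROM readings "
        "WHERE host = ? AND device_id = ? AND attribute = ? "
        "ORDER BY taken_at DESC LIMIT 2",
        (host, device_id, attribute),
    ).fetchall()
    if len(rows) < 2:
        return None
    return rows[0][0] - rows[1][0]
```

A sudden jump in the delta for a write counter or a reallocation counter is the kind of deviation worth alerting on.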

I find it important to watch whether I'm nearing Samsung's warrantied terabytes written, via "241 Total_LBAs_Written". If the users are exceeding the write limit, the drives may all start dying suddenly, and RAID won't help. I also watch reallocations, which may signal that you'll soon need a spare.
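To turn attribute 241 into a percentage of the warranty, something like the sketch below may help. It assumes the usual ten-column smartctl -a attribute table with the raw value in the last column, and it assumes each LBA counts 512 bytes, which you should confirm for your drive model. The warranty figure is a parameter you supply from the drive's datasheet; none is hard-coded here.

```python
def raw_value(smart_output, attribute_id):
    """Return the raw value of one SMART attribute from `smartctl -a` output.

    Assumes rows look like:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Returns None if the attribute is not present.
    """
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attribute_id):
            return int(fields[9])
    return None


def terabytes_written(total_lbas, bytes_per_lba=512):
    """Convert an LBA count to decimal terabytes (512-byte LBAs are an assumption)."""
    return total_lbas * bytes_per_lba / 1e12


def percent_of_warranty(tb_written, warranty_tb):
    """Share of the warrantied terabytes written that has been used, in percent."""
    return 100.0 * tb_written / warranty_tb


if __name__ == "__main__":
    # Illustrative attribute row, not taken from a real drive.
    sample = ("241 Total_LBAs_Written      0x0032   099   099   000    "
              "Old_age   Always       -       195312500000")
    tb = terabytes_written(raw_value(sample, 241))
    print(f"{tb:.1f} TB written")
```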

kuz8