I would like to get your feedback on a puzzling situation I ran into a few days ago. I was tasked with setting up an HP ProLiant G6 for development purposes, with 2 new (< 2 months old, never used before) non-HP SSDs in a RAID1 configuration. They serve intensive development-oriented workloads (about 500 GB written per day); there are also regular HDDs in RAID5, but the RAID1 array is what I'll discuss here. The two SSDs are:
- Samsung SSD 840 PRO Series
- PLEXTOR PX-256M5Pro
smartctl output for both is available here: https://gist.github.com/anonymous/cf8a5208a7315440f796
Relevant past issues
The Plextor drive has always reported an overheated condition, which I assume is because it's not an original HP part.
I once saw the RAID1 array being rebuilt after one of the occasional server reboots, and couldn't explain why.
Failure event
A few days ago the Plextor disk was reported with a plain "Failed" status:
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 256.0 GB, Failed)
False positive?
So I unplugged the drive, checked its SMART output, and ran a full self-test (see the smartctl outputs linked above). The test passed and, stranger still, reseating the drive brings back a perfectly functional RAID1 array.
This is awkward.
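For reference, here is roughly how I checked the pulled drive; this is a minimal sketch that assumes smartmontools is installed and uses /dev/sdb purely as a placeholder for wherever the disk shows up once attached to another machine.

```python
#!/usr/bin/env python3
"""Minimal sketch of the SMART checks run on the pulled Plextor drive.

Assumes smartmontools is installed; /dev/sdb is only a placeholder for
wherever the disk appears once attached outside the Smart Array controller.
"""
import subprocess

DEVICE = "/dev/sdb"  # placeholder device node, adjust as needed

# Dump the full SMART report: identity, attributes and error log.
subprocess.run(["smartctl", "-a", DEVICE], check=True)

# Start the extended (long) self-test; the drive runs it internally.
subprocess.run(["smartctl", "-t", "long", DEVICE], check=True)

# Once the test has had time to finish, read back the self-test log.
subprocess.run(["smartctl", "-l", "selftest", DEVICE], check=True)
```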
Alternative monitoring?
I do not know how to get the P410i to report the specific reason for the "Failed" status (I suspect it's not possible), and I know these are non-original HP parts (which voids my paid HP support). Still, for this non-mission-critical server I'd like to keep using non-HP disks while retaining some kind of monitoring of their health status.
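To make the question more concrete, below is a minimal sketch of the kind of out-of-band monitoring I have in mind, assuming smartmontools is installed and the script is run from cron. The device paths are placeholders, and if the disks are only reachable through the Smart Array controller, smartctl would need something like `-d cciss,N` against the controller device instead of a plain /dev/sdX.

```python
#!/usr/bin/env python3
"""Rough sketch of standalone SMART health monitoring, independent of the
P410i's own verdict. Device paths are placeholders; drives hidden behind the
Smart Array controller may need e.g. ["-d", "cciss,0", "/dev/sg0"] instead."""
import subprocess
import sys

DEVICES = ["/dev/sda", "/dev/sdb"]  # placeholders for the two SSDs


def smart_health_passed(device: str) -> bool:
    """Return True when smartctl's overall health self-assessment is PASSED."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    return "PASSED" in result.stdout


def main() -> int:
    failing = [dev for dev in DEVICES if not smart_health_passed(dev)]
    for dev in failing:
        # Hook point for real alerting: e-mail, syslog, monitoring system, etc.
        print(f"WARNING: {dev} did not report a PASSED SMART health status",
              file=sys.stderr)
    return 1 if failing else 0


if __name__ == "__main__":
    sys.exit(main())
```

Running something like this from cron every few hours would at least flag a drive that SMART itself considers failing, which I hope is good enough for a non-mission-critical box.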
What is your opinion? I have 3 questions:
- should the HP controller's monitoring status be trusted only when it is paired with original HP parts? (this one is easy)
- are these (admittedly not enterprise-grade) SSDs objectively in good health?
- should I fully trust the results of the SMART tests?
Thanks in advance