
THE PROBLEM

We've all been there. The server is toast. It's not booting and something seems really odd.

Sure, there are recovery boot options with most operating systems and a collection of helpful ISOs out there to assist. Good examples might be the System Rescue CD, a Knoppix liveCD, or even the built-in rescue boot of CentOS / Red Hat Enterprise Linux. But what if the problem seems to be within the hardware or related to it?

There are vendor tool DVDs available, such as HP/HPE's Service Pack for ProLiant (SPP) or Dell's System Update. These contain useful hardware diagnostics and tools that would be perfect for a recovery environment. Except that they're not available in any recovery environment.

I suspect that actually distributing a recovery disk with third-party closed-source tools on it would run into some copyright issues. However, presuming we can get hold of the tools and drivers needed, it seems like a useful thing to build ourselves and keep around.

THE REAL QUESTION

Has anyone ever built something like this? How did you do it? What issues did you run into?

Steve Bonds

1 Answer


If you're referring to modern HPE ProLiant hardware, there's really no need for an external recovery disk/utility.

HPE servers since the Generation 8 series (2012) have the Intelligent Provisioning feature built in. It provides a preinstallation, configuration, firmware management and diagnostics environment integral to the hardware. But even this is rarely needed...

If there's a hardware problem on an HP server, the iLO has the Integrated Management Log (IML) and the iLO Event Log available.


In addition, the Systems Insight Display gives you a visual status of the hardware components.


What type of issue are you envisioning that isn't covered by the above diagnostic approaches?

Edit:

The Linux firmware update for your disks is located here. It is version HPG3. This drive model, the MM1000EBKAF, has not been included in HP SPP distributions, so you will have to update these drives manually.

You can wget the .scexe file onto a running Linux OS or live CD and run the firmware update from there.
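As a rough illustration of that wget-and-run workflow, here is a minimal Python sketch, assuming the .scexe is a self-contained executable you can run directly as root on the live system. The function names are hypothetical, the download URL is whatever link HPE gives you (not reproduced here), and the component is run with no arguments because its command-line options vary by component and are not assumed.

```python
"""Sketch: fetch and run an HPE Smart Component (.scexe) on a live Linux system."""
import stat
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path


def component_filename(url: str) -> str:
    """Return the file name at the end of the URL path, ignoring any query string."""
    return Path(urllib.parse.urlparse(url).path).name


def make_executable(path: Path) -> None:
    """Add execute permission for user, group and other (the equivalent of chmod +x)."""
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def fetch_component(url: str, dest_dir: str) -> Path:
    """Download the component into dest_dir (the equivalent of wget) and mark it executable."""
    dest = Path(dest_dir) / component_filename(url)
    urllib.request.urlretrieve(url, dest)
    make_executable(dest)
    return dest


def run_component(path: Path) -> int:
    """Run the component with no extra arguments and return its exit status.

    Command-line options are component-specific, so none are passed here;
    check the component's own documentation before adding any.
    """
    return subprocess.run([str(path)], check=False).returncode
```

Typical use would be `run_component(fetch_component(url, "/tmp"))` from a root shell on the live CD; flashing drive firmware generally needs root access to the controller.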

ewwhite
  • The *specific* problem I have right now is that the SPP isn't auto-updating the drive firmware on the hard disks connected to a P410i in a DL380 G6. I also haven't been able to find a way to pull raw SMART data from the drives using the ssacli command, but the Linux "smartctl" utility can. However, in the spirit of UNIX, I was thinking of how to make a general-purpose tool and leave it to Future Me or others to decide how to use it in yet-to-be-determined specific cases. :) – Steve Bonds Jan 24 '18 at 16:45
  • There's no need to look at SMART data on disks connected to HP controllers. The built-in monitoring is excellent, and SMART is just one of a number of heuristics that HP uses to determine drive health. What type of drives are you using now and what firmware do they have? – ewwhite Jan 24 '18 at 22:32
  • These are MM1000EBKAF drives with HPG0 firmware and I'm very curious about the power-on hours for these drives purchased as new. In other circumstances it has also been handy to be able to reset the iLO password or network config from the OS... even if the OS is broken. – Steve Bonds Jan 25 '18 at 06:27
  • @SteveBonds See my edit above. – ewwhite Jan 25 '18 at 13:18
  • Also, why do power-on hours matter? – ewwhite Jan 25 '18 at 13:19
  • Power-on hours are a great way to help explain to your vendor that the "new" drives they sent are 5+ years old. Thanks for the hint on the .scexe file. I've been beating my head against how to get those drives updated using the SPP ISO and so far haven't had any luck. Knowing that the firmware isn't even there explains a lot about why it won't apply. :-) – Steve Bonds Jan 26 '18 at 22:32
  • For Future Me or anyone else who might want to insert new firmware into an SPP disk, here's a great doc on how to do it under Linux. (archive.org link to help fight future HP site brain damage.) https://web.archive.org/web/20180126230022/https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0130074 – Steve Bonds Jan 26 '18 at 23:01
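The power-on-hours check discussed in the comments could be scripted roughly as below. This is a sketch under stated assumptions: smartmontools is installed, the drive is reached through the Smart Array controller with smartctl's `-d cciss,N` device type (the device path and drive index you pass are assumptions about your system), and the output contains either the ATA attribute 9 row (`Power_On_Hours`) or the SAS-style "Accumulated power on time" line. The function names are hypothetical.

```python
"""Sketch: read power-on hours from smartctl output for a drive behind a Smart Array controller."""
import re
import subprocess
from typing import Optional

HOURS_PER_YEAR = 24 * 365.25  # average hours in a year, including leap years

# SAS/SCSI drives report e.g. "Accumulated power on time, hours:minutes 43800:12"
SCSI_POWER_ON = re.compile(r"Accumulated power on time, hours:minutes\s+(\d+):\d+")


def parse_power_on_hours(smartctl_output: str) -> Optional[int]:
    """Extract power-on hours from `smartctl -a` text, or None if not found."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # ATA attribute table row: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
        # TYPE UPDATED WHEN_FAILED RAW_VALUE; attribute 9 is Power_On_Hours.
        if len(fields) >= 10 and fields[0] == "9" and fields[1].startswith("Power_On_Hours"):
            digits = re.match(r"\d+", fields[9])  # raw value may carry a suffix like "h+12m"
            if digits:
                return int(digits.group())
        scsi = SCSI_POWER_ON.search(line)
        if scsi:
            return int(scsi.group(1))
    return None


def hours_to_years(hours: int) -> float:
    """Convert power-on hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


def read_power_on_hours(device: str, drive_index: int) -> Optional[int]:
    """Query one physical drive through the controller (needs root and smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-a", "-d", f"cciss,{drive_index}", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl's exit status is a bitmask that can be non-zero with usable output
    )
    return parse_power_on_hours(result.stdout)
```

For example, a raw `Power_On_Hours` value of 45123 works out to a little over five years of continuous operation, which is the kind of number that makes the "new drive" conversation with a vendor short.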