6

I have a server reporting these errors: 'iLO Self-Test reports a problem with: Embedded Flash/SD-CARD' and 'Embedded media manager failed initialization'.

HPE advises the following: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097

I've now reached the last step in that advisory: replace the system board. But I'm wondering what the impact would be if I did not replace the board. I do not use the SD card. I'm not sure what the NAND is actually used for or what it stores. What do I risk if I do not resolve this issue?

I suspect the risk is losing iLO logs, and perhaps even iLO settings?

Edit: it is now months later. The servers are still out of warranty, and I have found more similar NAND issues that all fall under this HPE advisory umbrella. Using the RESTful Interface Tool removes the annoying need for hands-on support for the AC power removal step, but sadly I still can't fix all the issues. Although I've seen some information in this thread and elsewhere online, I can't find any conclusive statement from HPE on the impact of NAND issues like this.

True
  • Luca, did you manage to solve the problem by replacing the NAND chip? I have a similar problem on my ML350e Gen8 v2 server with this error message: "Controller firmware revision 2.10.00 Embedded media initialization failed due to media write-verify test failure" – cz.steve Nov 03 '21 at 14:40

5 Answers

4

There isn't too much you can do about this.

Please run the normal firmware updates on the host either via Intelligent Provisioning or the HP SPP bootable DVD.

But since Gen8 systems are end-of-life, there isn't much incentive for HPE to deal with this issue. I've seen this in about 5% of Gen8 servers still in the field.

This should not impact the usability of the server, though.

ewwhite
  • Sadly, I see much higher numbers than 5% on Gen8 servers. My sample size isn't large, though: roughly 30 - 40 servers, of which more than 25% have NAND-related errors that can't be solved by the HPE advisory: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00048622en_us . What sample size do you have? – True Aug 08 '19 at 14:42
4

HP actually describes a procedure to correct problems with the embedded NAND flash. We had the same problem, and the procedure in the advisory corrected the error. After the NAND flash format, our server needed to be powered off and disconnected from mains power. After the next boot, the iLO health was OK.

Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) - How to Format the NAND Used to Store AHS logs, OneView Profiles, and Intelligent Provisioning
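
To confirm whether the format actually cleared the fault, it can help to read the iLO self-test results programmatically rather than through the GUI. The following is a minimal Python sketch, not an HPE-documented procedure. It assumes the manager resource (for example, the JSON returned by a GET on `/redfish/v1/Managers/1/`) exposes an `iLOSelfTestResults` list under `Oem.Hp` (iLO 4) or `Oem.Hpe` (iLO 5), with `SelfTestName`, `Status`, and `Notes` fields. The function name `failed_self_tests` and the sample payload are hypothetical; verify the field names against your own firmware.

```python
def failed_self_tests(manager: dict) -> list:
    """Return self-test entries whose Status is neither OK nor Informational."""
    oem = manager.get("Oem", {})
    results = []
    # Assumption: iLO 4 nests results under "Hp", iLO 5 under "Hpe".
    for vendor in ("Hp", "Hpe"):
        results.extend(oem.get(vendor, {}).get("iLOSelfTestResults", []))
    return [r for r in results if r.get("Status") not in ("OK", "Informational")]


# Hypothetical payload shaped like an iLO 4 manager resource (illustration only).
SAMPLE_MANAGER = {
    "Oem": {
        "Hp": {
            "iLOSelfTestResults": [
                {"SelfTestName": "NVRAMData", "Status": "OK", "Notes": ""},
                {
                    "SelfTestName": "EmbeddedFlash/SDCard",
                    "Status": "Degraded",
                    "Notes": "Embedded media manager failed initialization",
                },
            ]
        }
    }
}

for test in failed_self_tests(SAMPLE_MANAGER):
    print(f"{test['SelfTestName']}: {test['Status']} ({test['Notes']})")
```

An empty result after the format and the AC power removal would suggest the self-test now passes. If the Embedded Flash entry is still listed, the remaining steps of the advisory apply.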

net-worker
  • This document has been the bane of my existence for the last couple of weeks. Unfortunately, even with the [RESTful Interface Tool](https://hewlettpackard.github.io/python-redfish-utility/), sending the AuxPwrCycle doesn't fix our NAND problems. We see different NAND errors, which all eventually fit under this advisory umbrella. – True Aug 08 '19 at 14:47
3

The NAND is used for:

Dan
1

From the HP doc "NAND Usage in Gen8 Through Gen10 Plus HPE Servers":

A NAND with too many worn out cells is considered worn out. A worn out NAND in an HPE server can lead to the following:

  • Inability to manage the server with HPE OneView.
  • Inability to download the AHS log.
  • Sluggish iLO GUI interaction.
  • POST errors during system boot.
  • Inability to use Intelligent Provisioning to configure the server or deploy an operating system.
  • Timeout of remote commands to iLO (impacting automation developed by customers).

A worn out NAND will not normally impact a server in production, except for servers managed by HPE OneView. If the NAND is worn and the server is already under HPE OneView management there will not be an immediate impact. There may be errors and the profile cannot be modified. In addition, the server cannot be brought under HPE OneView management and a server profile cannot be applied.

Source: "NAND Usage in Gen8 Through Gen10 Plus HPE Servers" https://psnow.ext.hpe.com/doc/a00060052en_us

Morten
0

We have a couple of DL360 G8 servers with this NAND issue. The major problem is that they boot from SD (firewall appliances), and when this issue appears the whole SD becomes read-only. This creates problems because state and configuration are not saved, and nothing can be updated. Reformatting the NAND flash and resetting the E-Fuse does not resolve the problem. I was told the NOR NAND flash is the N25Q064 (64Mbit), an 8-pin SO chip labeled U163 on the motherboard, near the riser and slot 2 of the PCI-E x16 (nope, that's the system BIOS; check the update below!).

Update: apparently I was misled, and the internal SD NAND flash chip is not that 8-pin SOIC chip (that one is the BIOS). The 32Gbit NAND flash is a BGA surface-mount chip (NW234 on the BL460 G8) that requires more effort and tools to replace (reballing). It is located on the underside of the server motherboard, directly under the SD card slot (https://imgur.com/kxFqUJR), and is labeled U192. The chip can potentially be sourced from the BL460C G8 counterpart, 684370-001: remove it from the small daughterboard and resolder it onto the motherboard. You must have the proper BGA grid/solder balls and a hot-air gun, and you must shield nearby SMD components from the heat, so this is not for the faint of heart. The part is the H26M31003GMR, a 32Gbit 3.3V NAND flash chip. The NW234 on the BL460 daughterboard corresponds to the MT29F32G08AECBBH1 and can be found on eBay for about 30 USD. A single H26M31003GMR NAND flash chip costs around 3 USD on AliExpress, so it may still be viable to source it new (or just locate a 684370-001 and desolder the chip from there). The DL360 motherboard may use a slightly different NAND flash chip than the BL460 G8, but I doubt it. In any case, the original part on the DL360 is the H26M31003GMR, as my photo shows. Additional pictures of the flash removal: https://imgur.com/a/Wj5r8Dk