10

I have a pair of Broadcom NetXtreme 57711 10GbE cards. I put one in a Dell R710; it boots with the card fine, the OS (CentOS 7) recognizes it, and all seems well. However, when I put the other card in an R730xd (also running CentOS), something unexpected happens: the R730xd's fans kick into high speed as soon as the system starts to boot the OS, and run continuously at high speed no matter what is happening. The fans do not run at full speed when interacting with the Lifecycle Controller or the BIOS screens. They only start spinning at full speed when the computer starts to boot the OS and before the OS comes up, so it doesn't seem to be a function of the OS.

I've updated the R730xd's firmware to the latest versions available, I've tried setting the CPU performance profiles in the BIOS, and I've tried setting the thermal profile in the iDRAC, but nothing seems to change the behavior; the system always goes into full-on jet-engine mode. Googling reveals at least one other person encountering similar fan behavior related to adding a PCI card to an R730xd (though it's unclear whether it's the same card – it doesn't appear to be).

What am I doing wrong? More importantly, can this behavior be changed, so that the fans do not stay stuck at full speed?

mhucka
  • 1
    I'd call dell and open a support ticket. The 13th Gen has been ... terrible especially on the software side. Also I'd make sure your iDrac is at 2.15.10.10 and BIOS is at at least 1.3.6 – Zypher Aug 19 '15 at 21:18
  • Verify the chassis lid is properly secured. If the sensor thinks the chassis is open, the fans will run at full speed. – Bad Dos Aug 19 '15 at 21:47
  • @BadDos Good suggestion, but if I take the card out, the behavior doesn't happen. I put the card back in, and it happens. I've repeated this, so I'm pretty sure it's not the lid. (But I wish it were that easy...) – mhucka Aug 19 '15 at 22:18
  • @Zypher I just checked the versions: idrac is 2.15.10.10, bios is 1.3.6, but (and this may be a clue) the Broadcom/QLogic card does not show up in the firmware update list. Neither of the two BCM57711 cards that I have show up in the R730xd, but both of them show up in the R710. So it seems there is something about the R730xd not recognizing the BCM57711. I guess it's not supported? – mhucka Aug 20 '15 at 02:25
  • @Zypher You responded too quickly :-). I deleted the previous comment and updated with the correct machine's numbers. Sorry for causing confusion! – mhucka Aug 20 '15 at 02:26
  • Ha. I get insta notified in the stack exchange app. OK so to be clear those cards shipped with the R710 and you moved them to the 730xd? – Zypher Aug 20 '15 at 02:28
  • @Zypher Correct. These are refurbished machines and one of them came with some unexpected hardware. Turns out I have a definite need for fast networking between this R730xd and a R710, so this would have been a wonderful windfall ... if it worked. – mhucka Aug 20 '15 at 02:32

3 Answers

14

After tearing my hair out over a newly arrived R730xd (the model with 16 3.5" disk slots) whose fans spin at 15k RPM whenever an Intel X520-DA2 10G card is in any PCI slot, I found the following way to quiet the fans on CentOS 6.7. It is brute force: it does not take the 10G card's temperature probe into account, so it may result in the card burning out from overheating, though I believe that is unlikely. There is probably a way to monitor the X520's thermal metrics.

Description: The default automatic cooling response on PowerEdge 13G servers for third-party PCIe cards provisions airflow based on common industry card requirements. Our thermal algorithm targets delivery of maximum 55C inlet air to the PCIe card region based on that industry standard.

Because some cards may not need additional cooling above the baseline (such as ones that have their own fan), Dell has enabled an OEM IPMI-based command to disable this default fan response to the new PCIe card.

To remediate:

1. Install IPMI tools:

yum install OpenIPMI OpenIPMI-tools
chkconfig ipmi on  # << optional for the task
service ipmi start  # << optional for the task

2. Query Dell's Third-Party PCIe card based default system fan response:

ipmitool raw 0x30 0xce 0x01 0x16 0x05 0x00 0x00 0x00

# response like below means Disabled
16 05 00 00 00 05 00 01 00 00

# response like below means Enabled
16 05 00 00 00 05 00 00 00 00
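
To avoid eyeballing the hex, the reply can be decoded with a small helper. This is only a sketch: the function name `pcie_fan_response_state` is made up for illustration, and it assumes (based solely on the two sample replies above) that the eighth byte is the flag, with 01 meaning Disabled and 00 meaning Enabled.

```shell
#!/bin/sh
# Hypothetical helper: decode the reply of the query command from step 2.
# Assumption: byte 8 of the reply is the flag (01 = disabled, 00 = enabled),
# inferred only from the two sample replies shown in this answer.
pcie_fan_response_state() {
    # Split the reply on whitespace into positional parameters $1..$N.
    set -- $1
    if [ "$#" -lt 8 ]; then
        echo unknown
        return 0
    fi
    case "$8" in
        01) echo disabled ;;
        00) echo enabled ;;
        *)  echo unknown ;;
    esac
}

# Usage on a real host (needs root and ipmitool):
#   pcie_fan_response_state "$(ipmitool raw 0x30 0xce 0x01 0x16 0x05 0x00 0x00 0x00)"
```

Anything that does not match either sample reply is reported as `unknown` rather than guessed at.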

3. Jets off, i.e. set Third-Party PCIe Card Default Cooling Response Logic to Disabled:

ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00 

4. Jets on, i.e. set Third-Party PCIe Card Default Cooling Response Logic to Enabled:

ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00 
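
The two set commands differ in a single byte, so a wrapper can make the intent harder to get wrong. A minimal sketch, assuming nothing beyond the two raw commands above; the function name `pcie_fan_response_cmd` is hypothetical, and it only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the raw IPMI command for the requested state.
# The byte values are copied from steps 3 and 4; only one byte differs
# (0x01 = disable the default cooling response, 0x00 = enable it).
pcie_fan_response_cmd() {
    case "$1" in
        disabled) flag=0x01 ;;
        enabled)  flag=0x00 ;;
        *)
            echo "usage: pcie_fan_response_cmd enabled|disabled" >&2
            return 1
            ;;
    esac
    echo "ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 $flag 0x00 0x00"
}

# Usage: print first, then run as root once it looks right:
#   pcie_fan_response_cmd disabled
#   $(pcie_fan_response_cmd disabled)
```

Printing instead of executing is a deliberate choice here, since a raw OEM command sent with the wrong bytes is not something you want to discover after the fact.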

References: Windows utility (link); Spiceworks post about a 3rd party GPU card causing a Gen13 Dell to spin its fans under Windows (link)

Other findings: Dell's X520-2 firmware pack (here) doesn't recognize the Amazon-sourced, new-in-box $188 card, as opposed to the Dell-branded $586 one.

kuz8
  • 1
    Thank you so much, I almost got killed by my DELL T630 crazy noise, and these commands just saved my life! – Windoze Jul 19 '17 at 06:12
  • 1
    Wow. I have this exact configuration with this exact problem. I can't believe it was the X520 card causing the issue. Thank you so much. – zymhan Aug 25 '17 at 21:07
  • 1
    @kez8 This is great, though I can only get it to go from around 0–15%, and then after that it's at 100%. Is it different for you? – Louis Waweru Apr 30 '22 at 06:42
  • 1
    @LouisWaweru, thank you, yes, it's all 100% or basic airflow for me too. I've set it to "off" for my racks, so far in about 6 years only one X520 has died out of a couple of dozens. – kuz8 May 01 '22 at 14:50
4

So after the chat in the comments I have some probably bad news.

Dell hardware that ships with a server as a configured item (which, IIRC, those Broadcoms were) is almost never compatible between generations. Dell tends to put custom firmware on these things that hooks into all their management systems.

So the short of it is that the part is probably not compatible, won't be supported, and will cause weird issues like what you are seeing.

Note: this doesn't apply to parts sold through their accessories catalog, only to parts shipped as part of a Dell server build.

Zypher
  • I really hoped there would be a different resolution to this. It does not seem like BCM57711's are *that* old. I know it's up to Dell to decide at what point they consider a given product obsolete/incompatible, but this is a pretty short cycle IMHO, when compared with other things. I guess the only solution is to get a newer model of the BCM57711 compatible with the R730xd, but they are expensive (even used ones), and my research grant can't afford it. Bummer. In any case, thank you for your time in resolving the cause of this problem. – mhucka Aug 24 '15 at 21:16
  • @mhucka FWIW you don't need a dell branded 10g card just find one that you can afford and put it in there. Also I agree these incompatibilities are stupid. – Zypher Aug 24 '15 at 21:33
  • @Zephyr That would be great, but ... is there a way to know whether a given card is compatible? Or do you mean one of the ones listed as options for the R730xd (http://www.dell.com/us/business/p/poweredge-r730xd/pd)? – mhucka Aug 24 '15 at 21:45
  • 2
    Any PCI-E card should work. The difference with Dell cards is that they have firmware that interacts with their management software - which is what causes the inter-generational incompatibilities. We've used Intel brand 10G cards with great success on multiple generations. – Zypher Aug 24 '15 at 22:11
  • @Zypher So, you're basically saying that Dell cards are _inferior_ to 3rd party ones? ;-) – ivan_pozdeev Nov 02 '15 at 02:59
0

For Windows Server users there is a solution equivalent to the Linux one above. First download this tool for ps manipulation: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=9ngfj. Then use the same hex values to turn off the default cooling response for passive cards:

https://www.dell.com/support/kbdoc/en-us/000135682/how-to-disable-the-third-party-pcie-card-default-cooling-response-on-poweredge-13g-servers then get