3

I have a DL580 G7 with four E7-4870 CPUs and 128 GB of RAM installed (eight cartridges with 2x 8 GB each). The operating system is Ubuntu 18.04. There is a TITAN X in the PCIe x16 slot and the obligatory P410i installed, but no other peripherals. When I benchmark this system, I get about 50% of the performance it should deliver. For example, this is a reference benchmark of a DL580 G7 with a slightly weaker CPU (E7-4850) and an otherwise similar setup.

However, my system reaches only half of that performance in the same benchmark (I get about 980 single-core and 20,000 multi-core). This does not seem right.

The benchmark shows all 80 cores and 128 GB of RAM, so the hardware is recognized correctly.

I have already gone through HP's low-latency tuning checklist and changed the BIOS settings accordingly. The power settings in ILO3 are all set to max performance.

Ubuntu is set to the "performance" governor on all 80 cores.
I noticed that even when I put the system under heavy load (such as crunching numbers on all 80 cores at 100% CPU use for hours), the CPU temperatures barely change (they remain at 40 degrees) and the fans don't spin up at all (they stay at 40%). The total power consumption displayed in ILO3 goes up to 650 W, but I would expect it to be closer to 1 kW under stress. I am a bit puzzled by this.
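One way to double-check the governor claim is to read it straight from sysfs rather than trusting a tool's summary. The sketch below assumes the kernel exposes the usual `cpufreq/scaling_governor` files under `/sys/devices/system/cpu`; the function name `governor_summary` is only illustrative. If the firmware keeps power management to itself, these files may be missing entirely, which is a hint in its own right.

```python
from collections import Counter
from pathlib import Path


def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Count how many logical CPUs use each cpufreq governor.

    Returns an empty dict when no cpufreq directories are present,
    e.g. when the OS has no frequency-scaling driver loaded.
    """
    counts = Counter()
    # cpu[0-9]* matches cpu0, cpu1, ... but not the cpufreq/ or cpuidle/ dirs.
    for gov_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    summary = governor_summary()
    print(summary if summary else "no cpufreq directories found")
```

On a machine where every logical CPU really is on the "performance" governor, the result should contain a single key.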

I have already tried different BIOS versions. The original BIOS was dated 07/01/2013, which has reportedly caused performance issues for other users as well (such reports can be found on the internet). So I downgraded it to 12/03/2012, but the problem remains.

Also, when I compare this machine with my previous one (an i5-4460), I see single-core performance drop by a factor of four in my applications (on work that is not IO-intensive, like adding large numbers of vectors). That is consistent with the benchmark results, but I would have expected a drop of only a factor of two. I am only concerned about CPU performance. As far as I can see, the RAID is doing all right and IO is as expected (although it might also suffer from the diminished CPU performance).

When I run cat /proc/cpuinfo during stress periods, I see that the CPUs are running at 2.2 GHz.
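Eyeballing 80 entries of `/proc/cpuinfo` is error-prone, so a small parser can summarize the reported clocks. This is a minimal sketch that assumes the x86 Linux layout with `cpu MHz` lines; the helper names are hypothetical. Note that, as this thread later shows, a plausible reported frequency does not prove the machine is delivering full throughput.

```python
import re
from statistics import mean


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU as a list of floats."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def summarize_mhz(cpuinfo_text):
    """Return count, min, mean and max of the reported clocks, or None."""
    mhz = parse_cpu_mhz(cpuinfo_text)
    if not mhz:
        return None
    return {"cpus": len(mhz), "min": min(mhz), "mean": mean(mhz), "max": max(mhz)}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(summarize_mhz(fh.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Sampling this a few times during a load run gives a quick view of whether any core ever leaves its base clock.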

What I haven't done yet is test a different operating system. I will do that as soon as I get the opportunity to reboot the machine.

ewwhite
  • You may want to try a supported operating system. – Greg Askew Sep 02 '18 at 14:10
  • @GregAskew Since Ubuntu 16.04 is a supported operating system, it is not unreasonable to assume that 18.04 should work and just has not been certified yet. – doneal24 Sep 02 '18 at 16:18
  • @DougO'Neal: Ubuntu is not a supported operating system for the HP DL580 G7. RHEL7 and Windows Server 2012 R2 are tested and supported for G7 models. It's right on the HP web site: http://h17007.www1.hpe.com/us/en/enterprise/servers/supportmatrix/redhat_linux.aspx . – Greg Askew Sep 02 '18 at 18:18
  • @GregAskew And when you browse to your linked page, 'Canonical Ubuntu' is the first OS listed in the Support Matrix for Gen 9 & 10 servers. Gen 7 servers are not mentioned on that page under any version of Linux or Windows. – doneal24 Sep 02 '18 at 18:40
  • @GregAskew I didn't think to check 'archived products'. Yes, you're right that only a limited set of OSes is supported on the Gen7, which is different with the newer systems. Any reason to think that Ubuntu wouldn't work on these servers, since kernel versions can be similar? – doneal24 Sep 02 '18 at 19:00
  • @DougO'Neal: HP makes a good product, but after working with ProLiants for 20 years, I've learned that it's best to stick with their hardware/firmware/drivers. A non-HP driver in a Linux distro may seem to work, but there isn't an assurance it will work properly. Also, there is usually a cutoff for old/dead-end hardware with newer operating systems. This is mostly for practical reasons, newer hardware will have a longer life with a newer OS, and there may not be available older hardware or just too much old hardware to test. – Greg Askew Sep 02 '18 at 20:04
  • Interesting question. Some strange assumptions, but I think this performance can be improved. – ewwhite Sep 03 '18 at 01:46

4 Answers

4

Check the power management settings in the BIOS. Ensure that they are set to OS controlled. The default HPE BIOS power management settings result in good power usage but poor performance.

Usually this setting can be found in: Power Management > HP Power Regulator > OS Control Mode.

HPE BIOS screenshot

Michael Hampton
  • Thank you, I think your answer partly hit the nail on the head. I had tried the OS-controlled mode before, but that was with the reportedly faulty BIOS version 07/01/2013 (see [https://blog.netnerds.net/2014/02/solved-degraded-performance-on-hp-dl580-g7-on-bios-v-7012013/]), which is why the setting had no impact on system performance. The combination of downgrading the BIOS and setting it to OS-controlled mode (instead of max performance mode, which for some reason also locked the system to 50% performance) worked! – Sebastian_學生 Sep 03 '18 at 04:39
4

I miss seeing questions like this on Server Fault... but at the same time, it's not a common request.

The server is an old architecture. You're making some assumptions that could be leading you down the wrong path.

My recommendations:

You shouldn't rely only on the features recommended in the HP Low-Latency tuning guide. That guide was meant for specialized applications like algorithmic trading, where determinism and predictable resource utilization are the goal. Realtime performance characteristics and low latency don't necessarily mean faster.

I would look at the CPU's capabilities and work back from there...

Intel Xeon E7-4870
Launched 2011, went end-of-life sometime in 2015.

This is a Turbo Boost-capable CPU. The max turbo frequency is 2.80GHz. When you use a fixed setting like "HP Static High Performance Mode", it actually disables Turbo Boost. In this situation, you'd be better off using the OS Control Mode under your flavor of Linux. (I also recommend this for VMware systems!)
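To see what the kernel itself thinks about turbo, one can look at the sysfs switches that the common frequency drivers expose. This is a rough sketch under assumptions: it checks `cpufreq/boost` (used by acpi-cpufreq) and `intel_pstate/no_turbo` (used by intel_pstate), and `turbo_state` is just an illustrative name. It reports only what the OS permits; firmware in a static power mode can still prevent turbo regardless of these flags.

```python
from pathlib import Path


def turbo_state(cpu_root="/sys/devices/system/cpu"):
    """Report whether the kernel allows turbo: 'enabled', 'disabled' or 'unknown'."""
    root = Path(cpu_root)
    boost = root / "cpufreq" / "boost"               # acpi-cpufreq: 1 means boost allowed
    no_turbo = root / "intel_pstate" / "no_turbo"    # intel_pstate: 1 means turbo off
    if boost.is_file():
        return "enabled" if boost.read_text().strip() == "1" else "disabled"
    if no_turbo.is_file():
        return "disabled" if no_turbo.read_text().strip() == "1" else "enabled"
    # Neither switch exists: no scaling driver, or one that exposes no toggle.
    return "unknown"


if __name__ == "__main__":
    print("turbo:", turbo_state())
```

An "unknown" result on a server like this may simply mean the BIOS is not handing frequency control to the OS.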

Inside your OS, see if it's possible to install powertop and turbostat.

Monitor one or both during your computational runs.

I'm surprised you're looking at the CPU temperatures or power consumption. I've never used that as a gauge of what the server is doing. What you are missing by using Ubuntu (generally unsupported on HP ProLiant hardware) is the interaction between the OS and the ILO management processor. This is one of the value-adds of ProLiant equipment. With baremetal systems, I try to use a RHEL/CentOS-like OS because of the hardware monitoring and health integration.

Also, go back to the most current BIOS revision. There's rarely a reason to downgrade HP system firmware. Please also ensure your ILO3 firmware is current.

ewwhite
  • Thank you for your advice. The hint about RHEL/CentOS is a good one; I didn't pay much attention to choosing a well-supported Linux distribution. The power statistics gave me a clear indication that the system was throttled even though it shouldn't have been, which is what made me suspicious. Even though the problem is solved now, I think everything you said is valid, and these are good points for improving the way I work with this machine. – Sebastian_學生 Sep 03 '18 at 04:55
  • Hmmm...... okay – ewwhite Sep 03 '18 at 06:05
1

The Xeon E7-4870 CPU contains 10 physical cores (https://ark.intel.com/products/53579/Intel-Xeon-Processor-E7-4870-30M-Cache-2-40-GHz-6-40-GT-s-Intel-QPI-). This four-socket configuration therefore contains only 40 cores, not 80 as stated. Is it possible you are confusing cores and threads? Additionally, this CPU can only attain its highest clock rate of 2.8 GHz with 4 active cores. Across four sockets, that is a total of 16 cores at 2.8 GHz versus 40 cores at 2.4 GHz.
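The core-versus-thread distinction can be checked from `/proc/cpuinfo` by counting unique (`physical id`, `core id`) pairs rather than `processor` entries. The sketch below assumes the usual x86 Linux field names and blank-line-separated blocks; `topology` is a hypothetical helper. For four sockets with 10 cores each and Hyper-Threading on, the expected arithmetic is 4 x 10 x 2 = 80 logical CPUs on 40 physical cores.

```python
def topology(cpuinfo_text):
    """Count sockets, physical cores and logical CPUs in /proc/cpuinfo text."""
    sockets, cores, logical = set(), set(), 0
    # Each logical CPU is described by one block; blocks are blank-line separated.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        phys, core = fields.get("physical id"), fields.get("core id")
        if phys is not None and core is not None:
            sockets.add(phys)
            cores.add((phys, core))  # a core is unique only within its socket
    return {"sockets": len(sockets), "physical_cores": len(cores), "logical_cpus": logical}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(topology(fh.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

If `logical_cpus` is twice `physical_cores`, the benchmark's "80 cores" are really 80 hardware threads.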

The DL580 G7 has memory configurations that need to be factored in as well. The highest memory bandwidth achievable on this system (optimized hemisphere mode https://support.hpe.com/hpsc/doc/public/display?docId=c02283239#N100AB ) requires 64 quad-ranked DIMMs.

I know my answer is late to the game, but it might help future searchers, and I don't have the reputation for comments.

bsod
0

A combination of things went wrong and I finally found the solution last night! The combination of downgrading the ROM and setting the power regulator to OS-controlled worked.

  • Initially this (second-hand) server had the faulty BIOS 07/01/2013 installed (see [https://blog.netnerds.net/2014/02/solved-degraded-performance-on-hp-dl580-g7-on-bios-v-7012013/]), which made any change to the power regulator setting in the ROM ineffective.
  • Even though the system was only giving 50% of the performance, a 'cat /proc/cpuinfo' returned a speed of 2.2 GHz per core (slightly lower than the max of 2.4 GHz, but far from just 50%). This result was consistent with what other tools like turbostat reported. Very strange indeed, and it added to my confusion.
  • I tried disabling C-states at Linux startup, but that didn't help (not better, not worse).
  • Finally, I now get 2100 single-core and 36,000 multi-core in Geekbench. That's about right with hyperthreading disabled. I guess with some patient tweaking it should be possible to get past the 40,000 mark for the multi-core score, but for the time being I am satisfied.
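For readers comparing their own before/after numbers, the improvement can be expressed as a simple ratio. This is only a worked example using the scores quoted in this thread (980 and 20,000 before, 2100 and 36,000 after); the function name `speedup` is illustrative.

```python
def speedup(before, after):
    """Return how many times faster the 'after' score is than 'before'."""
    if before <= 0:
        raise ValueError("before score must be positive")
    return after / before


if __name__ == "__main__":
    # Scores as reported in this thread.
    print(f"single-core: {speedup(980, 2100):.2f}x")
    print(f"multi-core:  {speedup(20_000, 36_000):.2f}x")
```

Both ratios land near 2x, which fits the original observation that the machine was delivering roughly half of its expected performance.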

What really annoyed me was that all CPU tools reported a core speed of at least 2.2 GHz, yet my system was painfully slow. I think this must be the odd bug of the 07/01/2013 ROM. I agree that upgrading the ROM/ILO firmware to the latest version would be best; I just haven't had the time to give that a serious try yet.