I am testing G-Wan 4.3.14 on CentOS 6 with the 2.6.32 Linux kernel using an Opteron 6234 6 module / 12 core processor.

Running a simple weighttp test I get:

weighttp -k -n 1000000 -t 6 -c 1000 localhost:8080

finished in 7 sec, 250 millisec and 896 microsec, 137913 req/s, 1044186 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 7753000286 bytes total, 256000286 bytes http, 7497000000 bytes data

This seems abnormally low. Does anyone have any experience/advice for tuning G-Wan or other HTTP servers on Opteron?
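As a quick sanity check on the output above, the per-request sizes can be derived from the "traffic:" line. This is only a sketch of the arithmetic; the variable names are mine, and the figures are copied from the weighttp run shown in the question.

```python
# Figures copied from the weighttp output in the question.
requests = 1_000_000
bytes_total = 7_753_000_286   # "bytes total"
bytes_http = 256_000_286      # "bytes http" (headers)
bytes_data = 7_497_000_000    # "bytes data" (response bodies)
elapsed_s = 7.250896          # 7 sec, 250 millisec and 896 microsec

# Average payload and header size per response.
body_per_request = bytes_data / requests
header_per_request = bytes_http / requests

# Aggregate throughput over loopback, in Gbit/s (roughly 8.5 here).
gbit_per_s = bytes_total * 8 / elapsed_s / 1e9

print(f"body:    {body_per_request:.0f} bytes/request")
print(f"headers: {header_per_request:.0f} bytes/request")
print(f"traffic: {gbit_per_s:.2f} Gbit/s")
```

The body works out to 7497 bytes per response, so this test moves far more data per request than a benchmark run against a 100-byte file would.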

2 Answers

using an [AMD] Opteron 6234 6 module / 12 core processor

This score of 137,913 req/s for the 6-Core AMD Opteron @ 2.4GHz [1] falls short of our 850,000 req/s on an Intel 6-Core Xeon W3680 @ 3.33GHz [2] (with a 100-byte static file).

Besides the performance differences between the two architectures [*], the problem for G-WAN on AMD CPUs comes from the fact that we did not have access to any of those CPUs (all our machines are equipped with Intel CPUs).

Thanks to recent AMD user reports, we have found that the number of detected CPU Cores for AMD CPUs is twice the actual number. This is due to the fact that AMD has its own set of CPUID codes and return values - which differ from Intel's.

This AMD CPU Core mis-detection leads to obvious CPU cache conflicts, which is precisely the problem G-WAN is supposed to solve.

For now, by using ./gwan -w 6 you can force any given multicore setting, bypassing the G-WAN automatic detection when needed.

In your case, you should be using 6 physical CPU Cores rather than the 12 wrongly used by G-WAN. This is what you can do right now (and you will most probably get much higher results with your benchmarks by just doing that).
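Before picking a value for -w, it can help to see what the Linux kernel itself reports. The following is a minimal sketch, not G-WAN's detection code: `count_cpus` is a hypothetical helper that parses the text of /proc/cpuinfo and compares the number of logical processors with the number of distinct (physical id, core id) pairs. How the kernel labels modules versus cores on this Opteron may vary with the kernel version, so treat the result as a hint only.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text.

    Hypothetical helper for illustration; it assumes the x86 Linux layout
    where each processor is a blank-line-separated block of "key : value".
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a per-CPU block
        logical += 1
        # Two logical CPUs sharing the same pair sit on the same core.
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return logical, len(cores)
```

On a Linux box it could be called as `count_cpus(open("/proc/cpuinfo").read())`; if the two numbers differ, the smaller one is a reasonable first candidate for -w.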

We will issue an AMD workaround in the next release to make sure that no more manual tweaking is needed.

[*] References:

[1] http://www.cpubenchmark.net/cpu.php?cpu=AMD+Opteron+6234

[2] http://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+W3680+%40+3.33GHz

Gil
  • Running ./gwan -w 6 for the same test yielded this result:
    weighttp -c 1000 -k -n 10000000 -t 6 localhost:8080
    finished in 45 sec, 344 millisec and 501 microsec, 220533 req/s, 1669725 kbyte/s
    requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
    status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 77530000351 bytes total, 2560000351 bytes http, 74970000000 bytes data
    – Ersun Warncke May 09 '13 at 03:36
  • About 1.6 times faster, not bad. But what is the size of the static file fetched in the test? Keep in mind that with anything larger than a few bytes, more time is spent in the kernel than in the userland server application that you are trying to test. – Gil May 12 '13 at 08:17
It's just a guess, so I may be completely wrong, but Opteron is a NUMA architecture.

Sometimes programs are optimized for non-NUMA architectures (which are very common) and then perform poorly in NUMA environments.

To test this, you can run exactly the same version of G-WAN with the same data (or nearly the same) on a Phenom or i7 that is comparable to your Opteron.
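To see whether the machine actually exposes more than one NUMA node, here is a minimal sketch that lists the node directories Linux publishes under /sys/devices/system/node. The function name `numa_nodes` and its `base` parameter are my own, and the sysfs path is an assumption that may not exist if the kernel was built without NUMA support.

```python
import os
import re


def numa_nodes(base="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found under `base`.

    Returns an empty list if the directory is missing.
    """
    if not os.path.isdir(base):
        return []
    ids = []
    for name in os.listdir(base):
        # Node directories are named node0, node1, ...; skip other entries.
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)
```

If `numa_nodes()` returns more than one id, memory locality can matter, and pinning the server to a single node (for example with numactl) would be one way to check the guess above.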

Great... I'm trying to help and have -2 votes... amazing!

guipy
  • All of those are NUMA architectures. Though NUMA is only actually in use if you have more than one physical processor. – Michael Hampton May 08 '13 at 15:50
  • This is applicable. An Opteron 12/16 core is 2 cpus on 1 package. So it really is a 2P system and all the same rules apply. Cache/memory latency is very bad unless you are accessing the memory local to the CPU you are running on. – Ersun Warncke May 09 '13 at 03:40
  • Hmmm... i7 and Phenom can be (and generally are) used as SMP! Or am I wrong? – guipy May 09 '13 at 03:41