5

tl;dr: the first CPU core is consistently saturated, while all the other cores are consistently under-loaded.

This is a VM running inside Ubuntu-based Xen XCP:

$ uname -a
Linux MYHOST 2.6.38-15-virtual #59-Ubuntu SMP Fri Apr 27 16:40:18 UTC 2012 i686 i686 i386 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.04
Release:    11.04
Codename:   natty

This VM has 8 CPU cores.

There are 10 single-threaded worker processes running on this VM, which are connected to the nginx server via the FCGI interface (over a local network port).

Under synthetic load from ab, only the first of the eight cores is ever loaded to 100% (as seen in htop). It stays under very high load more or less constantly, while the other cores fluctuate anywhere between 0% and 100%, more or less at random (their load keeps jumping around).

Here is what I typically see under load in htop:

  1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.3%]     Tasks: 70, 35 thr; 11 running
  2  [|||||||||||||||                                                                       15.0%]     Load average: 3.86 1.05 0.39 
  3  [|||||||||||||||||||||||||||||||||||                                                   36.7%]     Uptime: 22 days, 06:31:57
  4  [||||||||||||||||                                                                      15.7%]
  5  [|||||||||||||||||||||                                                                 22.4%]
  6  [|||||||||||||||||||                                                                   19.9%]
  7  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                    71.2%]
  8  [||||||||||||||||||||||||||||||                                                        31.3%]
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||394/4028MB]
  Swp[                                                                                   0/5362MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
26213 www-data  20   0 49748 26952  2448 R 29.0  0.7 10:42.61 /usr/bin/luajit2
26227 www-data  20   0 50172 27412  2452 R 27.0  0.7 10:43.53 /usr/bin/luajit2
26221 www-data  20   0 50736 27948  2452 R 27.0  0.7 10:39.02 /usr/bin/luajit2
26234 www-data  20   0 50128 27232  2452 R 27.0  0.7 10:36.36 /usr/bin/luajit2
26218 www-data  20   0 50232 27376  2452 R 26.0  0.7 10:39.32 /usr/bin/luajit2
26214 www-data  20   0 51268 28496  2452 R 26.0  0.7 10:58.15 /usr/bin/luajit2
26232 www-data  20   0 50420 27588  2452 R 25.0  0.7 10:39.21 /usr/bin/luajit2
26217 www-data  20   0 50236 27348  2452 R 25.0  0.7 10:34.44 /usr/bin/luajit2
26219 www-data  20   0 50748 27960  2448 R 23.0  0.7 10:45.30 /usr/bin/luajit2
26239 www-data  20   0 49772 27188  2452 R 22.0  0.7 10:39.39 /usr/bin/luajit2
26368 www-data  20   0 10856  3796   968 S 15.0  0.1  1:12.62 nginx: worker process
26369 www-data  20   0 10652  3504   968 S  2.0  0.1  1:12.75 nginx: worker process
26372 www-data  20   0 10520  3504   968 S  0.0  0.1  1:18.64 nginx: worker process
...

During the load test all worker processes are in the R state. The test runs for about 10-15 minutes and sustains about 700-900 hits/second. Traffic is, of course, generated from external machines.

It looks like this CPU core load imbalance is the main performance bottleneck; if all cores were loaded evenly, throughput could be higher.

Any clues on how to troubleshoot this issue?
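If it helps, I can also collect per-core statistics during the test, for example with mpstat (a sketch; assumes the sysstat package is installed), which should show whether the first core is busy with user code or with interrupt/softirq handling:

$ mpstat -P ALL 1   # per-CPU breakdown every second; watch the %irq and %soft columns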

Please tell me if I can provide more information.

Alexander Gladysh
  • I don't believe that scheduling is the issue here. Even the optimal scheduling algorithm probably wouldn't give you much better results. What kind of requests did you send to the server (static HTML, PHP?), because 700-900 requests/second doesn't look good for such a server. Can you paste the commands that you used for testing? – FINESEC Nov 06 '12 at 20:24
  • Well, all cores except the first one are consistently under-loaded, and first core is consistently over-loaded. Looks fishy to me. – Alexander Gladysh Nov 06 '12 at 20:28
  • Worker processes are Lua-based, and do some non-trivial business logic (including communication over network with other machines), so 800 rps is more or less OK (but there is some room for optimization). That being said, please note that this question is specifically about uneven CPU core utilization. – Alexander Gladysh Nov 06 '12 at 20:33
  • Testing is done by running `ab -n 100000 -q -c 20 http://my.example.com`, from six different external machines. – Alexander Gladysh Nov 06 '12 at 20:37
  • Do you have the same problem when serving static HTML pages? I don't believe that you could tell why the scheduler works this way without knowing what's inside your business logic - my guess would be synchronization algorithms. – FINESEC Nov 06 '12 at 20:44
  • Also, see what happens when testing from one box only. – FINESEC Nov 06 '12 at 20:45
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/6355/discussion-between-finesec-and-alexander-gladysh) – FINESEC Nov 06 '12 at 20:48
  • It is 1am here, sorry. Maybe tomorrow. Meanwhile I've updated the question with `htop` screenshot. – Alexander Gladysh Nov 06 '12 at 21:00
  • +1 from me for posting a nice concise question here. We don't get many of those anymore. – Brent Pabst Nov 06 '12 at 21:02
  • @FINESEC: I'll check static pages and will update the ticket. As for the business logic: I believe that there should be tools to figure out what is going on without delving into the code. It is rather complex anyway. (That being said, it is single-threaded and no explicit synchronization stuff is used.) – Alexander Gladysh Nov 06 '12 at 21:26
  • I wonder if interrupts are going to that first core. – David Schwartz Nov 06 '12 at 23:09
  • @DavidSchwartz: How to check that? – Alexander Gladysh Nov 07 '12 at 04:52

2 Answers

5

It looks like CPU0 receives every eth1 interrupt, and there are a lot of them.

$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
283:  113720624          0          0          0          0          0          0          0   xen-dyn-event     eth1
284:          1          0          0          0          0          0          0          0   xen-dyn-event     eth0
285:       2254          0          0    3873799          0          0          0          0   xen-dyn-event     blkif
286:         23          0          0          0          0          0          0          0   xen-dyn-event     hvc_console
287:        492         42          0          0          0          0          0     295324   xen-dyn-event     xenbus
288:          0          0          0          0          0          0          0     222294  xen-percpu-ipi       callfuncsingle7
289:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug7
290:          0          0          0          0          0          0          0     151302  xen-percpu-ipi       callfunc7
291:          0          0          0          0          0          0          0    3236015  xen-percpu-ipi       resched7
292:          0          0          0          0          0          0          0      60064  xen-percpu-ipi       spinlock7
293:          0          0          0          0          0          0          0   12355510  xen-percpu-virq      timer7
294:          0          0          0          0          0          0     803174          0  xen-percpu-ipi       callfuncsingle6
295:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug6
296:          0          0          0          0          0          0      60027          0  xen-percpu-ipi       callfunc6
297:          0          0          0          0          0          0    5374762          0  xen-percpu-ipi       resched6
298:          0          0          0          0          0          0      64976          0  xen-percpu-ipi       spinlock6
299:          0          0          0          0          0          0   15294870          0  xen-percpu-virq      timer6
300:          0          0          0          0          0     264441          0          0  xen-percpu-ipi       callfuncsingle5
301:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug5
302:          0          0          0          0          0      79324          0          0  xen-percpu-ipi       callfunc5
303:          0          0          0          0          0    3468144          0          0  xen-percpu-ipi       resched5
304:          0          0          0          0          0      66269          0          0  xen-percpu-ipi       spinlock5
305:          0          0          0          0          0   12778464          0          0  xen-percpu-virq      timer5
306:          0          0          0          0     844591          0          0          0  xen-percpu-ipi       callfuncsingle4
307:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug4
308:          0          0          0          0      75293          0          0          0  xen-percpu-ipi       callfunc4
309:          0          0          0          0    3482146          0          0          0  xen-percpu-ipi       resched4
310:          0          0          0          0      79312          0          0          0  xen-percpu-ipi       spinlock4
311:          0          0          0          0   21642424          0          0          0  xen-percpu-virq      timer4
312:          0          0          0     449141          0          0          0          0  xen-percpu-ipi       callfuncsingle3
313:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug3
314:          0          0          0      95405          0          0          0          0  xen-percpu-ipi       callfunc3
315:          0          0          0    3802992          0          0          0          0  xen-percpu-ipi       resched3
316:          0          0          0      76607          0          0          0          0  xen-percpu-ipi       spinlock3
317:          0          0          0   16439729          0          0          0          0  xen-percpu-virq      timer3
318:          0          0     876383          0          0          0          0          0  xen-percpu-ipi       callfuncsingle2
319:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug2
320:          0          0      76416          0          0          0          0          0  xen-percpu-ipi       callfunc2
321:          0          0    3422476          0          0          0          0          0  xen-percpu-ipi       resched2
322:          0          0      69217          0          0          0          0          0  xen-percpu-ipi       spinlock2
323:          0          0   10247182          0          0          0          0          0  xen-percpu-virq      timer2
324:          0     393514          0          0          0          0          0          0  xen-percpu-ipi       callfuncsingle1
325:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug1
326:          0      95773          0          0          0          0          0          0  xen-percpu-ipi       callfunc1
327:          0    3551629          0          0          0          0          0          0  xen-percpu-ipi       resched1
328:          0      77823          0          0          0          0          0          0  xen-percpu-ipi       spinlock1
329:          0   13784021          0          0          0          0          0          0  xen-percpu-virq      timer1
330:     730435          0          0          0          0          0          0          0  xen-percpu-ipi       callfuncsingle0
331:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug0
332:      39649          0          0          0          0          0          0          0  xen-percpu-ipi       callfunc0
333:    3607120          0          0          0          0          0          0          0  xen-percpu-ipi       resched0
334:     348740          0          0          0          0          0          0          0  xen-percpu-ipi       spinlock0
335:   89912004          0          0          0          0          0          0          0  xen-percpu-virq      timer0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:          0          0          0          0          0          0          0          0   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RES:    3607120    3551629    3422476    3802992    3482146    3468144    5374762    3236015   Rescheduling interrupts
CAL:     770084     489287     952799     544546     919884     343765     863201     373596   Function call interrupts
TLB:          0          0          0          0          0          0          0          0   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          0          0          0          0          0          0          0          0   Machine check polls
ERR:          0
MIS:          0
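
A sketch of what could be tried next (the IRQ number 283 is taken from the table above; whether the Xen event-channel IRQ honours smp_affinity in this guest, and whether this kernel has RPS enabled, are assumptions to verify):

$ echo 2 | sudo tee /proc/irq/283/smp_affinity                   # steer the eth1 IRQ to CPU1 (hex mask 02)
$ echo fe | sudo tee /sys/class/net/eth1/queues/rx-0/rps_cpus    # spread packet processing (softirqs) over cores 1-7

Installing irqbalance so that interrupts get distributed automatically is another option.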
Alexander Gladysh
-3

Umm, why have you not mentioned `taskset`? `taskset -p <pid>` will retrieve the current CPU affinity of a process.

Add `-c` to `taskset` to specify a CPU list: in this case, anything but 0.
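
For example, a minimal sketch using a PID from the htop listing above (whether pinning the worker processes, rather than the interrupt itself, actually helps here is an assumption to verify):

$ taskset -cp 26213          # show the current affinity of one worker
$ taskset -cp 1-7 26213      # pin it to every core except CPU0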

ArrowInTree