42

Hi Linux/UNIX Overlords,

Do any of you have a rule of thumb for how many context switches (per processor core) are normal on a Linux server?

My colleague here brought it up, and he's seeing 16K on an 8-core x86_64 machine.

Here are some stats from sarface over the last few days...

http://src.autonomy.net.au/imagebin/81895e338fae67d3d205c09db44a81e6-Picture_10.png

And to see the process creation stats, here's a logarithmic view of the same graph...

http://src.autonomy.net.au/imagebin/7481f7e52bead4effc90248fc23c72fe-Picture_11.png

And the 8 cores are bored to death...

http://src.autonomy.net.au/imagebin/0e94326652e977fd74edcd840f94200f-Picture_12.png

CS vs IOwait (x10000 scale)

http://src.autonomy.net.au/imagebin/a52a2a8a120394849c0da4045933e306-Picture_13.png

More useless information in case anyone asks..

  • The storage that the server works on is a 0.5TB SAN via FC
  • There's 8GB of RAM, mostly cache - no swapping.
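
In case it helps anyone reproduce the numbers: the figure sar/vmstat graph here is the kernel's cumulative `ctxt` counter from /proc/stat. A rough sketch of sampling it directly (Linux-specific; the 1-second interval is just an arbitrary choice for illustration):

    /* Sample the system-wide context-switch counter (the "ctxt" line in
     * /proc/stat, the same source sar and vmstat read) twice, one second
     * apart, and print switches per second. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long long read_ctxt(void)
    {
        FILE *f = fopen("/proc/stat", "r");
        char line[256];
        long long ctxt = -1;

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "ctxt ", 5) == 0) {
                sscanf(line + 5, "%lld", &ctxt);
                break;
            }
        }
        fclose(f);
        return ctxt;
    }

    int main(void)
    {
        long long before = read_ctxt();
        sleep(1);                          /* sampling interval */
        long long after = read_ctxt();

        if (before < 0 || after < 0)
            return 1;
        printf("context switches/sec: %lld\n", after - before);
        return 0;
    }

Divide by the number of cores to get the per-core figure I'm asking about.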
Xerxes

5 Answers

27

This depends very much on the type of application you run. If you've got applications which are very trigger-happy WRT syscalls you can expect to see high amounts of context switching. If most of your applications idle around and only wake up when there's stuff happening on a socket, you can expect to see low context switch rates.

System calls

System calls cause context switches by their very own nature. When a process makes a system call, it basically tells the kernel to take over from its current point in execution and memory, do something the process isn't privileged to do itself, and return to the same spot when it's done.

When we look at the definition of the write(2) syscall from Linux, this becomes very clear:

NAME
       write - write to a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION
       write() writes up to count bytes from the buffer pointed to by buf to the file
       referred to by the file descriptor fd. [..]

RETURN VALUE
       On success, the  number of bytes written is returned (zero indicates
       nothing was written). On error, -1 is returned, and errno is set
       appropriately.
       [..]

This basically tells the kernel to take over from the process, copy up to count bytes starting at the memory address buf to the file descriptor fd of the current process, and then return to the process and tell it how it went.
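
As a minimal illustration (nothing beyond what the man page above already describes), this is all it takes to cross into the kernel; run it under strace to watch the write() happen:

    /* Ask the kernel to copy a few bytes from our buffer to file
     * descriptor 1 (stdout) and report how many it actually wrote. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "hello from userspace\n";
        ssize_t written = write(1, buf, sizeof(buf) - 1);

        if (written == -1) {
            perror("write");
            return 1;
        }
        fprintf(stderr, "kernel reports %zd bytes written\n", written);
        return 0;
    }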

A nice example to show this is the dedicated game server for Valve Source-based games, hlds. http://nopaste.narf.at/f1b22dbc9 shows one second's worth of syscalls made by a single instance of a game server with no players on it. This process takes about 3% CPU time on a Xeon X3220 (2.4 GHz), just to give you a feeling for how expensive this is.

Multi-Tasking

Another source of context switching might be processes which don't do syscalls, but need to get moved off a given CPU to make room for other processes.

A nice way to visualize this is cpuburn. cpuburn doesn't do any syscalls itself; it just iterates over its own memory, so it shouldn't cause any context switching.

Take an idle machine, start vmstat and then run a burnMMX (or any different test from the cpuburn package) for every CPU core the system has. You should have full system utilization by then but hardly any increased context switching. Then try to start a few more processes. You'll see that the context switching rate increases as the processes begin to compete over CPU cores. The amount of switching depends on the processes/core ratio and the multitasking resolution of your kernel.
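
If you don't have the cpuburn package handy, a crude stand-in (this is not cpuburn itself, just an assumed equivalent for the experiment) is any loop that only touches its own memory and never enters the kernel:

    /* Spin over a private buffer forever without making a single syscall
     * after startup. Start one copy per core, watch the "cs" column in
     * `vmstat 1`, then start a few extra copies and watch it climb. */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t len = 64 * 1024;               /* small, cache-resident buffer */
        unsigned char *buf = malloc(len);
        volatile unsigned long sum = 0;       /* volatile: keep the loop honest */

        if (!buf)
            return 1;
        memset(buf, 1, len);
        for (;;) {                            /* the "burn" loop: no syscalls here */
            for (size_t i = 0; i < len; i++)
                sum += buf[i];
        }
    }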

Further reading

linfo.org has a nice writeup on what context switches and system calls are. Wikipedia has general information and a nice link collection on system calls.

Michael Renner
  • This has been useful - you've given me a great idea! =) – Xerxes May 31 '09 at 03:58
  • Your statement `System calls cause context switches by their very own nature` seems wrong. System calls cause a mode switch, as stated by http://www.linfo.org/context_switch.html – Nicolas Labrot Apr 13 '18 at 16:15
7

My moderately loaded webserver sits at around 100-150 switches a second most of the time, with peaks into the thousands.

High context switch rates are not themselves an issue, but they may point the way to a more significant problem.

edit: Context switches are a symptom, not a cause. What are you trying to run on the server? If you have a multiprocessor machine, you may want to try setting cpu affinity for your main server processes.
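
Pinning can be done from the shell with taskset or from inside the process itself; here's a minimal sketch of the latter (Linux-specific, and core 0 is just an arbitrary pick for illustration):

    /* Restrict the calling process to a single CPU core so it stops
     * bouncing between cores; same effect as `taskset -c 0 <command>`. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                     /* core 0: arbitrary choice */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
            perror("sched_setaffinity");
            return 1;
        }
        /* ... the server's real work would run here, confined to core 0 ... */
        return 0;
    }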

Alternatively if you are running X, try dropping down into console mode.

edit again: at 16k cs per second, each CPU is averaging two switches per millisecond, meaning a task gets roughly half a millisecond before being switched out - half to a sixth of a normal timeslice. Could he be running a lot of IO-bound threads?

edit again, post graphs: Certainly looks IO-bound. Is the system spending most of its time in SYS when the context switches are high?

edit once more: High iowait and system in that last graph - completely eclipsing the userspace. You have IO problems.
What FC card are you using?

edit: hmmm. any chance of getting some benchmarks going on your SAN access with bonnie++ or dbench during deadtime? I would be interested in seeing if they have similar results.

edit: Been thinking about this over the weekend and I've seen similar usage patterns when bonnie is doing the "write a byte at a time" pass. That may explain the large amount of switching going on, as each write would require a separate syscall.
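
To illustrate what I mean (a toy example I made up, not bonnie's actual code), compare these two under `strace -c` - the first pays for one syscall per byte:

    /* Byte-at-a-time writes versus one buffered write. The first loop
     * issues 4096 write(2) calls, the second a single call for the same
     * amount of data; /tmp/cs-demo.out is just a scratch file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NBYTES 4096

    int main(void)
    {
        char buf[NBYTES] = {0};
        int fd = open("/tmp/cs-demo.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd == -1) {
            perror("open");
            return 1;
        }
        for (int i = 0; i < NBYTES; i++)      /* 4096 syscalls */
            write(fd, &buf[i], 1);
        write(fd, buf, NBYTES);               /* 1 syscall, same amount of data */
        close(fd);
        return 0;
    }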

jay_dubya
  • I'm still not convinced that a high context-switch rate is not a problem - I'm talking about high as in 4K to 16K, not 100-150. – Xerxes May 29 '09 at 05:11
  • None of our servers run any X. I agree with you on the IO wait problem, and the relationship between that and the CS. The HBA card is not a suspect though, because we use the same card on the other hundred or so servers... My conclusion is that I blame the SAN team's crappy EVA SAN that they desperately try to defend all the time. Note that a high IO-wait is not *always* a reason to be alarmed; if most processes on a machine are IO-bound, it's expected that the server will have nothing better to do than sit idle. – Xerxes May 29 '09 at 07:31
  • On second thought - the 4th graph attached shows that it's not really as close as I thought at first. Not exactly an eclipse by any means. I still blame the SAN though. =) – Xerxes May 29 '09 at 07:38
1

I'm more inclined to be concerned about the share of CPU time spent in system state. If it's close to 10% or higher, it means your OS is spending too much time doing context switches. Although moving some processes to another machine makes them much slower, it can still be worth doing.
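
A rough way to check that (a sketch reading the aggregate "cpu" line in /proc/stat, which is roughly where top's "sy" figure comes from):

    /* Sample the aggregate "cpu" line in /proc/stat twice, one second
     * apart, and print the share of time spent in system (kernel) state.
     * Field order per proc(5): user nice system idle iowait irq softirq. */
    #include <stdio.h>
    #include <unistd.h>

    static int read_cpu(long long *total, long long *sys)
    {
        FILE *f = fopen("/proc/stat", "r");
        long long v[7];

        if (!f)
            return -1;
        if (fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]) != 7) {
            fclose(f);
            return -1;
        }
        fclose(f);
        *total = v[0] + v[1] + v[2] + v[3] + v[4] + v[5] + v[6];
        *sys   = v[2];                        /* the "system" field */
        return 0;
    }

    int main(void)
    {
        long long t0, s0, t1, s1;

        if (read_cpu(&t0, &s0) != 0)
            return 1;
        sleep(1);
        if (read_cpu(&t1, &s1) != 0)
            return 1;
        printf("system time: %.1f%%\n",
               100.0 * (double)(s1 - s0) / (double)(t1 - t0));
        return 0;
    }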

1

Things like this are why you should try and keep performance baselines for your servers. That way, you can compare things you notice all of a sudden with things you have recorded in the past.

That said, I have servers running (not very busy Oracle servers, mainly), which are steady around 2k with some 4k peaks. For my servers, that is normal, for other people's servers that might be way too low or too high.

How far can you go back in your data?

What kind of CPU information can you give us?

wzzrd
  • I definitely agree with keeping a baseline, and we have nagios data going back for long periods - the problem with this server is that it's new blood - only been around for a short while. In addition, it's running enterprise (read: crap) software - Teamsite - just to add to the undefined-variable list. I still prefer sar (personal preference) so I'll configure it to keep more than the default (2-week), and see how it goes. – Xerxes May 29 '09 at 08:42
  • Using sar in combination with rrdtool (which it looks like your graphs come from) can be an easy means of keeping your data (or at least abstracts of it) for a long time. – wzzrd May 29 '09 at 09:45
-1

There's no rule of thumb. A context switch is just the CPU moving from processing one thread to another. If you run lots of processes (or a few highly threaded ones) you'll see more switches. Luckily, you don't need to worry about how many context switches there are -- the cost is small and more or less unavoidable.

Alex J
  • Actually the cost of a context switch is **expensive**. This is even worse on virtual machines - we did some testing a few months ago that showed that one of the biggest causes of poor VM performance was context switching. – Xerxes May 29 '09 at 02:13
  • In fact, in any modern (multi-tasking) operating system, the minimization of context-switching is a very significant optimization task. Do you have any sources to back up your claim that the cost is small? – Xerxes May 29 '09 at 02:23
  • Sorry, are you talking about minimising context switches from the perspective of OS development? Having nothing to do with such development, I have no opinion on the benefits of designing a system to minimise CS :) If you are talking about minimising context switches on a server, the issue is that mitigating context switches introduces latency in other places. E.g. reducing the number of processes on a machine means you have to move those processes to another machine, which means communication occurs over a network, which is *much* slower! – Alex J May 29 '09 at 03:02
  • I believe your definition of context switches is flawed; they also happen when a system call is performed, even if it returns to the same thread. Applications optimize against this by doing various tricks. For example Apache needs to get system time very often; for that purpose a thread calls localtime repeatedly and stores the result in shared memory. The other threads only have to read from RAM and do not incur a process switch when doing so. – niXar May 29 '09 at 16:26