42

Hi Linux/UNIX Overlords,

Do any of you have a rule of thumb for how many context switches (per processor core) are normal on a Linux server?

My colleague here brought it up, and he's seeing 16K on an 8-core x86_64 machine.

Here are some stats from sarface over the last few days...

http://src.autonomy.net.au/imagebin/81895e338fae67d3d205c09db44a81e6-Picture_10.png

And to see the process creation stats, here's a logarithmic view of the same graph...

http://src.autonomy.net.au/imagebin/7481f7e52bead4effc90248fc23c72fe-Picture_11.png

And the 8 cores are bored to death...

http://src.autonomy.net.au/imagebin/0e94326652e977fd74edcd840f94200f-Picture_12.png

CS vs IOwait (x10000 scale)

http://src.autonomy.net.au/imagebin/a52a2a8a120394849c0da4045933e306-Picture_13.png

More useless information in case anyone asks..

  • The storage that the server works on is a 0.5TB SAN via FC
  • There's 8GB of RAM, mostly cache - no swapping.
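
In case it helps anyone reproduce the numbers: the figure sar/vmstat graph here is the kernel's cumulative `ctxt` counter from /proc/stat. A rough sketch of sampling it directly (Linux-specific; the 1-second interval is just an arbitrary choice for illustration):

    /* Sample the system-wide context-switch counter (the "ctxt" line in
     * /proc/stat, the same source sar and vmstat read) twice, one second
     * apart, and print switches per second. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long long read_ctxt(void)
    {
        FILE *f = fopen("/proc/stat", "r");
        char line[256];
        long long ctxt = -1;

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "ctxt ", 5) == 0) {
                sscanf(line + 5, "%lld", &ctxt);
                break;
            }
        }
        fclose(f);
        return ctxt;
    }

    int main(void)
    {
        long long before = read_ctxt();
        sleep(1);                          /* sampling interval */
        long long after = read_ctxt();

        if (before < 0 || after < 0)
            return 1;
        printf("context switches/sec: %lld\n", after - before);
        return 0;
    }

Divide by the number of cores to get the per-core figure I'm asking about.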
Xerxes

5 Answers

27

This depends very much on the type of application you run. If you've got applications which are very trigger-happy WRT syscalls you can expect to see high amounts of context switching. If most of your applications idle around and only wake up when there's stuff happening on a socket, you can expect to see low context switch rates.

System calls

System calls cause context switches by their very own nature. When a process makes a system call, it basically tells the kernel to take over from its current point in execution and memory, do something the process isn't privileged to do itself, and return to the same spot when it's done.

When we look at the definition of the write(2) syscall from Linux, this becomes very clear:

NAME
       write - write to a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION
       write() writes up to count bytes from the buffer pointed to by buf to the file
       referred to by the file descriptor fd. [..]

RETURN VALUE
       On success, the  number of bytes written is returned (zero indicates
       nothing was written). On error, -1 is returned, and errno is set
       appropriately.
       [..]

This basically tells the kernel to take over from the process, copy up to count bytes starting at the memory address buf to the file descriptor fd of the current process, and then return to the process and tell it how it went.
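
As a minimal illustration (nothing beyond what the man page above already describes), this is all it takes to cross into the kernel; run it under strace to watch the write() happen:

    /* Ask the kernel to copy a few bytes from our buffer to file
     * descriptor 1 (stdout) and report how many it actually wrote. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "hello from userspace\n";
        ssize_t written = write(1, buf, sizeof(buf) - 1);

        if (written == -1) {
            perror("write");
            return 1;
        }
        fprintf(stderr, "kernel reports %zd bytes written\n", written);
        return 0;
    }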

A nice example to show this is the dedicated game server for Valve Source-based games, hlds. http://nopaste.narf.at/f1b22dbc9 shows one second's worth of syscalls made by a single instance of a game server with no players on it. This process takes about 3% CPU time on a Xeon X3220 (2.4 GHz), just to give you a feeling for how expensive this is.

Multi-Tasking

Another source of context switching might be processes which don't do syscalls, but need to get moved off a given CPU to make room for other processes.

A nice way to visualize this is cpuburn. cpuburn doesn't do any syscalls itself; it just iterates over its own memory, so it shouldn't cause any context switching.

Take an idle machine, start vmstat and then run a burnMMX (or any different test from the cpuburn package) for every CPU core the system has. You should have full system utilization by then but hardly any increased context switching. Then try to start a few more processes. You'll see that the context switching rate increases as the processes begin to compete over CPU cores. The amount of switching depends on the processes/core ratio and the multitasking resolution of your kernel.
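
If you don't have the cpuburn package handy, a crude stand-in (this is not cpuburn itself, just an assumed equivalent for the experiment) is any loop that only touches its own memory and never enters the kernel:

    /* Spin over a private buffer forever without making a single syscall
     * after startup. Start one copy per core, watch the "cs" column in
     * `vmstat 1`, then start a few extra copies and watch it climb. */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t len = 64 * 1024;               /* small, cache-resident buffer */
        unsigned char *buf = malloc(len);
        volatile unsigned long sum = 0;       /* volatile: keep the loop honest */

        if (!buf)
            return 1;
        memset(buf, 1, len);
        for (;;) {                            /* the "burn" loop: no syscalls here */
            for (size_t i = 0; i < len; i++)
                sum += buf[i];
        }
    }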

Further reading

linfo.org has a nice writeup on what context switches and system calls are. Wikipedia has general information and a nice link collection on system calls.

Michael Renner
  • This has been useful - you've given me a great idea! =) – Xerxes May 31 '09 at 03:58
  • Your statement `System calls cause context switches by their very own nature` seems wrong. System calls cause a mode switch, as stated by http://www.linfo.org/context_switch.html – Nicolas Labrot Apr 13 '18 at 16:15
7

My moderately loaded webserver sits at around 100-150 switches a second most of the time, with peaks into the thousands.

High context switch rates are not themselves an issue, but they may point the way to a more significant problem.

edit: Context switches are a symptom, not a cause. What are you trying to run on the server? If you have a multiprocessor machine, you may want to try setting cpu affinity for your main server processes.
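
Pinning can be done from the shell with taskset or from inside the process itself; here's a minimal sketch of the latter (Linux-specific, and core 0 is just an arbitrary pick for illustration):

    /* Restrict the calling process to a single CPU core so it stops
     * bouncing between cores; same effect as `taskset -c 0 <command>`. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                     /* core 0: arbitrary choice */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
            perror("sched_setaffinity");
            return 1;
        }
        /* ... the server's real work would run here, confined to core 0 ... */
        return 0;
    }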

Alternatively if you are running X, try dropping down into console mode.

edit again: at 16k cs per second, each CPU is averaging two switches per millisecond, meaning a task gets roughly half a millisecond before being switched out - half to a sixth of a normal timeslice. Could he be running a lot of IO-bound threads?

edit again, post graphs: Certainly looks IO-bound. Is the system spending most of its time in SYS when the context switches are high?

edit once more: High iowait and system in that last graph - completely eclipsing the userspace. You have IO problems.
What FC card are you using?

edit: hmmm. any chance of getting some benchmarks going on your SAN access with bonnie++ or dbench during deadtime? I would be interested in seeing if they have similar results.

edit: Been thinking about this over the weekend and I've seen similar usage patterns when bonnie is doing the "write a byte at a time" pass. That may explain the large amount of switching going on, as each write would require a separate syscall.
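
To illustrate what I mean (a toy example I made up, not bonnie's actual code), compare these two under `strace -c` - the first pays for one syscall per byte:

    /* Byte-at-a-time writes versus one buffered write. The first loop
     * issues 4096 write(2) calls, the second a single call for the same
     * amount of data; /tmp/cs-demo.out is just a scratch file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NBYTES 4096

    int main(void)
    {
        char buf[NBYTES] = {0};
        int fd = open("/tmp/cs-demo.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd == -1) {
            perror("open");
            return 1;
        }
        for (int i = 0; i < NBYTES; i++)      /* 4096 syscalls */
            write(fd, &buf[i], 1);
        write(fd, buf, NBYTES);               /* 1 syscall, same amount of data */
        close(fd);
        return 0;
    }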

jay_dubya
  • I'm still not convinced that a high context-switch rate is not a problem - I'm talking about high as in 4K to 16K, not 100-150. – Xerxes May 29 '09 at 05:11
  • None of our servers run any X. I agree with you on the IO wait problem, and the relationship between that and the CS. The HBA card is not a suspect though, because we use the same card on the other hundred or so servers... My conclusion is that I blame the SAN team's crappy EVA SAN that they desperately try to defend all the time. Note that a high IO-wait is not *always* a reason to be alarmed; if most processes on a machine are IO-bound, it's expected that the server will have nothing better to do than sit idle. – Xerxes May 29 '09 at 07:31
  • On second thought - the 4th graph attached shows that it's not really as close as I thought at first. Not exactly an eclipse by any means. I still blame the SAN though. =) – Xerxes May 29 '09 at 07:38
1

I'm more inclined to be concerned about the share of CPU time spent in system state. If it's close to 10% or higher, it means your OS is spending too much time doing context switches. Although moving some processes to another machine makes them much slower, it can still be worth doing.
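
A rough way to check that (a sketch reading the aggregate "cpu" line in /proc/stat, which is roughly where top's "sy" figure comes from):

    /* Sample the aggregate "cpu" line in /proc/stat twice, one second
     * apart, and print the share of time spent in system (kernel) state.
     * Field order per proc(5): user nice system idle iowait irq softirq. */
    #include <stdio.h>
    #include <unistd.h>

    static int read_cpu(long long *total, long long *sys)
    {
        FILE *f = fopen("/proc/stat", "r");
        long long v[7];

        if (!f)
            return -1;
        if (fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]) != 7) {
            fclose(f);
            return -1;
        }
        fclose(f);
        *total = v[0] + v[1] + v[2] + v[3] + v[4] + v[5] + v[6];
        *sys   = v[2];                        /* the "system" field */
        return 0;
    }

    int main(void)
    {
        long long t0, s0, t1, s1;

        if (read_cpu(&t0, &s0) != 0)
            return 1;
        sleep(1);
        if (read_cpu(&t1, &s1) != 0)
            return 1;
        printf("system time: %.1f%%\n",
               100.0 * (double)(s1 - s0) / (double)(t1 - t0));
        return 0;
    }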

1

Things like this are why you should try and keep performance baselines for your servers. That way, you can compare things you notice all of a sudden with things you have recorded in the past.

That said, I have servers running (not very busy Oracle servers, mainly), which are steady around 2k with some 4k peaks. For my servers, that is normal, for other people's servers that might be way too low or too high.

How far can you go back in your data?

What kind of CPU information can you give us?

wzzrd
  • I definitely agree with keeping a baseline, and we have nagios data going back for long periods - the problem with this server is that it's new blood - only been around for a short while. In addition, it's running enterprise (read: crap) software - Teamsite - just to add to the undefined-variable list. I still prefer sar (personal preference) so I'll configure it to keep more than the default (2-week), and see how it goes. – Xerxes May 29 '09 at 08:42
  • Using sar in combination with rrdtool (which it looks like your graphs come from) can be an easy means of keeping your data (or at least abstracts of it) for a long time. – wzzrd May 29 '09 at 09:45
-1

There's no rule of thumb. A context switch is just the CPU moving from processing one thread to another. If you run lots of processes (or a few highly threaded ones) you'll see more switches. Luckily, you don't need to worry about how many context switches there are -- the cost is small and more or less unavoidable.

Alex J
  • Actually the cost of a context switch is **expensive**. This is even worse on virtual machines - we did some testing a few months ago that showed that one of the biggest causes of poor VM performance was context switching. – Xerxes May 29 '09 at 02:13
  • In fact, in any modern (multi-tasking) operating system, the minimization of context-switching is a very significant optimization task. Do you have any sources to back up your claim that the cost is small? – Xerxes May 29 '09 at 02:23
  • Sorry, are you talking about minimising context switches from the perspective of OS development? Having nothing to do with such development, I have no opinion on the benefits of designing a system to minimise CS :) If you are talking about minimising context switches on a server, the issue is that mitigating context switches introduces latency in other places. E.g. reducing the number of processes on a machine means you have to move those processes to another machine, which means communication occurs over a network, which is *much* slower! – Alex J May 29 '09 at 03:02
  • I believe your definition of context switches is flawed; they also happen when a system call is performed, even if it returns to the same thread. Applications optimize against this by doing various tricks. For example Apache needs to get system time very often; for that purpose a thread calls localtime repeatedly and stores the result in shared memory. The other threads only have to read from RAM and do not incur a process switch when doing so. – niXar May 29 '09 at 16:26