6

I have some highly floating-point-intensive processes doing very little I/O. One is called "xspec"; it calculates a numerical model and returns a floating-point result to a master process every second (via stdout). It is niced to level 19. I have another simple process, "cpufloattest", which just does numerical computations in a tight loop. It is not niced.

I have a 4-core i7 system with hyperthreading disabled. I have started 4 of each type of process. Why is the Linux scheduler (Linux 3.4.2) not properly limiting the CPU time taken up by the niced processes?

Cpu(s): 56.2%us,  1.0%sy, 41.8%ni,  0.0%id,  0.0%wa,  0.9%hi,  0.1%si,  0.0%st
Mem:  12297620k total, 12147472k used,   150148k free,   831564k buffers
Swap:  2104508k total,    71172k used,  2033336k free,  4753956k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                         
32399 jss       20   0 44728  32m  772 R 62.7  0.3   4:17.93 cpufloattest                                    
32400 jss       20   0 44728  32m  744 R 53.1  0.3   4:14.17 cpufloattest                                    
32402 jss       20   0 44728  32m  744 R 51.1  0.3   4:14.09 cpufloattest                                    
32398 jss       20   0 44728  32m  744 R 48.8  0.3   4:15.44 cpufloattest                                    
 3989 jss       39  19 1725m 690m 7744 R 44.1  5.8   1459:59 xspec                                           
 3981 jss       39  19 1725m 689m 7744 R 42.1  5.7   1459:34 xspec                                           
 3985 jss       39  19 1725m 689m 7744 R 42.1  5.7   1460:51 xspec                                           
 3993 jss       39  19 1725m 691m 7744 R 38.8  5.8   1458:24 xspec                                           

The scheduler does what I expect if I start 8 of the cpufloattest processes, with 4 of them niced (i.e. the 4 un-niced ones get most of the CPU and the 4 niced ones get very little).

xioxox
  • xspec *is* using less CPU than cpufloattest. How much less CPU, exactly, were you expecting to be used? – womble Jul 05 '12 at 16:26
  • I'd like a niced 19 process to have minimal impact on a niced zero. If I run 8 cpufloattest, that is the case. The snapshot above is just a snapshot. It's pretty even overall between the nice 0 and nice 19. – xioxox Jul 05 '12 at 16:36

5 Answers

10

I've discovered what's causing this problem. It's due to the "autogroup" feature of the CFS scheduler. If I do

echo 0 > /proc/sys/kernel/sched_autogroup_enabled 

then everything behaves as you'd expect: the nice 19 processes drop to near-zero CPU usage when nice 0 processes are running.
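You can check the current state of the feature by reading the same file (it reports 1 when autogrouping is enabled, 0 when it is disabled):

cat /proc/sys/kernel/sched_autogroup_enabled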

I'll try to find out exactly what the autogrouping is doing to break my use case and update this answer.

Edit: I chatted with some kernel people on IRC, who just said I should disable it if it doesn't work for my workload, and that it was just a crazy patch that Linus liked. I'm not sure why autogrouping doesn't suit my workload, but this answer is here for people who run into similar problems.

xioxox
7

To add some further detail to the accepted answer... The behavior you are seeing is because of the autogroup feature that was added in Linux 2.6.38 (released in 2011). Presumably, in the scenario described, the two commands were run in different terminal windows. If they had been run in the same terminal window, then you should have seen the nice value have an effect. The rest of this answer elaborates on the story.

The kernel provides a feature known as autogrouping to improve interactive desktop performance in the face of multiprocess, CPU-intensive workloads such as building the Linux kernel with large numbers of parallel build processes (i.e., the make(1) -j flag).

A new autogroup is created when a new session is created via setsid(2); this happens, for example, when a new terminal window is started. A new process created by fork(2) inherits its parent's autogroup membership. Thus, all of the processes in a session are members of the same autogroup.
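Session membership (and hence autogroup membership) is easy to inspect with ps(1); processes that share a session ID belong to the same autogroup when autogrouping is enabled. For example, using the process name from the question:

$ ps -o pid,sess,comm -C xspec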

When autogrouping is enabled (which is the default on many distros), all of the members of an autogroup are placed in the same kernel scheduler "task group". The Linux kernel scheduler employs an algorithm that equalizes the distribution of CPU cycles across task groups. The benefits of this for interactive desktop performance can be described via the following example.

Suppose that there are two autogroups competing for the same CPU (i.e., presume either a single CPU system or the use of taskset(1) to confine all the processes to the same CPU on an SMP system). The first group contains ten CPU-bound processes from a kernel build started with make -j10. The other contains a single CPU-bound process: a video player. The effect of autogrouping is that the two groups will each receive half of the CPU cycles. That is, the video player will receive 50% of the CPU cycles, rather than just 9% of the cycles, which would likely lead to degraded video playback. The situation on an SMP system is more complex, but the general effect is the same: the scheduler distributes CPU cycles across task groups such that an autogroup that contains a large number of CPU-bound processes does not end up hogging CPU cycles at the expense of the other jobs on the system.
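A rough way to see this effect for yourself (a sketch, not a benchmark: yes > /dev/null is just a convenient CPU-bound stand-in, and taskset(1) pins everything to CPU 0 so the loops actually compete):

# terminal window 1: ten CPU-bound loops standing in for "make -j10"
for i in $(seq 10); do taskset -c 0 yes > /dev/null & done

# terminal window 2: a single CPU-bound loop standing in for the video player
taskset -c 0 yes > /dev/null &

With autogrouping enabled, top(1) should show the single loop receiving roughly 50% of CPU 0 rather than about 9%.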

The nice value and group scheduling

When scheduling non-real-time processes (e.g., those scheduled under the default SCHED_OTHER policy), the scheduler employs a technique known as "group scheduling", under which threads are scheduled in "task groups". Task groups are formed in various circumstances, with the relevant case here being autogrouping.

If autogrouping is enabled, then all of the threads that are (implicitly) placed in an autogroup (i.e., the same session, as created by setsid(2)) form a task group. Each new autogroup is thus a separate task group.

Under group scheduling, a thread's nice value has an effect for scheduling decisions only relative to other threads in the same task group. This has some surprising consequences in terms of the traditional semantics of the nice value on UNIX systems. In particular, if autogrouping is enabled, then employing nice(1) on a process has an effect only for scheduling relative to other processes executed in the same session (typically: the same terminal window).

Conversely, for two processes that are (for example) the sole CPU-bound processes in different sessions (e.g., different terminal windows, each of whose jobs are tied to different autogroups), modifying the nice value of the process in one of the sessions has no effect in terms of the scheduler's decisions relative to the process in the other session. This presumably is the scenario you saw, though you don't explicitly mention using two terminal windows.
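The difference is easy to demonstrate (again a sketch under the same assumptions: yes > /dev/null as the CPU-bound load, everything pinned to one CPU so the loops compete):

# same session (same autogroup): the nice value behaves traditionally
taskset -c 0 nice -n 19 yes > /dev/null &
taskset -c 0 yes > /dev/null &
# the un-niced loop gets nearly all of CPU 0

# separate sessions (separate autogroups): kill the loops above, then
setsid taskset -c 0 nice -n 19 yes > /dev/null &
setsid taskset -c 0 yes > /dev/null &
# the two loops now share CPU 0 roughly equally, despite the nice value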

If you want to prevent autogrouping from interfering with the traditional nice behavior as described here, then, as noted in the accepted answer, you can disable the feature:

echo 0 > /proc/sys/kernel/sched_autogroup_enabled

Be aware though that this will also have the effect of disabling the benefits for desktop interactivity that the autogroup feature was intended to provide (see above).
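The same setting can also be changed with sysctl(8) and made persistent across reboots by placing it in a sysctl configuration file (the file name below is just an example; the exact location depends on your distribution):

sysctl -w kernel.sched_autogroup_enabled=0
echo 'kernel.sched_autogroup_enabled = 0' >> /etc/sysctl.d/99-autogroup.conf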

The autogroup nice value

A process's autogroup membership can be viewed via the file /proc/[pid]/autogroup:

$ cat /proc/1/autogroup
/autogroup-1 nice 0

This file can also be used to modify the CPU bandwidth allocated to an autogroup. This is done by writing a number in the "nice" range to the file to set the autogroup's nice value. The allowed range is from +19 (low priority) to -20 (high priority).

The autogroup nice setting has the same meaning as the process nice value, but applies to distribution of CPU cycles to the autogroup as a whole, based on the relative nice values of other autogroups. For a process inside an autogroup, the CPU cycles that it receives will be a product of the autogroup's nice value (compared to other autogroups) and the process's nice value (compared to other processes in the same autogroup).
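For example (the PID and autogroup number below are made up for illustration; writing to the file requires suitable permissions):

$ cat /proc/2836/autogroup
/autogroup-112 nice 0
$ echo 10 > /proc/2836/autogroup
$ cat /proc/2836/autogroup
/autogroup-112 nice 10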

mtk
1

It might not be exactly what you are looking for, but have you tried the cpulimit command? It's available at least in the Debian/Ubuntu repositories.

With cpulimit you can tune what percentage of overall CPU time a process is allowed to take. Another possibility for you might be cgroups, but cpulimit is more straightforward and simpler to use.
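For example, to cap one of the xspec processes to roughly 25% of a CPU (the PID is taken from the top output in the question; the percentage is just illustrative):

cpulimit -p 3989 -l 25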

Janne Pikkarainen
  • That's an interesting tool - thanks - but it just limits the total cpu percentage used by a process. What I really want is idle scheduling - allowing a process to use all the CPU available providing nothing else is running. I've also tried SCHED_IDLE - doesn't work. – xioxox Jul 06 '12 at 07:31
0

Just reduce xspec to a single process, so you will have a 4:1 or 3:1 ratio; this would run just fine.

Andrew Smith
0

I think you mean to set the priority higher? In that case you would want to use a negative nice value.

In your output, cpufloattest has a higher priority than xspec.
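For example, something like this would raise cpufloattest's priority further (negative nice values generally require root; the PID is taken from the top output in the question):

sudo renice -n -5 -p 32399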

EDIT: You can use taskset to pin a process to a specific processor, though that's not necessary in this case.
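For example (illustrative only, since pinning isn't needed here):

taskset -c 0,1 xspec       # start xspec confined to CPUs 0 and 1
taskset -cp 0,1 3989       # or change the CPU affinity of an already running process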

siesta