cgroups
Control groups (or cgroups as they are commonly known) are a feature provided by the Linux kernel to manage, restrict, and audit groups of processes. Compared to other approaches like the nice(1) command or /etc/security/limits.conf, cgroups are more flexible as they can operate on (sub)sets of processes (possibly with different system users).
Control groups can be accessed with various tools:
- using directives in systemd unit files to specify limits for services and slices;
- by accessing the cgroup filesystem directly;
- via tools like cgcreate, cgexec and cgclassify (part of the libcgroup packages);
- using the "rules engine daemon" to automatically move certain users/groups/commands to groups (/etc/cgrules.conf and cgconfig.service) (part of the libcgroup packages); and
- through other software such as Linux Containers (LXC) virtualization.
For Arch Linux, systemd is the preferred and easiest method of invoking and configuring cgroups as it is a part of the default installation.
Installing
Make sure you have one of these packages installed for automated cgroup handling:
- systemd - for controlling resources of a systemd service.
- libcgroup - set of standalone tools (cgcreate, cgclassify, persistence via cgconfig.conf).
With systemd
Hierarchy
The current cgroup hierarchy can be seen with the systemctl status or systemd-cgls command.
$ systemctl status
● myarchlinux
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2019-12-04 22:16:28 UTC; 1 day 4h ago
   CGroup: /
           ├─user.slice
           │ └─user-1000.slice
           │   ├─user@1000.service
           │   │ ├─gnome-shell-wayland.service
           │   │ │ ├─ 1129 /usr/bin/gnome-shell
           │   │ ├─gnome-terminal-server.service
           │   │ │ ├─33519 /usr/lib/gnome-terminal-server
           │   │ │ ├─37298 fish
           │   │ │ └─39239 systemctl status
           │   │ ├─init.scope
           │   │ │ ├─1066 /usr/lib/systemd/systemd --user
           │   │ │ └─1067 (sd-pam)
           │   └─session-2.scope
           │     ├─1053 gdm-session-worker [pam/gdm-password]
           │     ├─1078 /usr/bin/gnome-keyring-daemon --daemonize --login
           │     ├─1082 /usr/lib/gdm-wayland-session /usr/bin/gnome-session
           │     ├─1086 /usr/lib/gnome-session-binary
           │     └─3514 /usr/bin/ssh-agent -D -a /run/user/1000/keyring/.ssh
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─systemd-udevd.service
             │ └─285 /usr/lib/systemd/systemd-udevd
             ├─systemd-journald.service
             │ └─272 /usr/lib/systemd/systemd-journald
             ├─NetworkManager.service
             │ └─656 /usr/bin/NetworkManager --no-daemon
             ├─gdm.service
             │ └─668 /usr/bin/gdm
             └─systemd-logind.service
               └─654 /usr/lib/systemd/systemd-logind
Find cgroup of a process
The cgroup name of a process can be found in /proc/PID/cgroup.
For example, the cgroup of the shell:
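The lookup can be sketched in shell as follows. This is an illustration only: the helper name cgroup_path is hypothetical, and it assumes the unified (v2) hierarchy, where the file holds a single entry of the form 0::/path.

```shell
# cgroup_path FILE: print the cgroup v2 path stored in a /proc/PID/cgroup-style file.
# Hypothetical helper; assumes the unified hierarchy, whose entry looks like "0::/path".
cgroup_path() {
    sed -n 's/^0:://p' "$1"
}

# Show the cgroup of the current shell when the file is available.
if [ -r "/proc/$$/cgroup" ]; then
    cgroup_path "/proc/$$/cgroup"
fi
```

On a legacy (v1) system the file instead lists one line per hierarchy, so the helper may print nothing there.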
cgroup resource usage
The systemd-cgtop command can be used to see the resource usage of each cgroup.
Custom cgroups
systemd unit files can be used to define a custom cgroup configuration. They must be placed in a systemd directory, such as /etc/systemd/system/. The resource control options that can be assigned are documented in systemd.resource-control(5).
This is an example slice unit that only allows 30% of one CPU to be used:
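A minimal sketch of such a slice unit follows. The name my.slice matches the slice used in the systemd-run examples on this page. The staging directory in UNIT_DIR is an assumption made so the snippet can be tried without root; the resulting file still has to be copied to /etc/systemd/system/ as root.

```shell
# Staging directory for the unit file (assumption); the real target is /etc/systemd/system/.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# CPUQuota=30% caps all processes in the slice at 30% of one CPU's time.
cat > "$UNIT_DIR/my.slice" <<'EOF'
[Slice]
CPUQuota=30%
EOF

echo "wrote $UNIT_DIR/my.slice"
```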
Remember to run systemctl daemon-reload to pick up any new or changed files.
Service unit file
Resources can be specified directly in the service definition or in a drop-in file:
[Service]
MemoryMax=1G
This example limits the service to 1 GiB of memory.
Grouping unit under a slice
A service can be assigned to a slice with the Slice= option in its [Service] section.
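As a sketch, the drop-in below places a service in my.slice. The service name example.service and the drop-in file name slice.conf are hypothetical, and UNIT_DIR is a staging directory assumed here so the snippet runs without root; the real location would be below /etc/systemd/system/.

```shell
# Staging directory (assumption); copy the result below /etc/systemd/system/ as root.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# Drop-ins for a unit live in a directory named UNIT.d next to the unit file.
mkdir -p "$UNIT_DIR/example.service.d"

# Slice= places the processes of the service in the named slice.
cat > "$UNIT_DIR/example.service.d/slice.conf" <<'EOF'
[Service]
Slice=my.slice
EOF
```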
As root
systemd-run can be used to run a command in a specific slice.
# systemd-run --slice=my.slice command
The --uid=username option can be used to spawn the command as a specific user.
# systemd-run --uid=username --slice=my.slice command
The --shell option can be used to spawn a command shell inside the slice.
As unprivileged user
Unprivileged users can divide the resources provided to them into new cgroups, if some conditions are met.
cgroup v2 must be in use for a non-root user to be allowed to manage cgroup resources.
Controller types
Not all resources can be controlled by users.
Controller | Can be controlled by user | Options |
---|---|---|
cpu | Requires delegation | CPUAccounting, CPUWeight, CPUQuota, AllowedCPUs, AllowedMemoryNodes |
io | Requires delegation | IOWeight, IOReadBandwidthMax, IOWriteBandwidthMax, IODeviceLatencyTargetSec |
memory | Yes | MemoryLow, MemoryHigh, MemoryMax, MemorySwapMax |
pids | Yes | TasksMax |
rdma | No | ? |
eBPF | No | IPAddressDeny, DeviceAllow, DevicePolicy |
User delegation
For a user to control cpu and io resources, these controllers need to be delegated. This can be done with a drop-in file.
For example if your user id is 1000:
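A sketch of such a drop-in is shown below. It uses the Delegate= option of user@1000.service to hand the cpu and io controllers to that user's systemd instance. The file name delegate.conf is hypothetical, and UNIT_DIR is a staging directory assumed here so the snippet runs without root; the real location would be /etc/systemd/system/user@1000.service.d/.

```shell
# Staging directory (assumption); the real parent directory is /etc/systemd/system/.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# Drop-in directory for the per-user service manager of user id 1000.
mkdir -p "$UNIT_DIR/user@1000.service.d"

# Delegate= lists the controllers handed over to the user's systemd instance.
cat > "$UNIT_DIR/user@1000.service.d/delegate.conf" <<'EOF'
[Service]
Delegate=cpu io
EOF
```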
Reboot and verify that the slice your user session is under has the cpu and io controllers:
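One way to check this is to read the cgroup.controllers file of the user slice. The sketch below assumes the unified hierarchy mounted at /sys/fs/cgroup and user id 1000; the helper name has_controllers is hypothetical.

```shell
# has_controllers FILE NAME...: succeed if every NAME is listed in a cgroup.controllers file.
has_controllers() {
    file=$1
    shift
    for name in "$@"; do
        # cgroup.controllers holds one space-separated line of controller names.
        tr ' ' '\n' < "$file" | grep -qxF "$name" || return 1
    done
}

# Path for user id 1000 on the unified hierarchy; adjust the id as needed.
ctl=/sys/fs/cgroup/user.slice/user-1000.slice/cgroup.controllers
if [ -r "$ctl" ]; then
    if has_controllers "$ctl" cpu io; then
        echo "cpu and io are delegated"
    else
        echo "cpu and/or io are not delegated"
    fi
fi
```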
User-defined slices
The user slice files can be placed in ~/.config/systemd/user/.
To run a command under a certain slice:
$ systemd-run --user --slice=my.slice command
You can also run your login shell inside the slice:
$ systemd-run --user --slice=my.slice --shell
Run-time adjustment
cgroup resources can be adjusted at run time using the systemctl set-property command. The option syntax is the same as in systemd.resource-control(5).
For example, cutting off internet access for all user sessions:
$ systemctl set-property user.slice IPAddressDeny=any
With libcgroup
You can enable cgconfig.service with systemd. This allows you to track any errors in cgconfig.conf more easily.
Ad-hoc groups
One of the powers of cgroups is that you can create "ad-hoc" groups on the fly. You can even grant the privileges to create custom groups to regular users. groupname is the cgroup name:
# cgcreate -a user -t user -g memory,cpu:groupname
Now all the tunables in the group groupname are writable by your user.
Cgroups are hierarchical, so you can create as many subgroups as you like. If a normal user wants to run a shell under a new subgroup called foo:
$ cgcreate -g memory,cpu:groupname/foo
$ cgexec -g memory,cpu:groupname/foo bash
To make sure the shell ended up in the new group, check /proc/self/cgroup from inside it (only meaningful for legacy (v1) cgroups).
A new subdirectory was created for this group. To limit the memory usage of all processes in this group to 10 MB, run the following:
$ echo 10000000 > /sys/fs/cgroup/memory/groupname/foo/memory.limit_in_bytes
Note that the memory limit applies to RAM use only: once tasks hit this limit, they will begin to swap. This will not significantly affect the performance of other processes.
Similarly you can change the CPU priority ("shares") of this group. By default all groups have 1024 shares. Shares are relative weights that only matter when groups compete for CPU time: a group with 100 shares competing with a single group at the default 1024 gets about 9% of the CPU time (100 / 1124):
$ echo 100 > /sys/fs/cgroup/cpu/groupname/foo/cpu.shares
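The effect of a share value can be estimated with a little arithmetic. The sketch below assumes that all listed groups are siblings and fully busy, so each one receives its shares divided by the sum of all shares; the helper name cpu_fraction is hypothetical.

```shell
# cpu_fraction MINE OTHER...: percentage of CPU time a group with MINE shares receives
# when it competes with busy sibling groups holding the OTHER share values.
cpu_fraction() {
    mine=$1
    total=0
    # "$@" still contains MINE, so the total covers the group itself and its siblings.
    for s in "$@"; do
        total=$((total + s))
    done
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

cpu_fraction 100 1024        # one sibling with the default shares: prints 8.9
cpu_fraction 100 1024 1024   # two siblings with the default shares: prints 4.7
```

When the other groups are idle, the group can still use all available CPU time, since shares only take effect under contention.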
You can find more tunables or statistics by listing the cgroup directory.
You can also change the cgroup of already running processes. To move all 'bash' commands to this group:
$ pidof bash
13244 13266
$ cgclassify -g memory,cpu:groupname/foo `pidof bash`
$ cat /proc/13244/cgroup
11:memory:/groupname/foo
6:cpu:/groupname/foo
Persistent group configuration
If you want your cgroups to be created at boot, you can define them in /etc/cgconfig.conf instead. For example, a group "groupname" can grant a specific user and the members of a specific group permission to manage limits and add tasks, with a subgroup "groupname/foo" defined alongside it.
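A minimal sketch of such a configuration, using the same limits as the ad-hoc example above, could look like this. The names user and group are placeholders for the account and group that should manage the cgroup, and CONF is a staging file assumed here so the snippet runs without root; the real file is /etc/cgconfig.conf.

```shell
# Staging copy (assumption); the real file is /etc/cgconfig.conf and needs root to edit.
CONF="${CONF:-$(mktemp)}"

# "user" and "group" are placeholders for the managing account and group.
cat > "$CONF" <<'EOF'
group groupname {
  perm {
# who can manage limits
    admin {
      uid = user;
      gid = group;
    }
# who can add tasks to this group
    task {
      uid = user;
      gid = group;
    }
  }
# create this group in the cpu and memory controllers
  cpu { }
  memory { }
}

group groupname/foo {
  cpu {
    cpu.shares = 100;
  }
  memory {
    memory.limit_in_bytes = 10000000;
  }
}
EOF
```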
- Comments should begin at the start of a line! The # character for comments must appear as the first character of a line. Otherwise, cgconfigparser will have problems parsing it but will only report "cgroup change of group failed" as the error, unless you started cgconfig with systemd.
- The permissions section is optional.
- The /sys/fs/cgroup/ hierarchy directory containing all controller sub-directories is already created and mounted at boot as a virtual file system. This gives the ability to create a new group entry with the $CONTROLLER-NAME { } command. If for any reason you want to create and mount hierarchies in another place, you will then need to write a second entry in /etc/cgconfig.conf in the following way:
mount {
  cpuset = /your/path/groupname;
}
This is equivalent to these shell commands:
# mkdir /your/path/groupname
# mount -t cgroup -o cpuset groupname /your/path/groupname
With the cgroup virtual filesystem
Starting with systemd 232, the cgm method no longer works, so this section instead describes a manual method to limit memory usage.
Create a new cgroup named groupname:
# mkdir /sys/fs/cgroup/memory/groupname
Example: set the maximum memory limit to 100MB:
# echo 100000000 > /sys/fs/cgroup/memory/groupname/memory.limit_in_bytes
Move a process to the cgroup (note: only one PID can be written at a time, repeat this for each process that must be moved):
# echo pid > /sys/fs/cgroup/memory/groupname/cgroup.procs
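Because each write accepts a single PID, moving several processes takes a loop. The helper below is a sketch with a hypothetical name, move_to_cgroup; it expects the cgroup directory followed by the PIDs to move.

```shell
# move_to_cgroup DIR PID...: add each PID to the cgroup at DIR, one write per PID,
# because cgroup.procs accepts only a single PID per write.
move_to_cgroup() {
    dir=$1
    shift
    for pid in "$@"; do
        echo "$pid" > "$dir/cgroup.procs" || return 1
    done
}

# Example (as root): move every running bash process into groupname.
# move_to_cgroup /sys/fs/cgroup/memory/groupname $(pidof bash)
```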
Examples
Matlab
Doing large calculations in Matlab can crash your system, because Matlab does not have any protection against taking all of your machine's memory or CPU. The following examples show a cgroup that constrains Matlab to the first 6 CPU cores and 5 GB of memory.
With systemd
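A user slice matching these limits might look like the sketch below. It is an assumption based on the options listed under Controller types: AllowedCPUs=0-5 selects the first six cores and MemoryMax=5G sets a hard memory cap. AllowedCPUs may additionally require the cpuset controller to be delegated to your user. UNIT_DIR is a staging directory assumed here so the snippet does not touch your configuration; user units belong in ~/.config/systemd/user/.

```shell
# Staging directory (assumption); user units belong in ~/.config/systemd/user/.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"

# Restrict the slice to CPUs 0-5 and cap its memory at 5G.
cat > "$UNIT_DIR/matlab.slice" <<'EOF'
[Slice]
AllowedCPUs=0-5
MemoryMax=5G
EOF
```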
Launch Matlab like this (be sure to use the right path):
$ systemd-run --user --slice=matlab.slice /opt/MATLAB/2012b/bin/matlab -desktop
With libcgroup
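A group definition matching these limits could be sketched as follows. The group name matlab matches the cgexec command below, username is a placeholder, and the legacy (v1) cpuset and memory tunables are assumed. CONF is a staging file assumed here so the snippet runs without root; the real file is /etc/cgconfig.conf.

```shell
# Staging copy (assumption); the real file is /etc/cgconfig.conf and needs root to edit.
CONF="${CONF:-$(mktemp)}"

# "username" is a placeholder; cpuset.cpus 0-5 selects the first six cores.
cat > "$CONF" <<'EOF'
group matlab {
    perm {
        admin {
            uid = username;
        }
        task {
            uid = username;
        }
    }

    cpuset {
        cpuset.mems = "0";
        cpuset.cpus = "0-5";
    }
    memory {
        memory.limit_in_bytes = 5000000000;
    }
}
EOF
```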
In the perm section of the group, set uid to the user Matlab is run as.
You can also restrict the CPU share with the cpu constraint.
Launch Matlab like this (be sure to use the right path):
$ cgexec -g memory,cpuset:matlab /opt/MATLAB/2012b/bin/matlab -desktop
Documentation
- For information on controllers and what certain switches and tunables mean, refer to kernel's documentation v1 or v2 (or install and see )
- A detailed and complete Resource Management Guide can be found in the Fedora Project documentation.
For commands and configuration files, see relevant man pages, e.g. or
Tips and tricks
Enable cgroup v1
Cgroup v2 is now enabled by default. If you want to switch to cgroup v1 instead, you need to set the following kernel parameter:
systemd.unified_cgroup_hierarchy=0