I have a small server program written in C/C++ that uses nginx and Postgres; all three currently run on the same Ubuntu system. I usually run the server program from the bash command line.

Recently, on newer versions of Ubuntu, once the server program has been running for about half a minute I can no longer execute any other commands, and if the screen locks I cannot log back in. Terminating the server program restores normal behaviour.

In bash, pressing ENTER after any command gives:

bash: fork: retry: Resource temporarily unavailable

And the following is written to /var/log/syslog:

Sep  5 09:46:08 ubuntu kernel: [  145.614883] cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/user@1000.service
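
The pids controller counts tasks, that is, threads as well as processes, so a single heavily threaded server can use up the whole budget. A minimal sketch for checking how many threads one process holds, assuming a Linux /proc layout; the helper name thread_count and its optional second argument (an alternative proc root) are made up for illustration:

```shell
# Hypothetical helper: print the number of threads of a process.
# $1 = PID, $2 = proc root (optional, defaults to /proc).
thread_count() {
    pid=$1
    proc=${2:-/proc}
    # The "Threads:" field of the status file holds the thread count.
    awk '/^Threads:/ { print $2 }' "$proc/$pid/status"
}
```

For example, `thread_count "$(pidof myserver)"` would report the thread count of a server binary called myserver (a placeholder name). A number close to the task limit of the cgroup would point at the server itself as the consumer.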

The behaviour differs depending on whether the server program is started right after a system reboot or after the system has been up for some time. Right after a reboot, the server program runs without issue. If the server program is then restarted, it runs at 60% utilisation for about half a minute, jumps to 80% for about 5 seconds, and then drops to 20% utilisation thereafter. It almost seems as if something is throttling the application directly. Cgroups?

PAM configuration: /etc/pam.d/common-session has the lines:

session required    pam_unix.so 
session optional    pam_systemd.so 

I don't believe it makes a difference, but in /etc/security/limits.d/91-nofile.conf I have set the following limits:

*                soft    nofile          350000
*                hard    nofile          350000
*                soft    nproc           100000
*                hard    nproc           100000
*                soft    sigpending      100000
*                hard    sigpending      100000

CGroups / Systemd Configuration:

myk@ubuntu:/etc/systemd/system$ systemctl status user.slice 
● user.slice - User and Session Slice
     Loaded: loaded (/lib/systemd/system/user.slice; static; vendor preset: ena>
     Active: active since Sat 2020-09-05 10:47:19 +08; 38min ago
       Docs: man:systemd.special(7)
      Tasks: 1396
     Memory: 1.0G
     CGroup: /user.slice
             └─user-1000.slice
               ├─session-2.scope

myk@ubuntu:~$ systemctl status user-1000.slice 
user-1000.slice - User Slice of UID 1000
     Loaded: loaded
    Drop-In: /usr/lib/systemd/system/user-.slice.d
             └─10-defaults.conf
     Active: active since Sat 2020-09-05 09:44:50 +08; 21min ago
       Docs: man:user@.service(5)
      Tasks: 340 (limit: 15479)
     Memory: 1.6G

cat /proc/sys/kernel/threads-max
46907

cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
15479
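
The syslog line above names /user.slice/user-1000.slice/user@1000.service, which is one level below the slice inspected here. Pids limits are hierarchical, so a fork is rejected when any ancestor cgroup is at its limit. A rough sketch that prints usage against the limit at every level, assuming the cgroup v1 mount point shown above (on a cgroup v2 system the files sit directly under /sys/fs/cgroup); the function name pids_usage is invented for this example:

```shell
# Hypothetical helper: print "<dir> <pids.current>/<pids.max>" for a cgroup
# directory and each of its parents, stopping at the given root directory.
# $1 = root of the pids hierarchy, $2 = cgroup directory to start from.
pids_usage() {
    root=$1
    dir=$2
    while :; do
        if [ -r "$dir/pids.max" ] && [ -r "$dir/pids.current" ]; then
            printf '%s %s/%s\n' "$dir" \
                "$(cat "$dir/pids.current")" "$(cat "$dir/pids.max")"
        fi
        [ "$dir" = "$root" ] && break
        dir=$(dirname "$dir")
        # Stop if we somehow left the hierarchy.
        case $dir in
            "$root"*) ;;
            *) break ;;
        esac
    done
}
```

Usage might look like `pids_usage /sys/fs/cgroup/pids /sys/fs/cgroup/pids/user.slice/user-1000.slice/user@1000.service`. A level where the two numbers are equal is the one rejecting the fork.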

In /etc/systemd/system.conf I tried adding:

DefaultMemoryAccounting=no
DefaultTasksAccounting=no

without success, although afterwards 'systemctl status user-1000.slice' no longer displayed a task limit.

In /etc/systemd/logind.conf I tried adding:

UserTasksMax=infinity

Without success

In /etc/systemd/system.conf I changed:

#DefaultTasksMax=

To:

DefaultTasksMax=infinity

Without success

Ubuntu is running under VMware, hosted by macOS on an MBP. Pmstat shows thermals on the MBP to be OK. Ubuntu 20.04; VMware 11.6.5; macOS 10.15.6

Question: Is there a way to configure cgroups / PAM / systemd / etc. so that I can keep using the command line while the server program is running, and log back in after the screen locks?

myk

2 Answers

I solved this issue by editing /usr/lib/systemd/system/user-.slice.d/10-defaults.conf.

Changed:

TasksMax=33%

to read:

TasksMax=infinity
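
Files under /usr/lib belong to the systemd package and may be replaced on upgrade, so the same setting could instead live in a drop-in under /etc. A sketch of that alternative, not what was done above; the helper name write_tasksmax_dropin and the file name 50-tasksmax.conf are arbitrary choices, picked so the file sorts after 10-defaults.conf and therefore takes precedence:

```shell
# Hypothetical helper: write a TasksMax drop-in into a drop-in directory.
# $1 = drop-in directory, $2 = value for TasksMax (e.g. infinity).
write_tasksmax_dropin() {
    dir=$1
    value=$2
    mkdir -p "$dir" || return 1
    printf '[Slice]\nTasksMax=%s\n' "$value" > "$dir/50-tasksmax.conf"
}

# Intended use, as root, followed by a reload so systemd rereads its units:
#   write_tasksmax_dropin /etc/systemd/system/user-.slice.d infinity
#   systemctl daemon-reload
```

Logging out and back in, or rebooting, may still be needed before an already running user slice shows the new limit.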

myk

I had a similar problem when using Podman. In dmesg, I saw

[265142.704655] cgroup: fork rejected by pids controller in /machine.slice/libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope

On investigation, it turned out that podman sets a default task limit of 2048 on my system:

# systemctl status libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope
Warning: The unit file, source configuration file or drop-ins of libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope changed>
● libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope - libcontainer container 89834734bc5ab227ef20902dbe60d6082dd95dad81c2a>
   Loaded: loaded (/run/systemd/transient/libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope; transient)
Transient: yes
  Drop-In: /run/systemd/transient/libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope.d
           └─50-DevicePolicy.conf, 50-DeviceAllow.conf, 50-TasksMax.conf
   Active: active (running) since Mon 2021-11-15 16:33:40 CET; 8min ago
    Tasks: 2048 (limit: 2048)
   Memory: 2.0G
      CPU: 2min 40.949s
   CGroup: /machine.slice/libpod-89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.scope
           ├─418998 /usr/sbin/sshd -D
           ├─421644 sshd: root [priv]
           ├─421647 sshd: root@notty
           ├─422342 java -XX:+PrintClassHistogram -XX:+UseG1GC -Xms512M -Xmx2G -Dhawtio.realm=activemq -Dhawtio.offline=true -Dhawtio.rolePrincipalCla>
           ├─422796 bash -c cd /var/dtests/node_data/reproducers/ENTMQCL-2977/aggregate; mvn camel:run
           └─422812 /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-3.el8_5.x86_64/bin/java -classpath /opt/maven/boot/plexus-classworlds-2.5.2.jar -Dclasswo>

Nov 15 16:33:40 dtests-rhel8x-tcn-base systemd[1]: Started libcontainer container 89834734bc5ab227ef20902dbe60d6082dd95dad81c2a3dd860392316bd58dbb.

The podman default limit can be switched off by running podman with --pids-limit=-1, so that's what I am now doing.
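
The giveaway in the status output above is the Tasks line, where the count equals the limit. A small sketch for spotting that automatically, assuming the "Tasks: N (limit: M)" format shown above; both function names are invented here:

```shell
# Hypothetical helper: read `systemctl status` output on stdin and print
# "<current> <limit>" taken from the Tasks line. Prints nothing when the
# line has no limit.
tasks_from_status() {
    sed -n 's/^ *Tasks: \([0-9][0-9]*\) (limit: \([0-9][0-9]*\)).*$/\1 \2/p'
}

# Hypothetical helper: succeed when the unit on stdin has reached its limit.
at_task_limit() {
    set -- $(tasks_from_status)
    [ $# -eq 2 ] && [ "$1" -ge "$2" ]
}
```

For instance, `systemctl status SOME.scope | at_task_limit && echo "at limit"`, with SOME.scope standing in for the real unit name.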

user7610