
I'm diagnosing an SSH bastion I manage. This machine has about 5500 SSH connections with port forwarding at any given point in time.

Recently, I ran into an issue where SSH connections were refused because the user slice that holds all these sshd processes ran into its TasksMax limit.

This was new to me, and while diagnosing it I noticed that user.slice does not hold all the sshd processes as I thought it would. Roughly half of them are held by system.slice instead. At first I thought those might be the root processes, with the user-specific (privilege separation) processes being held by user.slice, but this is not the case; it appears to be random.

I did notice that the processes held by user.slice are nicely separated per session, whereas the ones held by system.slice just sit under ssh.service with no further separation.

# systemd-cgls
[...]
│ ├─user-1031.slice
│ │ ├─session-719.scope
│ │ │ ├─5559 sshd: <user> [priv]
│ │ │ └─6224 sshd: <user>
│ │ ├─session-617.scope
│ │ │ ├─4963 sshd: <user> [priv]
│ │ │ └─5392 sshd: <user>
│ │ ├─session-515.scope
│ │ │ ├─3862 sshd: <user> [priv]
│ │ │ └─4693 sshd: <user>
│ │ ├─session-413.scope
│ │ │ ├─3049 sshd: <user> [priv]
│ │ │ └─3988 sshd: <user>
[...]
└─system.slice
  ├─ssh.service
  │ ├─  338 sshd: <user> [priv]
  │ ├─  352 sshd: <user>
  │ ├─  353 sshd: <user>
  │ ├─  358 sshd: <user>
  │ ├─  385 sshd: <user> [priv]
  │ ├─  391 sshd: <user>
  │ ├─  392 sshd: <user>
[...]
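
(For what it's worth: a quick way to confirm which unit a given PID landed in is to ask systemctl, or to read the process's cgroup file directly. With the PIDs from the listing above, the first command reports session-719.scope and the second reports ssh.service; the exact format of /proc/PID/cgroup depends on whether the host runs cgroup v1 or v2.)

# systemctl status 5559 | head -n 1
# systemctl status 338 | head -n 1
# cat /proc/5559/cgroup
# cat /proc/338/cgroup
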
  • How does systemd decide to put a process in one slice or the other?
  • Do they get moved?
  • Is there a way to accurately and reliably put all of these sessions under the appropriate user.slice, so I can manage the limits I set on the number of processes allowed?
Simon

2 Answers


OpenSSH privilege separation is implemented with one privileged and one unprivileged process per connection.

Per-user slicing is a feature of systemd-logind.service, driven by pam_systemd. It is unclear to me why you still have a bunch in system.slice. Perhaps those go through the PAM stack differently.
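
A quick sanity check (assuming a standard layout) is to confirm sshd actually uses PAM, that pam_systemd is in its session stack, and how many sessions logind itself knows about compared to the number of connections:

# grep -i '^UsePAM' /etc/ssh/sshd_config
# grep -r pam_systemd /etc/pam.d/
# loginctl list-sessions | wc -l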

A single user slice for 5500 SSH connections? More than typical for one user, but you can do that.

I suggest setting pids.max very high on the user slices, but not infinite: at least twice the number of connections you expect. To do that, create /etc/systemd/logind.conf.d/local.conf and customize:

[Login]
UserTasksMax=16000
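
If I recall correctly, logind only reads this file at startup and it only affects user slices created afterwards, so restart systemd-logind; slices that already exist can be adjusted at runtime. The UID below is simply the one from your listing:

# systemctl restart systemd-logind
# systemctl set-property user-1031.slice TasksMax=16000
# systemctl show -p TasksMax user-1031.slice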

If ssh.service has more than a couple of thousand tasks under it, also consider raising its limit. That one uses the common resource-control directives, so the drop-in customization goes at /etc/systemd/system/ssh.service.d/local.conf:

[Service] 
TasksMax=16000 
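
After dropping that in, reload systemd and check that the new ceiling took effect (restart ssh.service if the running unit does not pick it up):

# systemctl daemon-reload
# systemctl show -p TasksMax ssh.service
# systemctl status ssh.service | grep -i tasks
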
John Mahowald
  • Thank you for your answer. I'm still having trouble with the systemd cgroup allocation. The main question here was why not all sshd processes are put into the correct slice. I looked into it a bit further and noticed that not all sshd processes are even part of ssh.service: `ps aux | grep 'sshd: ' | wc -l` -> 10945, but `systemctl status sshd | grep 'sshd: ' | wc -l` -> 3424. I know I can change the TasksMax directive, but beyond making sure the slice doesn't run out of PIDs to hand out, that doesn't really help me put limits in place while the counting itself is off. – Simon May 27 '19 at 14:43

Stumbled on this while looking for something else. I don't know for sure, but since no one has answered this conclusively, I'll give my unconfirmed theory:

sshd starts its priv and user processes directly (not via systemd)*. Since they are not children of any systemd process, systemd cannot reliably detect them coming and going**; rather, it can only guess. Its guesses seem to be based on the PAM session hook***, and I believe sshd does call pam_open_session and pam_close_session****.

If it is only guessing, it will guess wrong sometimes like anyone would.

Assuming my theory is correct, the answers to your questions would be:

How does systemd decide to put a process in one slice or the other?

The process that logs the user in, e.g. sshd, explicitly calls pam_open_session, which ends up running through a sequence of "session" entries in the /etc/pam.d files. (At least on my current Debian, sshd passes "sshd" as the service name to PAM, so PAM starts at the "session" entries for /etc/pam.d/sshd.) One of those session entries is e.g. "-session optional pam_systemd.so", which triggers a call to pam_systemd.so's pam_sm_open_session, which talks to systemd-logind, essentially saying "hey, user xxx has logged in". I think systemd then looks for related processes and claims they are part of user xxx's slice.*****
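
For illustration, on a Debian-ish box the chain looks roughly like this (exact file names and contents vary by distro and release, so treat it as a sketch):

# /etc/pam.d/sshd (excerpt) - pulls in the common session stack
@include common-session

# /etc/pam.d/common-session (excerpt) - where pam_systemd is invoked
session optional    pam_systemd.so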

Do they get moved?

They only get moved in systemd's view of things (their cgroup assignment), not in the Linux process hierarchy, is my answer :-)

Is there a way to accurately and reliably put all these sessions under the appropriate user.slice so I can manage the limitations set for the number of processes allowed?

My answer is no, not without modifying sshd, but...

In your particular case, do you really need systemd to track these processes? If they just forward traffic somewhere else, then perhaps not. In that case you could adjust the /etc/pam.d files so that sshd does not trigger pam_systemd at all, and then reliably use the classic Linux per-user limits if you want to cap those processes.
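
A rough sketch of that approach, assuming a Debian-style PAM layout (the limits.d file name is just an example, and you should verify on your distro that the non-interactive include really omits pam_systemd):

# In /etc/pam.d/sshd, swap the session include so pam_systemd is skipped:
#   @include common-session   ->   @include common-session-noninteractive

# Then cap processes per user via pam_limits instead, e.g. in
# /etc/security/limits.d/bastion.conf:
<user>    hard    nproc    16000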

* I base this on `ps -ef | grep sshd` and following the child back up through its parents

** that's my possibly outdated understanding of Unix processes; I'd be happy for someone to update my understanding if it's no longer accurate

*** that is based, at least, on the other answer's mention of "pam_systemd"

**** I think I know this but can't really recall where from, so I won't claim it with 100% certainty

***** "sshd" as service name is hard-coded in sshd source I believe but distros and versions of them adjust source code; pam file handling is detailed in "man pam" and its "see also"s though there are probably more readable descriptions around