21

I'm running an Ubuntu 16.04 container under Proxmox 5.2-11. After applying the latest round of patches [1], I'm unable to log in at the console or over ssh.

I mounted the container root FS on the hypervisor and added pts/0 to /etc/security/access.conf (we run pam_access), and that allowed root login at the console. We have root : lxc/tty0 lxc/tty1 lxc/tty2 in access.conf, which I thought was sufficient, so it is puzzling that pts/0 is now needed.

I noticed ssh was not running so tried starting it by hand (/usr/sbin/sshd -DDD -f /etc/ssh/sshd_config) and received this error:

Missing privilege separation directory: /var/run/sshd

I created the directory by hand, started ssh and was finally able to log in, but after a reboot the problem persists: the directory is not being created. The only useful bits are in journalctl [2], and the only interesting part is something about "operation not permitted" with no further info.

I'm not too familiar with 16.04 so wondering how I can find out more about the problem. I have no /var/log/syslog or /var/log/messages only kern.log so kind of lost.

[1]

systemd-sysv 229-4ubuntu21.9
libpam-systemd 229-4ubuntu21.9
libsystemd0 229-4ubuntu21.9
systemd 229-4ubuntu21.9
udev 229-4ubuntu21.9
libudev1 229-4ubuntu21.9
iproute2 4.3.0-1ubuntu3.16.04.4
libsasl2-modules-db 2.1.26.dfsg1-14ubuntu0.1
libsasl2-2 2.1.26.dfsg1-14ubuntu0.1
ldap-utils 2.4.42dfsg-2ubuntu3.4
libldap-2.4-2 2.4.42dfsg-2ubuntu3.4
libsasl2-modules 2.1.26.dfsg1-14ubuntu0.1
libgs9-common 9.25dfsg1-0ubuntu0.16.04.3
ghostscript 9.25dfsg1-0ubuntu0.16.04.3
libgs9 9.25dfsg1-0ubuntu0.16.04.3

[2]

Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[474]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 mysqld_safe[495]: Starting mysqld daemon with databases from /var/lib/mysql/mysql
Nov 27 10:13:48 host16 mysqld[500]: 181127 10:13:48 [Note] /usr/sbin/mysqld (mysqld 10.0.36-MariaDB-0ubuntu0.16.04.1) starting as process 499 ...
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[502]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[503]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[504]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:49 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Start request repeated too quickly.
Nov 27 10:13:49 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Failed with result 'start-limit-hit'.
Nov 27 10:13:49 host16 systemd[1]: Started /etc/rc.local Compatibility.
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/plymouth-quit.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Starting Terminate Plymouth Boot Screen...
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/plymouth-quit-wait.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Starting Hold until boot process finishes up...
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/rc-local.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Started Hold until boot process finishes up.
Nov 27 10:13:49 host16 systemd[1]: Started Container Getty on /dev/pts/1.
Nov 27 10:13:49 host16 systemd[1]: Started Container Getty on /dev/pts/0.
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/console-getty.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Started Console Getty.
Nov 27 10:13:49 host16 systemd[1]: Reached target Login Prompts.
Nov 27 10:13:49 host16 systemd[1]: Started Terminate Plymouth Boot Screen.
Nov 27 10:13:52 host16 nslcd[338]: accepting connections
Nov 27 10:13:52 host16 nslcd[275]:    ...done.
Nov 27 10:13:52 host16 systemd[1]: Started LSB: LDAP connection daemon.
Nov 27 10:13:52 host16 systemd[1]: Failed to reset devices.list on /system.slice/cron.service: Operation not permitted
Nov 27 10:13:52 host16 systemd[1]: Started Regular background program processing daemon.
Nov 27 10:13:52 host16 systemd[1]: Failed to reset devices.list on /system.slice/atd.service: Operation not permitted

Added systemd-tmpfiles --create output

Really bizarre... I checked /tmp and those files don't exist.

Server Fault

6 Answers

17

One mistake you made was trying to start sshd by hand.

If you instead start sshd through official means it should just work. The service command knows what the correct way to start a service on your distribution is, and this should work:

service ssh start

In the case of sysv init scripts, that's everything you need to do. The reason the directory is missing is that /var/run is a symlink to /run, and /run is a tmpfs mount point. That means /var/run starts out empty on each boot. When you use the service command, the /etc/init.d/ssh script is used to start sshd, but before doing that the script creates /var/run/sshd if it doesn't exist.
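To confirm that layout on a particular system, something like the following could be used. This is only a minimal sketch: the helper name resolves_to is made up for illustration, and it assumes `readlink -f` (GNU coreutils) and `findmnt` (util-linux) are available.

```shell
# resolves_to LINK TARGET: succeed if LINK resolves to the same place as TARGET.
# (Hypothetical helper name; relies on `readlink -f` from GNU coreutils.)
resolves_to() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Example on the affected system:
#   resolves_to /var/run /run && echo "/var/run points at /run"
#   findmnt -n -o FSTYPE /run    # prints the filesystem type if /run is a mount point
```

If /var/run does not resolve to /run, or /run is not a tmpfs, the explanation above may not apply to that system.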

With systemd things work a bit differently. There will be a file called /usr/lib/tmpfiles.d/sshd.conf with this content:

d /var/run/sshd 0755 root root

During boot this should cause the /var/run/sshd directory to be created. You need to verify that the file exists and has the correct contents. If the /var/run/sshd directory is still missing, you can check whether it gets created when you run systemd-tmpfiles --create manually.
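The check described above could be sketched roughly like this. The helper name has_tmpfiles_dir_entry is hypothetical, and the example commands assume a root shell on the affected container.

```shell
# has_tmpfiles_dir_entry CONF DIR: succeed if CONF contains a "d" line for DIR.
# Returns 2 if CONF cannot be read, 1 if no matching line is found.
has_tmpfiles_dir_entry() {
    [ -r "$1" ] || return 2
    awk -v want="$2" '$1 == "d" && $2 == want { found = 1 } END { exit !found }' "$1"
}

# Example on the affected system (run as root):
#   has_tmpfiles_dir_entry /usr/lib/tmpfiles.d/sshd.conf /var/run/sshd || echo "entry missing"
#   systemd-tmpfiles --create /usr/lib/tmpfiles.d/sshd.conf   # process only this file
#   ls -ld /var/run/sshd
```

Passing a single configuration file to systemd-tmpfiles keeps the output limited to that file, which makes any error about /var/run/sshd easier to spot.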

kasperd
  • That's a good idea but is essentially doing the same thing the system tried to do on boot (and failed the same way). What I'm really wondering about is why the privsep directory is not being created by normal means. Is there a disk error? A permission issue? lock file? Anywhere else to look besides `journalctl`? – Server Fault Nov 27 '18 at 19:18
  • @ServerFault Under certain circumstances `/etc/init.d/ssh` will not be run and `systemctl` will be used instead. And when `sshd` is started through `systemctl` the directory is not created. That leaves a few open questions which I'll try to dig into tomorrow such as what exactly has changed and how exactly is that directory supposed to be created when `systemctl` is used. – kasperd Nov 27 '18 at 22:17
  • @ServerFault When using `systemctl` it's `/etc/init/ssh.conf` which is responsible for creating the directory. I tested on a fully up to date Ubuntu 16.04 and the directory does get created during boot. But for some reason it does not get created when using `service ssh start`. There are some recent updates of some `systemd` related packages, but I don't see any evidence of behavior regarding creation of that directory having changed. And when I test it does get created during boot. So the question then is if your `/etc/init/ssh.conf` has the correct contents. – kasperd Nov 27 '18 at 22:40
  • @ServerFault I may have been mistaken about `/etc/init/ssh.conf`. There is also `/usr/lib/tmpfiles.d/sshd.conf`, which appears to be used by `systemd-tmpfiles --create`. Does `systemd-tmpfiles --create` create the missing `/var/run/sshd` directory? – kasperd Nov 27 '18 at 23:15
  • Added pic to question from `systemd-tmpfiles --create` output. The "symlinks" systemd is complaining about (/tmp/.X11-unix) don't even exist in `/tmp/` so I have no idea where it's getting that from. Thanks for all your help on it, but I think I'm going to move on. – Server Fault Nov 28 '18 at 14:17
  • @ServerFault Copying the text to the question would be more useful than a screenshot. Even if the listed entries don't exist it could be an issue with one of the parent directories. Is any of `/run`, `/tmp`, or `/var` a symlink? – kasperd Nov 28 '18 at 14:54
  • I know, but I couldn't get into a "copyable" console. All I had was the VNC console from Proxmox, hence the screenshot. – Server Fault Dec 19 '18 at 16:15
  • systemd sucks. Use alpine. – xoid May 01 '19 at 16:21
13

So /run (and /var/run symlinked to it) gets recreated every reboot. Except that systemd-tmpfiles isn't doing that for some files including (/var)/run/sshd.

Apparently, this is fixed by an OpenVZ kernel upgrade. But to fix it right now, edit /usr/lib/tmpfiles.d/sshd.conf and remove /var from the line d /var/run/sshd 0755 root root so that it reads: d /run/sshd 0755 root root

And that's it..!

And when openssh-server gets upgraded, we hope that they will have fixed this bug (or is it really a bug in systemd? or openvz??) -- otherwise you could run into the same problem.
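One way to avoid depending on that might be to leave the packaged file alone and put the corrected line in /etc/tmpfiles.d instead, since a file there takes precedence over a file of the same name in /usr/lib/tmpfiles.d. A rough sketch, where the helper name rewrite_var_run is made up for illustration:

```shell
# rewrite_var_run SRC DST: copy SRC to DST, turning "d /var/run/..." lines
# into "d /run/..." lines and leaving everything else unchanged.
rewrite_var_run() {
    sed 's|^\(d[[:space:]]\{1,\}\)/var/run/|\1/run/|' "$1" > "$2"
}

# Example on the affected container (run as root):
#   rewrite_var_run /usr/lib/tmpfiles.d/sshd.conf /etc/tmpfiles.d/sshd.conf
#   systemd-tmpfiles --create /etc/tmpfiles.d/sshd.conf
```

I have not tested this on an OpenVZ container, so treat it as a starting point rather than a verified fix.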

pepa65
  • +1 for the fix while awaiting a kernel upgrade. In my case it needed to become: "d /run/sshd 0755 root root" – paulzag Jan 19 '19 at 01:59
  • @paulzag That worked for me as well. I wonder if @pepa65 meant to say `d /run/sshd 0755 root root`, since their directions say to only remove the `/var` portion (even though the code they give in the answer has both `/var` and `/run` removed). – Stephen Schrauger Feb 07 '19 at 15:30
4

Apparently this gets resolved when running OpenVZ kernel 2.6.32-042stab134.7 or newer. I find it strange that no fix is possible in the systemd start scripts. An ugly hack, such as automatically creating /run/sshd/ at startup and then starting sshd, would probably work.
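Such a hack could be sketched as a systemd drop-in for ssh.service that creates the directory before sshd starts. The helper name write_sshd_dropin and the file name 10-privsep-dir.conf are arbitrary choices for illustration, and I am assuming the unit is called ssh.service as in the journal output in the question.

```shell
# write_sshd_dropin UNITDIR: write a drop-in under UNITDIR that creates
# /run/sshd before sshd starts. The drop-in file name is arbitrary.
write_sshd_dropin() {
    mkdir -p "$1/ssh.service.d" &&
    cat > "$1/ssh.service.d/10-privsep-dir.conf" <<'EOF'
[Service]
ExecStartPre=/bin/mkdir -p /run/sshd
EOF
}

# Example on the affected container (run as root):
#   write_sshd_dropin /etc/systemd/system
#   systemctl daemon-reload
#   systemctl restart ssh
```

This sidesteps systemd-tmpfiles entirely, so it should keep working whether or not the path validation problem gets fixed.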

The output of my systemd-tmpfiles --create:

[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
fchownat() of /run/named failed: Invalid argument
Failed to openat(/dev/simfs): Operation not permitted
Failed to validate path /var/run/screen: Too many levels of symbolic links
Failed to validate path /var/run/sshd: Too many levels of symbolic links
Failed to validate path /var/run/sudo: Too many levels of symbolic links
Failed to validate path /var/run/sudo/ts: Too many levels of symbolic links
fchownat() of /run/systemd/netif failed: Invalid argument
fchownat() of /run/systemd/netif/links failed: Invalid argument
fchownat() of /run/systemd/netif/leases failed: Invalid argument
fchownat() of /run/log/journal failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc/system.journal failed: Invalid argument

The changelog of OpenVZ 2.6.32-042stab134.7 says this:

Running Ubuntu containers with systemd 229-4ubuntu21.9 could result in services failing to start because systemd-tmpfiles was unable to validate path due to symlinking issues. (PSBM-90038)

kasperd
pepa65
2

For as much trouble as I've had with systemd over the years, I must admit this issue stems instead from the Ansible synchronize directive.

For some reason, provisioning this host with our Ansible scripts left the / directory (as well as /etc, /opt and others) owned by an admin user instead of root. After running chown to correct things, /var/run/sshd is created on boot again.

I really appreciate all the input but there is no bug here, at least in the sense that applying inappropriate ownership to root directories caused undefined system behavior.
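For anyone who wants to check for the same ownership problem, a minimal sketch follows. The helper name not_owned_by is made up, and it assumes `stat -c` from GNU coreutils.

```shell
# not_owned_by UID PATH...: print each PATH whose owner's numeric uid is not UID.
# Uses `stat -c %u` (GNU coreutils).
not_owned_by() {
    want_uid=$1
    shift
    for p in "$@"; do
        [ "$(stat -c %u "$p")" = "$want_uid" ] || printf '%s\n' "$p"
    done
}

# Example on the affected container:
#   not_owned_by 0 / /etc /opt /run /var /tmp
```

Any path it prints is owned by someone other than root and is worth a closer look.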

Server Fault
0

I also had this behavior. The problem in my case was that ssh.socket had somehow been enabled. After disabling ssh.socket, ssh.service starts normally on boot.

NSINE
0

One way I've seen around this is to simply create that directory yourself in your Dockerfile.

FROM debian:buster
LABEL maintainer="Adam Z Winter"

# Steps done in one RUN layer:
# - Install openssh-server
# - Create sshd's privilege separation directory
RUN apt-get update && \
    apt-get -y install openssh-server && \
    mkdir -p /var/run/sshd

COPY files/entrypoint /

EXPOSE 22

ENTRYPOINT ["/entrypoint"]
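
The contents of files/entrypoint are not shown here. As a purely hypothetical sketch (not the actual script), an entrypoint could also make sure the directory exists at container start, which would help if /var/run ends up empty at runtime:

```shell
#!/bin/sh
# Hypothetical entrypoint sketch; the real files/entrypoint is not shown above.

# ensure_privsep_dir DIR: create DIR if needed and set its mode to 0755.
ensure_privsep_dir() {
    mkdir -p "$1" && chmod 0755 "$1"
}

# A real entrypoint would then end with something like:
#   ensure_privsep_dir /var/run/sshd
#   exec /usr/sbin/sshd -D -e    # -D: stay in foreground, -e: log to stderr
```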
Adam Winter