2

I recently started using monit to monitor the status of sshd on my CentOS 5.4 server. This works fine, but every so often monit reports that sshd is no longer running. This isn't true: I am still able to log in to the server via ssh. However, I note the following:

  • There is no longer any PID file at /var/run/sshd.pid - after a reboot this file exists. Once it is gone, restarting sshd via service sshd restart does not create the PID file.
  • sudo service sshd status reports openssh-daemon is stopped - again, restarting sshd does not change this, but a reboot does.
  • sudo service sshd stop reports failed, presumably because of the missing PID file.

Any idea what is going on?

Update

sudo netstat -lptun gives the following output relating to port 22

tcp        0      0 :::22      :::*    LISTEN      20735/sshd

Killing the process with this PID as suggested by @Henry and then starting sshd via service results in service sshd status recognising the process by PID again. Would still like to understand this better.

RPM verify suggested by a couple of answerers shows this:

sudo rpm -vV openssh openssh-server openssh-clients | grep 'S\.5'
S.5....T  c /etc/pam.d/sshd
S.5....T  c /etc/ssh/sshd_config

/etc/pam.d/sshd has the following contents:

#%PAM-1.0
auth       include      system-auth
account    required     pam_nologin.so
account    include      system-auth
password   include      system-auth
session    optional     pam_keyinit.so force revoke
session    include      system-auth
#session    required     pam_loginuid.so

Should that last line be commented out?

Update Here's the output of @YannickGirouard 's script:

$ sudo ./sshd_test
Searching for the process listening on port 22...

Found the following PID: 21330

Command line for PID 21330: /usr/sbin/sshd

Listing process(es) relating to PID 21330:

UID        PID  PPID  C STIME TTY          TIME CMD
root     21330     1  0 14:04 ?        00:00:00 /usr/sbin/sshd

Listing RPM information about openssh packages:

Name        : openssh                      Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:50:57 AM GMT      Build Host: builder10.centos.org
Group       : Applications/Internet         Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 745390                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH implementation of SSH protocol versions 1 and 2

------------------------------------------------------

Name        : openssh-clients              Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:51:04 AM GMT      Build Host: builder10.centos.org
Group       : Applications/Internet         Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 871132                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH client applications

------------------------------------------------------

Name        : openssh-server               Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:51:04 AM GMT      Build Host: builder10.centos.org
Group       : System Environment/Daemons    Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 492478                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH server daemon

------------------------------------------------------

However, I've since got things working by killing the process and starting afresh, as suggested by @Henry below, so perhaps I am no longer seeing the same thing. I will run the script again if I see the issue after the next reboot.

Update - 14 March: Monit alerted me that sshd had disappeared, and again I am able to ssh onto the server. So now I can run the script:

$ sudo ./sshd_test
Searching for the process listening on port 22...

Found the following PID: 2208

Command line for PID 2208: /usr/sbin/sshd

Listing process(es) relating to PID 2208:

UID        PID  PPID  C STIME TTY          TIME CMD
root      2208     1  0 Mar13 ?        00:00:00 /usr/sbin/sshd
root      1885  2208  0 21:50 ?        00:00:00 sshd: dunx [priv]

Listing RPM information about openssh packages:

Name        : openssh                      Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:50:57 AM GMT      Build Host: builder10.centos.org
Group       : Applications/Internet         Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 745390                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH implementation of SSH protocol versions 1 and 2

------------------------------------------------------

Name        : openssh-clients              Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:51:04 AM GMT      Build Host: builder10.centos.org
Group       : Applications/Internet         Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 871132                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH client applications

------------------------------------------------------

Name        : openssh-server               Relocations: (not relocatable)
Version     : 4.3p2                             Vendor: CentOS
Release     : 72.el5_7.5                    Build Date: Tue 30 Aug 2011 12:34:14 AM BST
Install Date: Sun 06 Nov 2011 12:51:04 AM GMT      Build Host: builder10.centos.org
Group       : System Environment/Daemons    Source RPM: openssh-4.3p2-72.el5_7.5.src.rpm
Size        : 492478                           License: BSD
Signature   : DSA/SHA1, Fri 02 Sep 2011 01:13:01 AM BST, Key ID a8a447dce8562897
URL         : http://www.openssh.com/portable.html
Summary     : The OpenSSH server daemon

------------------------------------------------------

Again, when I look for /var/run/sshd.pid I don't find it.

$ cat /var/run/sshd.pid
cat: /var/run/sshd.pid: No such file or directory
$ sudo netstat -anp | grep sshd
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2208/sshd
$ sudo kill 2208
$ sudo service sshd start
Starting sshd:                                             [  OK  ]
$ cat /var/run/sshd.pid
3794
$ sudo service sshd status
openssh-daemon (pid  3794) is running...

Is it possible that sshd is restarting and not creating a pidfile for some reason?

dunxd
  • After restarting, look in `/var/log/secure` and see what it tells you. – qweet Feb 27 '12 at 10:26
  • `error: Bind to port 22 on 0.0.0.0 failed: Address already in use.` and `fatal: Cannot bind any address.` Not surprising since sshd is clearly running. – dunxd Feb 27 '12 at 10:46
  • Can you post your monit config file for the ssh service? – cjc Feb 27 '12 at 21:12
  • @dunxd Restarting a service normally implies killing it with a SIGHUP signal and then starting it again. Therefore, if it says the port is already in use, it's not because it's "clearly running", but rather because it was not started to begin with and something else is holding the port open. See my answer below for a possible explanation, with steps to help your troubleshooting process. – Yanick Girouard Feb 27 '12 at 22:49
  • @cjc - I'm less concerned about Monit here than I am about the general behaviour. – dunxd Mar 05 '12 at 15:45
  • Is sshd configured to write the PID file? There should be a `PidFile /var/run/sshd.pid` in your `sshd_config` file. – Chris S Mar 07 '12 at 14:19
  • @ChrisS - yes. As stated above, the PID file exists after a reboot. After killing the sshd process that was running, and then starting it with `service sshd start` the PID file exists again. Haven't seen it disappear again yet. – dunxd Mar 13 '12 at 08:33
  • What was the resolution here? Was the consistency of the ssh server binary ever checked? – ewwhite Apr 14 '12 at 16:10
  • Yes - details of the checks are above. As far as I can tell, there is no compromise to the ssh server binaries. I'm still getting the issue of no pid file coming up every so often. I can't kill sshd using `service sshd stop`, but I can find the pid and kill the process then `service sshd start` works and the pid file is created. So either the process is stopping, then starting again without creating a pid file, or something is causing the pid file to get deleted. – dunxd Apr 14 '12 at 22:16
  • Can you update your OS? CentOS 5.4 is quite old as well. – ewwhite Apr 16 '12 at 08:57
  • This is a production machine - I can't take this down to update the OS. I don't have budget for a new machine to move the production to. I'd have to be seriously convinced that this was more than cosmetic before I'd consider that. – dunxd Apr 19 '12 at 15:17

3 Answers

3

I have the same problem. I fixed it, temporarily at least, by killing the sshd process and then starting it.

    service sshd status
    openssh-daemon is stopped

(even though I am logged in via ssh)

    rpm -vV openssh openssh-server openssh-clients | grep 'S\.5'
    S.5....T  c /etc/ssh/sshd_config

    netstat -anp | grep sshd
    tcp    0      0 0.0.0.0:22           0.0.0.0:*          LISTEN      17501/sshd

    kill 17501
    service sshd start

    service sshd status
    openssh-daemon (pid  3157) is running...

And now monit is happy, too. :)
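
When doing this over ssh, it matters which sshd process gets killed: only the one in the LISTEN state should go, not the one serving your own session. A minimal sketch of how the PID could be picked out of the netstat output; the function name `listener_pid` is made up for illustration, and it assumes the column layout shown above (state in column 6, PID/program in column 7):

```bash
#!/bin/bash
# listener_pid PORT: read `netstat -anp` (or `netstat -lptun`) output on stdin
# and print the PID of the TCP socket in LISTEN state on PORT.
# Lines in other states (e.g. ESTABLISHED, your own login) are ignored.
listener_pid() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, a, "/")   # column 7 looks like "2208/sshd"
            print a[1]
        }' | sort -u
}

# Possible use as root (left commented out so nothing is killed by accident):
#   pid=$(netstat -anp | listener_pid 22)
#   kill "$pid" && service sshd start
```

Matching on `":" port "$"` keeps a listener on, say, port 2222 from being mistaken for port 22.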

Henry
  • thank you! For readers: be careful to only kill the sshd process that is `LISTEN`-ing, and leave alone the one that is `ESTABLISHED` to your currently logged ssh session. – matt wilkie Feb 03 '17 at 00:10
2

From what you're describing, it almost looks like another process is taking over port 22 and answering your SSH requests instead. Getting a message saying the port is already in use when restarting a service is not normal. It looks like the actual sshd service is killed in favor of that other "phantom" process. It could be that you have installed OpenSSH twice without changing the port it uses, or (and don't panic here, it's just a possibility) your server has been hacked and the attacker replaced sshd with a daemon of their own.

To see which process is using your port, try this:

netstat -lptun

Then look for any line showing a local address ending with :22, and look at the last column (PID/Program name). Note down any PID using port 22.

Then to find out the full command launched for that PID you do this:

cat /proc/PID/cmdline (where PID = the PID of the process)

If it's not /usr/sbin/sshd (or whatever OpenSSH binary it should be), you've got a problem!
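
As an extra check, /proc/PID/exe can be read too. The command line is just the argument list, which a process is able to rewrite, whereas /proc/PID/exe is a symlink to the executable that is actually running. A rough sketch, assuming a Linux /proc and root privileges; the function name `exe_matches` is invented for this example:

```bash
#!/bin/bash
# exe_matches PID EXPECTED_PATH
# Returns 0 if /proc/PID/exe resolves to EXPECTED_PATH, 1 otherwise
# (including when the PID does not exist or the link cannot be read).
exe_matches() {
    local pid=$1 expected=$2 actual
    actual=$(readlink "/proc/$pid/exe") || return 1
    echo "PID $pid is running: $actual"
    [ "$actual" = "$expected" ]
}

# Possible use as root, where 2208 stands for the PID found with netstat:
#   exe_matches 2208 /usr/sbin/sshd || echo "unexpected binary on port 22"
```

If the printed path ends in " (deleted)", the file on disk has been replaced or removed since the process started. A package update can cause that, so on its own it is not proof of anything sinister.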

Here's a script you can run safely to dump some useful information:

#!/bin/bash
# Dump information about the process listening on port 22. Run as root so that
# netstat can show PIDs belonging to other users.

echo -e "Searching for the process listening on port 22...\n"
# [[:space:]] is used instead of \s, which older versions of grep may not support.
PORT22_PID=$(netstat -lptun | grep -E ":22[[:space:]]" | awk '{print $7}' | awk -F/ '{print $1}' | uniq)
if [ -z "$PORT22_PID" ]; then
        echo "Error: Was not able to find any process listening on port 22"
        exit 1
fi
echo -e "Found the following PID: $PORT22_PID\n"
# Arguments in /proc/PID/cmdline are NUL-separated; turn the NULs into spaces.
echo -e "Command line for PID $PORT22_PID: $(tr '\0' ' ' < "/proc/$PORT22_PID/cmdline")\n"
echo -e "Listing process(es) relating to PID $PORT22_PID:\n"
echo "UID        PID  PPID  C STIME TTY          TIME CMD"
ps -ef | grep -E "[[:space:]]$PORT22_PID[[:space:]]"
echo
echo -e "Listing RPM information about openssh packages:\n"
for r in $(rpm -qa | grep openssh); do
        rpm -qi "$r" | sed -n '/^Name/,/^Summary/p'
        echo -e "\n------------------------------------------------------\n"
done

Just paste the output in your original question and it should help. I've tested this script thoroughly on my own CentOS server.

Yanick Girouard
  • See update in question for what is shown by `netstat -lptun`. The `grep` in your script doesn't pick up anything for port 22 for me, but I can see from netstat that it is there. – dunxd Mar 05 '12 at 16:00
  • That's odd, it should work fine based on the output of your netstat command. What is the output of `netstat -lptun | grep -E ":22\s"` ? – Yanick Girouard Mar 06 '12 at 19:12
  • I just noticed, the only line that you pasted shows the ipv6 bind, but not the ipv4 one, which confirms the error you mentioned about problems binding to port 22 on 0.0.0.0. – Yanick Girouard Mar 07 '12 at 02:09
  • Your grep expression is not returning any results. `netstat -lptun | grep ":22"` outputs the lines relating to port 22. It's not listening on any other ports that begin with 22. – dunxd Mar 07 '12 at 14:02
  • Ok - with sshd_config using default AddressFamily (any) I only get the output showing bind to IPv6 addresses. With AddressFamily set to `inet` I see `0.0.0.0:22` in the netstat output. Anyway, that isn't really the problem I'm trying to resolve. I've edited your script so it captures the info you want, but I don't think it indicates any compromise of sshd. – dunxd Mar 07 '12 at 14:06
0

First, can you post your monit.conf or Monit config file? It makes sense to see if you're hitting the right PID file and process parameters. My Monit stanza for SSH monitoring on CentOS 5.x is:

    check process ssh
        with pidfile "/var/run/sshd.pid"
        start program = "/sbin/service sshd start"
        stop program = "/sbin/service sshd stop"

Before you get too deep, I'd double-check the health of the SSH daemon.

Run rpm -vV openssh openssh-server openssh-clients | grep 'S\.5'

This checks the consistency of the SSH binaries and verifies them against what was installed in the original RPM.

[root@freaky ~]# rpm -vV openssh openssh-server openssh-clients | grep 'S\.5'
S.5....T  c /etc/ssh/sshd_config
S.5....T  c /etc/ssh/ssh_config

In the example above, the only modified files are SSH configuration files. If one of the executables like /usr/sbin/sshd or /usr/bin/ssh appears in your output, your system may have been compromised. You have the option of redownloading the openssh, openssh-server and openssh-clients packages and force installing them to overwrite potentially bad binaries... But that's a larger topic.
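
To separate expected configuration changes from anything more worrying, the verify output can be filtered so that only files without an attribute marker (such as the "c" for config files) are listed. This is only a sketch: the function name `changed_non_config` is made up, it assumes no spaces in the file paths, and it looks only at the digest flag ("5"), so missing files would not be reported.

```bash
#!/bin/bash
# changed_non_config: read `rpm -V` / `rpm -vV` output on stdin and print the
# paths of files whose digest changed (a "5" in the flags column) and that
# carry no attribute marker such as "c" (config file).
changed_non_config() {
    # Marked lines have three fields (flags, marker, path); unmarked have two.
    awk 'NF == 2 && $1 ~ /5/ { print $2 }'
}

# Possible use, mirroring the rpm command above:
#   rpm -V openssh openssh-server openssh-clients | changed_non_config
```

Empty output here means no executable or library in those packages differs from what the RPM database recorded.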

Also check your netstat information.

[root@freaky ~]# netstat -anp | grep sshd
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 :::22                       :::*                        LISTEN      4278/sshd      

This will provide the PID and port information of the currently-running sshd.
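
With that PID in hand, it can be compared against the PID file. As far as I can tell, both the Monit stanza above and `service sshd status` depend on /var/run/sshd.pid, so a missing or stale file would make a healthy daemon look stopped. A small sketch of the comparison; `pidfile_state` is a hypothetical helper, not part of Monit or the init scripts:

```bash
#!/bin/bash
# pidfile_state PIDFILE LISTENER_PID
# Prints one of:
#   missing  - PIDFILE does not exist or is unreadable
#   stale    - PIDFILE names a different PID than the one actually listening
#   ok       - PIDFILE and the listening process agree
pidfile_state() {
    local pidfile=$1 listener=$2 recorded
    if [ ! -r "$pidfile" ]; then
        echo missing
        return
    fi
    read -r recorded < "$pidfile"   # first line of the file is the PID
    if [ "$recorded" = "$listener" ]; then
        echo ok
    else
        echo stale
    fi
}

# Possible use, with 4278 standing for the PID reported by netstat:
#   pidfile_state /var/run/sshd.pid 4278
```

A result of "missing" while sshd is still answering on port 22 would match the symptoms described in the question.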

ewwhite
  • @ewwhite He already pointed out that the sshd.pid file was missing, and that the restart command of the sshd service failed with "Address already in use" error when trying to bind to port 22 AND that he could still SSH to the server, meaning there IS a daemon listening and that the sshd service is installed. That clearly means that it's not the standard openssh-server package that is listening (if it's openssh at all). Therefore, I'm not sure any of the steps you provided will truly help, unless I missed something crucial!? – Yanick Girouard Feb 27 '12 at 23:39
  • What's the harm in an RPM verify? That can be helpful to understand the extent of the damage. – ewwhite Feb 27 '12 at 23:40
  • Perhaps it will give more info, but I'm not confident it will really be useful based on what we already know. – Yanick Girouard Feb 27 '12 at 23:52
  • I've added a script to my answer which should gather some useful info, including info on the openssh rpm's. Feel free to try it on your own server to see the output. – Yanick Girouard Feb 28 '12 at 00:23
  • If nothing else, the information @ewwhite provided can help others looking for hints on what to do in a similar situation. I would think it's good sense to verify the binaries. – Bart Silverstrim Mar 07 '12 at 14:50
  • Binaries verified fine. – dunxd Apr 18 '12 at 12:28