
I have a problem with a Debian system. Today at exactly 04:00 it stopped responding to Nagios, and I can't log in via SSH either. This is not the first time it has happened, but it is the first time I have observed it with Nagios.

There is one unusual thing about this box: it boots from a USB stick. That is why I thought it was a smart idea to mount /var and /tmp on a tmpfs.

I am fairly certain that it will work again if I reboot it. But because /var/log lives in memory, I can't read the logs after the reboot.

The next problem is that the hardware is off-site, so I can't log in locally.

My first thought was that the tmpfs is filling up. But Nagios did not warn before it lost the connection, and I have set the warning threshold to 90% free space. So this does not seem to be it.

Other possibly relevant symptoms:

  • the OpenVPN server is still working
  • routing still works
  • the SSH port is still open and I am asked for the username, but once I supply the password the connection is dropped
  • port 80 is open, but Apache does not respond

The question that interests me most is: what could a Debian system be doing at 04:00 in the morning? Some kind of update check?

I am grateful for any ideas or pointers in the right direction. Is there anything worth monitoring with Nagios to get a hint? From now on I will also monitor swap usage.

dummy
  • It sounds to me like it ran out of memory. Several things inexplicably don't work, but others are fine...stinks of the `oom` killer. – bahamat Oct 18 '12 at 08:30
  • And if /var/tmp is on tmpfs, something could be eating RAM just by writing data there – itsbruce Oct 18 '12 at 08:32
  • I don't think that /var or /tmp are filled up with files, because I checked the free space (filesystem-wise) via Nagios. I think I should have got a warning before complete failure. But if anything else is eating up RAM quickly, /var wouldn't have any space left either. – dummy Oct 18 '12 at 08:37
  • If /var/tmp is mounted on tmpfs, then its disk usage will not count towards disk usage on /var and therefore will not show up in the nagios disk check for /var. Do you have a disk check for /var/tmp? – itsbruce Oct 18 '12 at 09:07
  • Good idea I will look into it. – dummy Oct 18 '12 at 09:45
  • Send your syslog to another server. – Michael Hampton Oct 18 '12 at 11:44

1 Answer


We have seen this when a server had run out of memory. Processes that were already running would continue, but starting new processes could fail.

If you are monitoring memory usage, this could answer the question. You could also try logging remotely.
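As a rough illustration of such a memory check, here is a minimal sketch of a Nagios-style plugin in Python that reads /proc/meminfo. The thresholds and function names are my own assumptions, not taken from any existing plugin; the only conventions relied on are the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Because tmpfs pages are counted as cache, the fallback estimate subtracts `Shmem` so that data held in /var and /tmp is not mistaken for reclaimable memory.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for available memory and swap."""

# Hypothetical thresholds: percent of memory/swap still available.
WARN_PCT = 20.0
CRIT_PCT = 10.0


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_kb(mem):
    """Prefer MemAvailable; kernels without it get a rough estimate."""
    if "MemAvailable" in mem:
        return mem["MemAvailable"]
    # tmpfs data is counted in Cached but cannot be dropped, so subtract Shmem.
    estimate = (mem.get("MemFree", 0) + mem.get("Buffers", 0)
                + mem.get("Cached", 0) - mem.get("Shmem", 0))
    return max(estimate, 0)


def evaluate(mem, warn_pct=WARN_PCT, crit_pct=CRIT_PCT):
    """Return (exit_code, message) using the Nagios exit code convention."""
    total = mem.get("MemTotal", 0)
    if total <= 0:
        return 3, "UNKNOWN - MemTotal missing from meminfo"
    mem_pct = 100.0 * available_kb(mem) / total
    swap_total = mem.get("SwapTotal", 0)
    # A box without swap is reported as 100% swap free.
    swap_pct = 100.0 * mem.get("SwapFree", 0) / swap_total if swap_total else 100.0
    worst = min(mem_pct, swap_pct)
    if worst < crit_pct:
        code, label = 2, "CRITICAL"
    elif worst < warn_pct:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return code, "%s - mem available %.1f%%, swap free %.1f%%" % (label, mem_pct, swap_pct)


def main():
    """Read the live /proc/meminfo, print the status line, return the exit code."""
    with open("/proc/meminfo") as handle:
        code, message = evaluate(parse_meminfo(handle.read()))
    print(message)
    return code
```

To use it as an actual plugin, end the file with `import sys; sys.exit(main())` so that Nagios sees the exit code. Keep in mind that a check which has to start a new process on the monitored host may itself fail once memory is exhausted, so the warning threshold should leave enough headroom.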

However, to answer your question, check your /etc/cron.daily. These scripts may run at 4am. You can also check /etc/cron.d/ and /etc/crontab to see if there are scheduled tasks.
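If the crontabs are long, a small script can narrow the search. The following is only a sketch under my own assumptions: it handles the common hour-field forms (`4`, `*`, `*/2`, `2-5`, `1,4`) and the `@hourly`/`@daily` style nicknames, skips anything it cannot parse, and the function names are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch: list cron entries whose hour field includes a given hour."""
from pathlib import Path

# Nicknames that cron runs at 00:00 (on some day); @reboot is never matched.
MIDNIGHT_NICKNAMES = {"@daily", "@midnight", "@weekly", "@monthly", "@yearly", "@annually"}


def hour_matches(field, hour):
    """Return True if a crontab hour field (e.g. '*/2', '2-5', '1,4') covers hour."""
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            low, high = 0, 23
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-", 1))
        else:
            low = high = int(rng)
        if low <= hour <= high and (hour - low) % step == 0:
            return True
    return False


def jobs_at_hour(crontab_text, hour=4):
    """Return the crontab lines that can fire during the given hour."""
    jobs = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        if line.startswith("@"):
            nickname = fields[0]
            if nickname == "@hourly" or (nickname in MIDNIGHT_NICKNAMES and hour == 0):
                jobs.append(line)
            continue
        if len(fields) < 3:
            continue  # environment assignment such as SHELL=/bin/sh
        try:
            if hour_matches(fields[1], hour):
                jobs.append(line)
        except ValueError:
            continue  # not a schedule line this sketch understands
    return jobs


def main(hour=4):
    """Scan /etc/crontab and /etc/cron.d and print the matching entries."""
    paths = [Path("/etc/crontab")] + sorted(Path("/etc/cron.d").glob("*"))
    for path in paths:
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # missing, unreadable, or a directory
        for job in jobs_at_hour(text, hour):
            print("%s: %s" % (path, job))


if __name__ == "__main__":
    main()
```

Only the second field of each line is inspected, so the same function also works on per-user crontabs, which have no user column. Entries with `*` in the hour field are listed as well, since they run at 04:00 like at every other hour.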

drone.ah