Am I under attack or just stupid?


I run a server on Debian Squeeze with several OpenVZ containers. The containers mostly run Squeeze; some still run Lenny, and some have already been upgraded to Wheezy. The host doesn't do much beyond iptables and DHCP. File servers, proxies, mail servers, Kerberos, LDAP, ... all live in containers. The system has been stable for many years, and for over a year it has seen no major changes apart from some firewall rules.

Two days ago the system suddenly crashed, and I had a lot of trouble bringing it back up. At first it wouldn't let me log in via ssh: root login was denied with 'You do not exists. Go away!', while local login was fine. Some time later ssh worked again. By coincidence I didn't reuse the line from the bash history but typed the command anew. I triple-checked that it was identical to the history line, which had worked before the crash but failed after it.

After that the system ran, but network traffic on most protocols stalled after the SYN-ACK. DNS, Telnet, and SSH were fine, but the rest was a mess. After a couple of hours of groping in the dark and reloading the firewall several times, everything suddenly worked again. I couldn't find anything suspicious in the logs, but I'm not a forensics expert.

Today the nscd on the file server ran out of sockets for contacting LDAP because it hit the container quota, something that has never happened before. I also saw a lot of sockets (> 30) held by smbd.
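One way to check which container limit was actually hit would be to read /proc/user_beancounters on the host and look for non-zero fail counters. The following is only a sketch: it assumes the usual column layout (uid, resource, held, maxheld, barrier, limit, failcnt), the function names are made up for illustration, and the sample text is invented, not taken from my machine. For sockets, the resources I would expect to matter are numtcpsock and numothersock.

```python
COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")

def parse_beancounters(text):
    """Return {container_id: {resource: {column: value}}} from beancounter text."""
    result, ctid = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A new container section starts with "<id>:" in the first column.
        if fields[0].endswith(":") and fields[0][:-1].isdigit():
            ctid = int(fields[0][:-1])
            fields = fields[1:]
        # Skip the version line, the header line and anything malformed.
        if ctid is None or len(fields) != 6:
            continue
        try:
            values = [int(v) for v in fields[1:]]
        except ValueError:
            continue
        result.setdefault(ctid, {})[fields[0]] = dict(zip(COLUMNS, values))
    return result

def failures(counters):
    """List (container_id, resource, failcnt) for every limit that was ever hit."""
    return sorted(
        (ctid, name, row["failcnt"])
        for ctid, resources in counters.items()
        for name, row in resources.items()
        if row["failcnt"] > 0
    )

# Invented sample in the assumed layout, for illustration only.
SAMPLE = """Version: 2.5
       uid  resource      held  maxheld  barrier  limit  failcnt
      101:  numtcpsock      12       40      360    360        0
            numothersock   360      360      360    360       17
"""

for ctid, name, failcnt in failures(parse_beancounters(SAMPLE)):
    print(f"container {ctid}: {name} failed {failcnt} times")
```

On the real host the text would come from open("/proc/user_beancounters").read(), which needs root.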

/var/log/messages looked much the same as syslog. /var/log/kern.log had this additional information on the cause of the crash:

/var/log/kern.log:2950:Sep 19 10:46:57 asgard kernel: [6529441.320086] INFO: task sendmail:32181 blocked for more than 120 seconds.
/var/log/kern.log:2982:Sep 19 10:48:57 asgard kernel: [6529561.324525] INFO: task kdmflush:1932 blocked for more than 120 seconds.
/var/log/kern.log:3005:Sep 19 10:48:57 asgard kernel: [6529561.324694] INFO: task xfssyncd:10162 blocked for more than 120 seconds.
/var/log/kern.log:3027:Sep 19 10:48:57 asgard kernel: [6529561.324934] INFO: task postgres:16827 blocked for more than 120 seconds.
/var/log/kern.log:3060:Sep 19 10:49:51 asgard kernel: [6529561.325129] INFO: task imapd:31749 blocked for more than 120 seconds.
/var/log/kern.log:3084:Sep 19 10:49:51 asgard kernel: [6529561.325248] INFO: task cleanup:32194 blocked for more than 120 seconds.
/var/log/kern.log:3106:Sep 19 10:50:57 asgard kernel: [6529681.324028] INFO: task flush-253:3:3216 blocked for more than 120 seconds.
/var/log/kern.log:3142:Sep 19 10:50:57 asgard kernel: [6529681.324224] INFO: task kjournald:6859 blocked for more than 120 seconds.
/var/log/kern.log:3166:Sep 19 10:50:57 asgard kernel: [6529681.324366] INFO: task syslogd:11720 blocked for more than 120 seconds.
/var/log/kern.log:3198:Sep 19 10:50:57 asgard kernel: [6529681.324574] INFO: task postgres:16827 blocked for more than 120 seconds.
/var/log/kern.log:7152:Sep 19 19:29:41 asgard kernel: [ 1440.617090] INFO: task sendmail:11892 blocked for more than 120 seconds.

The final 'sendmail' crash was after rebooting the machine. Since then no more such events occurred. 'imapd' and 'postgres' definitely run in different containers.
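To see at a glance which tasks hung during which boot, the warnings can be grouped by kernel uptime, since the uptime counter in brackets starts over after a reboot. This is a minimal sketch: the regular expression is written against the line format shown above, the function names are hypothetical, and the sample consists of three lines copied from the excerpt.

```python
import re

# Matches the hung-task warning in the format of the excerpt above.
HUNG = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

def hung_tasks(lines):
    """Yield (timestamp, uptime_seconds, task_name, pid) per hung-task warning."""
    for line in lines:
        match = HUNG.search(line)
        if match:
            yield (match["stamp"], float(match["uptime"]),
                   match["task"], int(match["pid"]))

def group_by_boot(events):
    """Split events into boots: a drop in kernel uptime means a reboot."""
    boots, last = [], None
    for event in events:
        if last is None or event[1] < last:
            boots.append([])
        boots[-1].append(event)
        last = event[1]
    return boots

# Three lines copied from the kern.log excerpt above.
SAMPLE = [
    "/var/log/kern.log:2950:Sep 19 10:46:57 asgard kernel: [6529441.320086] INFO: task sendmail:32181 blocked for more than 120 seconds.",
    "/var/log/kern.log:3106:Sep 19 10:50:57 asgard kernel: [6529681.324028] INFO: task flush-253:3:3216 blocked for more than 120 seconds.",
    "/var/log/kern.log:7152:Sep 19 19:29:41 asgard kernel: [ 1440.617090] INFO: task sendmail:11892 blocked for more than 120 seconds.",
]

for number, boot in enumerate(group_by_boot(hung_tasks(SAMPLE)), 1):
    names = sorted({task for _, _, task, _ in boot})
    print(f"boot {number}: {len(boot)} warnings, first at {boot[0][0]}, "
          f"tasks: {', '.join(names)}")
```

Passing an open file handle for /var/log/kern.log instead of SAMPLE should work the same way, since hung_tasks only iterates over lines.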

Well, I do not see any smoking gun, but I'm probably just blind. Setting up the system from known / presumed good backups would hit me too hard to try it without very good reasons.

I'd appreciate any advice on what to check next.

Thanks for your help.

Update: After putting more effort into searching for a precursor of the crash, I found the following in syslog:

Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (10490->8232)
Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:09:56 asgard ntop[7965]:   **WARNING** packet truncated (17442->8232)
Sep 19 10:11:02 asgard ntop[7965]:   **WARNING** packet truncated (11650->8232)
Sep 19 10:11:02 asgard ntop[7965]:   **WARNING** packet truncated (10202->8232)
Sep 19 10:11:29 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:13:27 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)
Sep 19 10:20:33 asgard ntop[7965]:   **WARNING** packet truncated (8754->8232)

I know this is considered uncritical, but it seems to be a rare event. These truncation warnings appear only on the day of the second crash and nowhere else in the available log files.

Lars Hanke

Posted 2013-09-21T21:17:16.237

Reputation: 211



This looks like a DoS, most likely originating either from the network or from inside one of the containers, if one has been compromised.

I'd start with vmstat. Run it continuously with a 5-second interval (vmstat 5) and note where the resources are being spent. You can also run vmstat 60 (one sample per minute) in a separate screen window; that way you can spot spikes as they happen over a longer period of time.

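In this situation, high or spiking system CPU time (sy), a high context-switch rate (cs), and a high interrupt rate (in) would point to a DoS. Note that bi/bo show block-device I/O only; network load shows up indirectly in the in and sy columns: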

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   9584   6820 132432  23256    1    1   136    12    1    1 83  1 15  0  0
 0  0   9584   6696 132432  23256    0    0     0     0   20   32  0  0 99  0  1
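If you log vmstat to a file, a few lines of Python can flag the suspicious samples for you. This is just a sketch: the function names are made up, and the thresholds are arbitrary placeholders that you would have to tune against what is normal on your host. Keep in mind that the first row vmstat prints is an average since boot, not a live sample.

```python
def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat's column names."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("procs"):
            continue  # blank line or the group banner
        if fields[0] == "r":
            header = fields  # column names; vmstat repeats them periodically
        elif header and len(fields) == len(header):
            try:
                rows.append(dict(zip(header, (int(f) for f in fields))))
            except ValueError:
                pass  # not a numeric sample row
    return rows

def flag(rows, sy_pct=30, wa_pct=30, cs_per_s=20000):
    """Return (row_index, reasons) for rows above the placeholder thresholds."""
    flagged = []
    for index, row in enumerate(rows):
        reasons = []
        if row["sy"] >= sy_pct:
            reasons.append("system CPU")
        if row["wa"] >= wa_pct:
            reasons.append("I/O wait")
        if row["cs"] >= cs_per_s:
            reasons.append("context switches")
        if reasons:
            flagged.append((index, reasons))
    return flagged

# The output shown above, used as sample input.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   9584   6820 132432  23256    1    1   136    12    1    1 83  1 15  0  0
 0  0   9584   6696 132432  23256    0    0     0     0   20   32  0  0 99  0  1
"""

rows = parse_vmstat(SAMPLE)
print(len(rows), "samples,", len(flag(rows)), "flagged")
```

Something like vmstat 60 > vmstat.log inside screen, followed by parse_vmstat(open("vmstat.log").read()), would be one way to use it.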

At the same time, monitor the network traffic on the host. I recommend nload, e.g.:

$ nload -t 10000 -u H eth0

Alec Istomin

Posted 2013-09-21T21:17:16.237

Reputation: 499


It looks like a disk I/O error. Run fsck and check for errors.

Shain Padmajan

Posted 2013-09-21T21:17:16.237

Reputation: 131

I'll try to schedule downtime for that. However, there are no disk I/O failure messages in any of the logs. Or did you see any? – Lars Hanke – 2013-09-23T19:04:24.550


Maybe you don't have any file system errors, but I'm sure you see those warnings in your log because many processes are in D state (uninterruptible sleep, usually waiting for I/O), and the kernel is informing you of the long wait.
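To catch such processes while the problem is happening, you can scan /proc for tasks whose state is D. A minimal sketch, with made-up function names, assuming the standard Linux /proc/<pid>/stat layout in which the state letter follows the command name in parentheses:

```python
import os

def parse_stat(content):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command may itself contain spaces or parentheses, so cut at the
    # first "(" and the last ")" instead of splitting on whitespace.
    left = content.index("(")
    right = content.rindex(")")
    pid = int(content[:left])
    command = content[left + 1:right]
    state = content[right + 1:].split()[0]
    return pid, command, state

def blocked_processes(proc="/proc"):
    """Return sorted (pid, command) pairs of processes currently in D state."""
    if not os.path.isdir(proc):
        return []
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected content
        if state == "D":
            found.append((pid, command))
    return sorted(found)

for pid, command in blocked_processes():
    print(pid, command)
```

Run from cron or in a loop on the host, this would show whether the blocked tasks cluster in one container or are spread across all of them.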


Posted 2013-09-21T21:17:16.237

Reputation: 281

I guess that this was the case. But why? Under normal conditions there are no processes in D state. If the network actually went down, that might explain it. But why would it go down only for some services? Why did that condition survive a reboot? And why did it come back up again? – Lars Hanke – 2013-09-23T19:00:08.220


The message indicates that your processes have been waiting too long (more than 120 seconds) for disk access. This happens on heavily loaded servers where the disks are too busy to respond to processes in time.

You can verify this by checking the I/O wait column (wa) in the vmstat output.
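The same figure can be computed directly from two samples of /proc/stat, which is handy if you want to log it. A rough sketch, with made-up function names, assuming the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal, ...); the two sample lines are invented numbers:

```python
def cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")

def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal only; guest time is already included in user time.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total > 0 else 0.0

# Invented counters for illustration, taken "a few seconds apart".
FIRST = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
SECOND = "cpu  110 0 60 850 130 0 0 0 0 0\ncpu0 110 0 60 850 130 0 0 0 0 0\n"

print(round(iowait_percent(cpu_times(FIRST), cpu_times(SECOND)), 1))
```

On a live system you would read /proc/stat twice with a sleep in between and feed both readings to iowait_percent; a value that stays high matches the blocked-task warnings.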


Posted 2013-09-21T21:17:16.237

Reputation: 56