
Possible Duplicate:
Anyone else experiencing high rates of Linux server crashes during a leap second day?

We have two servers that are grinding to a halt. One is a VM and the other is bare metal. They do not run similar code, but they are on the same network. ksoftirqd is using a lot of CPU and appears to be the source of an enormous number of context switches.

vmstat output

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 605092 182496 2637556    0    0     0     0 4177 519187  8 19 73  0  0
 2  0      0 605092 182496 2637556    0    0     0     0 4792 520980  8 19 74  0  0
 3  0      0 605092 182496 2637552    0    0     0     0 2137 659640 18 26 56  0  0
 ...

pidstat output

TCK4-BM-06A:~ # pidstat -w -I 5
Linux 2.6.32.12-0.7-default (TCK4-BM-06A)   07/02/2012  _x86_64_

03:03:01 PM       PID   cswch/s nvcswch/s  Command
03:03:06 PM         1      0.20      0.00  init
03:03:06 PM         4 386666.27      0.00  ksoftirqd/0
03:03:06 PM         6      0.60      0.00  ksoftirqd/1
03:03:06 PM         8 378213.17      0.00  ksoftirqd/2
03:03:06 PM        10      0.20      0.00  ksoftirqd/3
03:03:06 PM        12      0.20      0.00  ksoftirqd/4
03:03:06 PM        26 377115.37      0.00  ksoftirqd/11
03:03:06 PM        27      1.80      0.00  events/0
03:03:06 PM        28      1.00      0.00  events/1
03:03:06 PM        29      1.00      0.00  events/2
03:03:06 PM        30      1.00      0.00  events/3
03:03:06 PM        31      0.80      0.00  events/4
03:03:06 PM        32      0.80      0.00  events/5
...

My initial thought is that, since both are on the same network, something is flooding the network. Is this consistent with the data?
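
One way to test the flood hypothesis is to check which softirq class is actually firing. In the vmstat output above, interrupts (`in`) stay at a few thousand per second while context switches (`cs`) exceed 500,000, which a packet flood alone would not obviously explain. The following is a minimal sketch, assuming the kernel exposes `/proc/softirqs`; the function names are hypothetical and not part of any tool mentioned here. It sums each softirq type across CPUs and reports the per-second increase between two snapshots. If `NET_RX`/`NET_TX` dominate, that would point toward network traffic; if `TIMER` or `HRTIMER` dominate, that would point toward a timer problem instead.

```python
import time


def parse_softirqs(text):
    """Return {softirq_name: count summed over all CPUs} from /proc/softirqs text."""
    totals = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU column header
        name, sep, counts = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_rates(before, after, interval):
    """Per-second increase of each softirq type between two parsed snapshots."""
    return {name: (after[name] - before.get(name, 0)) / interval for name in after}


def sample_rates(interval=5.0, path="/proc/softirqs"):
    """Take two snapshots `interval` seconds apart and return the rates (Linux only)."""
    with open(path) as f:
        first = parse_softirqs(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_softirqs(f.read())
    return softirq_rates(first, second, interval)
```

On an affected host, something like `sorted(sample_rates().items(), key=lambda kv: kv[1], reverse=True)` would list the busiest softirq types first.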

Pace
  • This could be related to the recent leap second; see http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second – user9517 Jul 02 '12 at 21:11
  • I stopped ntp and ran the fixtime.pl script mentioned in that question and it reported "No leap, yay" – Pace Jul 02 '12 at 21:17

1 Answer


See the leap-second question and workarounds at: Anyone else experiencing high rates of Linux server crashes during a leap second day?

You're experiencing one of the symptoms.

ewwhite
  • Well, the fixtime.pl script didn't indicate anything wrong but another test did and it magically fixed everything. Weird... – Pace Jul 02 '12 at 21:24