linux kernel soft lock nagios

Question

I've had this happen a few times in the past few days. Does anyone have any idea what might be causing it? It looks related to nagios/SMP/memory management, and it seems to recur every 24 hours or so.

This is a Debian 6 system with the latest 2.6.32 kernel from squeeze-proposed-updates.

Jan 22 22:40:40 zzx-zzx kernel: [176617.649082] Pid: 2070, comm: nagios3 Not tainted (2.6.32-5-686-bigmem #1) System x3550 M3 -[7944D2M]-
Jan 22 22:40:40 zzx-zzx kernel: [176617.649085] EIP: 0060:[<c10249bb>] EFLAGS: 00000202 CPU: 13
Jan 22 22:40:40 zzx-zzx kernel: [176617.649094] EIP is at native_flush_tlb_others+0x85/0xa6
Jan 22 22:40:40 zzx-zzx kernel: [176617.649096] EAX: 00000282 EBX: c14661ac ECX: c10200d8 EDX: 00000020
Jan 22 22:40:40 zzx-zzx kernel: [176617.649099] ESI: 00000005 EDI: 00000140 EBP: c14661a0 ESP: ee4c9a3c
Jan 22 22:40:40 zzx-zzx kernel: [176617.649101]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jan 22 22:40:40 zzx-zzx kernel: [176617.649104] CR0: 8005003b CR2: b758a376 CR3: 2eb7e000 CR4: 000006f0
Jan 22 22:40:40 zzx-zzx kernel: [176617.649106] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jan 22 22:40:40 zzx-zzx kernel: [176617.649108] DR6: ffff0ff0 DR7: 00000400
Jan 22 22:40:40 zzx-zzx kernel: [176617.649110] Call Trace:
Jan 22 22:40:40 zzx-zzx kernel: [176617.649116]  [<c1024aa3>] ? flush_tlb_page+0x5d/0x65
Jan 22 22:40:40 zzx-zzx kernel: [176617.649120]  [<c1023e90>] ? ptep_set_access_flags+0x59/0x63
Jan 22 22:40:40 zzx-zzx kernel: [176617.649125]  [<c10a1040>] ? do_wp_page+0x3b9/0x7dd
Jan 22 22:40:40 zzx-zzx kernel: [176617.649131]  [<c1031770>] ? finish_task_switch+0x76/0x95
Jan 22 22:40:40 zzx-zzx kernel: [176617.649135]  [<c10b61a0>] ? kmem_cache_free+0x78/0xaf
Jan 22 22:40:40 zzx-zzx kernel: [176617.649138]  [<c1031770>] ? finish_task_switch+0x76/0x95

What version of nagios are you running? And is there a specific reason you're running 32-bit Debian on a (fairly) new IBM server? — Keith, Jan 25 '12 at 18:04

Keith · Accepted Answer · 2012-02-13T20:07:50.037


This is a kernel bug. You can try submitting it as a bug report on Debian's bug tracker, but they'll probably just tell you to try a different kernel.

Unless you're willing to spend time building kernels from source, you are unlikely to figure out the cause of this, in my opinion. I would speculate that it's a bigmem-related bug, due to the presence of "flush_tlb_page" in the call trace.

You could try running 64-bit instead, or try backporting a kernel from Sid. If you still have the problem with the 64-bit kernel in Squeeze, there's also a new one in squeeze-backports.
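To check whether later lockups hit the same code path, the symbols in each trace can be pulled out of syslog and scanned for TLB-flush functions. The following is only a rough sketch: the function names `trace_symbols` and `tlb_related` and the regular expressions are illustrative assumptions based on the log lines quoted in the question, not part of any kernel or Debian tool.

```python
import re

# Assumed frame format, taken from the quoted log, e.g.
#   "[<c1024aa3>] ? flush_tlb_page+0x5d/0x65"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Assumed format of the faulting-instruction line, e.g.
#   "EIP is at native_flush_tlb_others+0x85/0xa6"
EIP_RE = re.compile(r"EIP is at ([A-Za-z_][\w.]*)\+0x")


def trace_symbols(lines):
    """Return the function names found in EIP and call-trace lines, in order."""
    symbols = []
    for line in lines:
        match = EIP_RE.search(line) or FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def tlb_related(symbols):
    """Return only the symbols whose name mentions the TLB."""
    return [name for name in symbols if "tlb" in name]


# A few lines copied from the log in the question.
sample = [
    "Jan 22 22:40:40 zzx-zzx kernel: [176617.649085] EIP: 0060:[<c10249bb>] EFLAGS: 00000202 CPU: 13",
    "Jan 22 22:40:40 zzx-zzx kernel: [176617.649094] EIP is at native_flush_tlb_others+0x85/0xa6",
    "Jan 22 22:40:40 zzx-zzx kernel: [176617.649116]  [<c1024aa3>] ? flush_tlb_page+0x5d/0x65",
    "Jan 22 22:40:40 zzx-zzx kernel: [176617.649120]  [<c1023e90>] ? ptep_set_access_flags+0x59/0x63",
    "Jan 22 22:40:40 zzx-zzx kernel: [176617.649125]  [<c10a1040>] ? do_wp_page+0x3b9/0x7dd",
]

symbols = trace_symbols(sample)
print(symbols)
print(tlb_related(symbols))
```

Running it over each lockup report in turn would show whether every occurrence goes through the same TLB-flush path or whether the traces differ, which is worth including in a bug report.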

edited Feb 13 '12 at 20:07

answered Jan 25 '12 at 18:51

Keith


Thanks Keith. Yeah, I'm going to try the non-bigmem kernel and see if that resolves it. The reason for running 32-bit is that other systems, which are 32-bit, fairly regularly take the RRD files generated on this system. Certainly not an optimal situation, but it would be a lot of work to change things over to 64-bit everywhere. – Blair Jan 25 '12 at 22:42
