I was running rsync on my Dell XPS Core 2 Duo tower when it froze up. The machine runs Ubuntu 8.04 LTS with 3 GB of RAM and software RAID 5 (mdadm) across three disks; the system is installed on a fourth disk. On restart I found this lovely gem in /var/log/kern.log:

Oct 31 02:38:33 myhostname kernel: [617414.584615] Unable to handle kernel NULL pointer dereference at 0000000000000070 RIP:

Then this morning it happened again, but this time there was more info in the log (see below). I'm wondering if anyone can give any insight into what this means. Unfortunately, the machine is in a data center 3000 miles away from me right now, so swapping out memory will be tricky.

Thanks in advance for any suggestions!

Nov  1 01:24:55 myhostname kernel: [34780.996038] Unable to handle kernel NULL pointer dereference at 0000000000000070 RIP:
Nov  1 01:24:55 myhostname kernel: [34780.996050]  [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov  1 01:24:55 myhostname kernel: [34780.996099] PGD bb0b5067 PUD bbc91067 PMD 0
Nov  1 01:24:55 myhostname kernel: [34780.996121] Oops: 0002 [1] SMP
Nov  1 01:24:55 myhostname kernel: [34780.996140] CPU 1
Nov  1 01:24:55 myhostname kernel: [34780.996156] Modules linked in: nfs lockd nfs_acl sunrpc autofs4 iptable_filter ip_tables x_tables ipv6 parport_pc lp parport loop af_packet serio_raw psmouse button dcdbas intel_agp snd_hda_intel shpchp pci_hotplug iTCO_wdt iTCO_vendor_support evdev snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore pcspkr ext3 jbd mbcache sg sr_mod cdrom sd_mod 8139too ata_generic pata_acpi usbhid hid ata_piix 8139cp mii libata scsi_mod ehci_hcd uhci_hcd e1000 usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Nov  1 01:24:55 myhostname kernel: [34780.996422] Pid: 171, comm: kswapd0 Not tainted 2.6.24-16-server #1
Nov  1 01:24:55 myhostname kernel: [34780.996442] RIP: 0010:[<ffffffff80470a60>]  [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov  1 01:24:55 myhostname kernel: [34780.996474] RSP: 0018:ffff8100b904fd48  EFLAGS: 00010202
Nov  1 01:24:55 myhostname kernel: [34780.996492] RAX: 0000000000000001 RBX: ffff8100167d23c8 RCX: 0000000000000000
Nov  1 01:24:55 myhostname kernel: [34780.996514] RDX: 0000000000000001 RSI: 00000000000000d0 RDI: 0000000000000070
Nov  1 01:24:55 myhostname kernel: [34780.996535] RBP: ffff8100167d2550 R08: 0000000000000000 R09: 0000000000000000
Nov  1 01:24:55 myhostname kernel: [34780.996555] R10: 0000000000000000 R11: ffffffff88232010 R12: 0000000000000028
Nov  1 01:24:55 myhostname kernel: [34780.996576] R13: ffff8100167d24d8 R14: 0000000000000000 R15: 0000000000000000
Nov  1 01:24:55 myhostname kernel: [34780.996597] FS:  0000000000000000(0000) GS:ffff8100bd001700(0000) knlGS:0000000000000000
Nov  1 01:24:55 myhostname kernel: [34780.996628] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov  1 01:24:55 myhostname kernel: [34780.996647] CR2: 0000000000000070 CR3: 00000000bbd44000 CR4: 00000000000006e0
Nov  1 01:24:55 myhostname kernel: [34780.996668] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov  1 01:24:55 myhostname kernel: [34780.996688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov  1 01:24:55 myhostname kernel: [34780.996710] Process kswapd0 (pid: 171, threadinfo ffff8100b904e000, task ffff8100b90487e0)
Nov  1 01:24:55 myhostname kernel: [34780.996741] Stack:  ffffffff802dc5b2 ffff8100167d23c8 0000000000000080 0000000000000028
Nov  1 01:24:55 myhostname kernel: [34780.996779]  ffff8100b904fd80 0000000000000028 ffffffff802cb244 ffff8100167d20d8
Nov  1 01:24:55 myhostname kernel: [34780.996815]  ffff810092da43d8 00000000001c4cec 0000000000067714 000000000000009b
Nov  1 01:24:55 myhostname kernel: [34780.996839] Call Trace:
Nov  1 01:24:55 myhostname kernel: [34780.996868]  [remove_inode_buffers+0x42/0x100] remove_inode_buffers+0x42/0x100
Nov  1 01:24:55 myhostname kernel: [34780.996891]  [shrink_icache_memory+0x1f4/0x2a0] shrink_icache_memory+0x1f4/0x2a0
Nov  1 01:24:55 myhostname kernel: [34780.996916]  [shrink_slab+0x124/0x180] shrink_slab+0x124/0x180
Nov  1 01:24:55 myhostname kernel: [34780.996939]  [kswapd+0x391/0x560] kswapd+0x391/0x560
Nov  1 01:24:55 myhostname kernel: [34780.996965]  [<ffffffff80254200>] autoremove_wake_function+0x0/0x30
Nov  1 01:24:55 myhostname kernel: [34780.996989]  [kswapd+0x0/0x560] kswapd+0x0/0x560
Nov  1 01:24:55 myhostname kernel: [34780.997009]  [kthread+0x4b/0x80] kthread+0x4b/0x80
Nov  1 01:24:55 myhostname kernel: [34780.997029]  [child_rip+0xa/0x12] child_rip+0xa/0x12
Nov  1 01:24:55 myhostname kernel: [34780.997053]  [kthread+0x0/0x80] kthread+0x0/0x80
Nov  1 01:24:55 myhostname kernel: [34780.997072]  [child_rip+0x0/0x12] child_rip+0x0/0x12
Nov  1 01:24:55 myhostname kernel: [34780.997091]
Nov  1 01:24:55 myhostname kernel: [34780.997104]
Nov  1 01:24:55 myhostname kernel: [34780.997105] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 90 f0 81 2f 00
Nov  1 01:24:55 myhostname kernel: [34780.997184] RIP  [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov  1 01:24:55 myhostname kernel: [34780.997205]  RSP <ffff8100b904fd48>
Nov  1 01:24:55 myhostname kernel: [34780.997221] CR2: 0000000000000070
Nov  1 01:24:55 myhostname kernel: [34780.997458] ---[ end trace 26a2b00c44abedb6 ]---
Jason Plank

1 Answer

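OK, so this is a fairly standard kernel oops. The trace shows kswapd0 (the kernel thread that reclaims memory) dereferencing a NULL pointer in _spin_lock while shrinking the inode cache, so the likely suspects are failing hardware (disks or memory) or a kernel bug.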

Things to check:

1) Run smartctl on all disks and check that they're operating within recommended tolerances.

2) Have a poke around in dmesg and /var/log/messages and see if anything untoward happened at the same time.

3) Search Launchpad and the Ubuntu forums for clues to what might have caused this, or ask on #ubuntu on freenode IRC for some pointers. You'll probably be asked for more information like lspci, lsmod and so on. Chances are, someone else has had a similar problem.

4) Run memtest86 overnight and see if it comes up with any blinding memory errors.
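
If you want to script the first two checks so they can be run remotely, something along these lines might do as a starting point. It's only a rough sketch: the function names are made up for illustration, the health line it looks for is the one smartctl -H prints on ATA disks, and the oops markers are simply copied from the log excerpt in the question.

```python
import re
import subprocess

# Verdict line printed by `smartctl -H` on ATA disks.
HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)

# Markers copied from the kern.log excerpt in the question (an assumption
# about what is worth flagging, not an exhaustive list).
OOPS_MARKERS = ("NULL pointer dereference", "Oops:", "Call Trace:")


def smart_health(smartctl_output):
    """Return the overall-health verdict (e.g. 'PASSED'), or None if absent."""
    match = HEALTH_RE.search(smartctl_output)
    return match.group(1) if match else None


def check_disk(device):
    """Run `smartctl -H` on one device (needs root and smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return smart_health(result.stdout)


def oops_lines(log_lines):
    """Keep only the log lines that contain one of the oops markers."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(marker in line for marker in OOPS_MARKERS)
    ]
```

You would call check_disk() once per disk (for example check_disk("/dev/sda"), where the device name is a placeholder for your own) and feed oops_lines() an open handle on /var/log/kern.log. Bear in mind the overall-health verdict is a coarse signal, so it's still worth reading the full attribute table by hand.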

Tom O'Connor
  • 2) I looked around and didn't see anything out of the ordinary there. 4) I ran memtest86 for ~13 hours without errors. I'll keep working on 1) and 3). Thanks for the suggestions! –  Nov 02 '09 at 18:09
  • memtest86 ran for about 24 hours with no errors. I also ran smartctl on all disks and they all report "SMART overall-health self-assessment test result: PASSED". I've done a lot of searching online and have not come across anything. Oh, and FWIW, I'm running Linux kernel 2.6.24-16-server. –  Nov 03 '09 at 16:53
  • OK, perhaps consider filing a bug report on Launchpad.net and see if any of the hardcore Ubuntu geeks know what's going on. Was this a one-off occurrence, or have you seen it lots and lots? – Tom O'Connor Nov 04 '09 at 09:02
  • I can reproduce it when I run a python script that calls rsync from the crontab as root. That's actually the only time it happens. Thanks for the help, I'll file an Ubuntu bug report. –  Nov 04 '09 at 22:51
  • Can you strace the python script, or maybe strace the rsync, to get a better idea of what it's breaking? It might not be terribly obvious. One other suggestion: use Pastebin or pastie, because Server Fault tends to break the formatting of long lines and stuff. – Tom O'Connor Nov 05 '09 at 11:45
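
For the strace suggestion, a minimal sketch of how the Python script could wrap its rsync call is below. The helper names, the rsync arguments and the trace file path are all placeholders, since the real invocation isn't shown in the thread. The strace options used are -f (follow child processes), -tt (timestamps) and -o (write the trace to a file).

```python
import shlex
import subprocess


def strace_command(command, trace_file):
    """Prefix a command with strace: -f follows child processes, -tt adds
    timestamps, -o writes the trace to a file instead of stderr."""
    return ["strace", "-f", "-tt", "-o", trace_file] + list(command)


def run_traced(command, trace_file):
    """Run the command under strace and return its exit status."""
    return subprocess.call(strace_command(command, trace_file))


# Hypothetical rsync invocation; the real arguments are not given in the thread.
example = strace_command(["rsync", "-a", "/src/", "/dst/"], "/tmp/rsync.strace")
print(shlex.join(example))
```

Swapping the script's existing rsync call for run_traced() should leave a trace file behind. Since the box freezes when the oops hits, the last part of the trace may never reach the disk, so writing it to a filesystem that isn't on the RAID array (or to another machine) is probably worth trying.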