I have recently been "promoted" to lab admin, as I have the most experience with Linux. My logcheck keeps sending me emails with hundreds of lines from my syslog file. The same block repeats over and over, with minor changes each time. I have no idea what it is trying to tell me. Any ideas? Here is a repeating snippet:

Aug 23 15:02:30 157-london kernel: [8747161.509412] ------------[ cut here ]------------
Aug 23 15:02:30 157-london kernel: [8747161.509416] WARNING: at /build/buildd-linux-2.6_2.6.32-45-amd64-FcX7RM/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/fs-writeback.c:588 writeback_inodes_wb+0x36b/0x4ff()
Aug 23 15:02:30 157-london kernel: [8747161.509419] Hardware name: Precision WorkStation 380    
Aug 23 15:02:30 157-london kernel: [8747161.509421] Modules linked in: nls_utf8 cifs xt_multiport nfsd nfs lockd fscache nfs_acl auth_rpcgss sunrpc xt_tcpudp iptable_filter ip_tables x_tables ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx quota_v2 quota_tree firewire_sbp2 loop md_mod nouveau ttm i2c_i801 drm_kms_helper drm i2c_algo_bit parport_pc i2c_core parport dcdbas snd_ctxfi snd_pcm rng_core pcspkr button evdev snd_timer psmouse snd soundcore snd_page_alloc processor serio_raw xfs exportfs usbhid hid sg sr_mod cdrom sd_mod crc_t10dif ata_generic uhci_hcd ehci_hcd firewire_ohci aic79xx thermal ahci ata_piix tg3 firewire_core crc_itu_t scsi_transport_spi floppy thermal_sys usbcore nls_base libata scsi_mod libphy [last unloaded: scsi_wait_scan]
Aug 23 15:02:30 157-london kernel: [8747161.509477] Pid: 22965, comm: flush-9:0 Tainted: G        W  2.6.32-5-amd64 #1
Aug 23 15:02:30 157-london kernel: [8747161.509479] Call Trace:
Aug 23 15:02:30 157-london kernel: [8747161.509482]  [<ffffffff81109027>] ? writeback_inodes_wb+0x36b/0x4ff
Aug 23 15:02:30 157-london kernel: [8747161.509485]  [<ffffffff81109027>] ? writeback_inodes_wb+0x36b/0x4ff
Aug 23 15:02:30 157-london kernel: [8747161.509489]  [<ffffffff8104df40>] ? warn_slowpath_common+0x77/0xa3
Aug 23 15:02:30 157-london kernel: [8747161.509492]  [<ffffffff81109027>] ? writeback_inodes_wb+0x36b/0x4ff
Aug 23 15:02:30 157-london kernel: [8747161.509496]  [<ffffffff811092e7>] ? wb_writeback+0x12c/0x1ab
Aug 23 15:02:30 157-london kernel: [8747161.509499]  [<ffffffff81109481>] ? wb_do_writeback+0x73/0x165
Aug 23 15:02:30 157-london kernel: [8747161.509516]  [<ffffffff810c934c>] ? bdi_start_fn+0xbc/0xd0
Aug 23 15:02:30 157-london kernel: [8747161.509521]  [<ffffffff810c9290>] ? bdi_start_fn+0x0/0xd0
Aug 23 15:02:30 157-london kernel: [8747161.509524]  [<ffffffff81064d75>] ? kthread+0x79/0x81
Aug 23 15:02:30 157-london kernel: [8747161.509528]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
Aug 23 15:02:30 157-london kernel: [8747161.509531]  [<ffffffff81064cfc>] ? kthread+0x0/0x81
Aug 23 15:02:30 157-london kernel: [8747161.509534]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Aug 23 15:02:30 157-london kernel: [8747161.509536] ---[ end trace 7157c19847c3ced3 ]---
Markus
  • Have you checked the logical structure of the drive with `fsck`? –  Aug 23 '12 at 20:27
  • Checking the drives with `fsck` doesn't seem to work. From what I know there is a RAID5 array (`sda`, `sdb`, `sdc` make up `md0`), whose disks give me the error `fsck: fsck.linux_raid_member: not found`. Linux is installed on `sdd`, which is mounted and hence can't be checked. When I try to check `md0`, it tells me to use `xfs_check` and `xfs_repair`. – Markus Aug 23 '12 at 20:47

1 Answer

What you're seeing is a stack trace. It's an unhandled error from the kernel showing the path of execution when something goes so damn wrong that there's nothing else in place to do except log the problem and leave it for a human to figure out.

The call at the head is usually the defining one:

WARNING: at /build/buildd-linux-2.6_2.6.32-45-amd64-FcX7RM/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/fs-writeback.c:588 writeback_inodes_wb+0x36b/0x4ff()

fs-writeback: that's writing to a filesystem. As pointed out in a comment on your question, you may have errors on your disk. I'd start by taking the machine to single-user mode with a read-only mount and running fsck.
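
Since logcheck is mailing you hundreds of near-identical lines, it can help to boil the log down to just those header lines and see whether it's always the same spot in the kernel. Here's a rough Python sketch of that idea. The function name `summarize_warnings` and the regular expression are my own invention for illustration, written against the line format in your snippet; other kernels or syslog configurations may format these lines differently.

```python
import re
from collections import Counter

# Matches the header line of a kernel WARNING block in the format shown
# in the question. Named groups: source path, line number, function symbol.
# This pattern is an assumption based on that one snippet.
WARNING_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] WARNING: at "
    r"(?P<path>\S+):(?P<line>\d+) (?P<func>[^\s+]+)\+"
)


def summarize_warnings(lines):
    """Count WARNING headers by (source file name, line number, function)."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            # Keep only the file name; the build path prefix is noise.
            file_name = match.group("path").rsplit("/", 1)[-1]
            key = (file_name, int(match.group("line")), match.group("func"))
            counts[key] += 1
    return counts


# Small demonstration using the header line from the question.
demo = [
    "Aug 23 15:02:30 157-london kernel: [8747161.509416] WARNING: at "
    "/build/buildd-linux-2.6_2.6.32-45-amd64-FcX7RM/linux-2.6-2.6.32/"
    "debian/build/source_amd64_none/fs/fs-writeback.c:588 "
    "writeback_inodes_wb+0x36b/0x4ff()",
    "Aug 23 15:02:30 157-london kernel: [8747161.509479] Call Trace:",
]
for (file_name, line_no, func), n in summarize_warnings(demo).most_common():
    print(n, f"{file_name}:{line_no}", func)
```

To run it against a real log, pass an open file instead of the demo list, for example `summarize_warnings(open("/var/log/syslog", errors="replace"))`. That path is an assumption; use wherever your syslog actually lives. If every entry points at `fs-writeback.c`, that supports the disk/filesystem theory above.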

Jeff Ferland
  • Since the computer in question is our file server and is currently in use, I may have to postpone maintenance until the weekend, when hopefully no one is using it. – Markus Aug 23 '12 at 20:54
  • @Markus filesystem degradation never gets better on its own, only worse. Now's a good time to make sure you have good backups and consider your appetite for the system possibly becoming unavailable. – Jeff Ferland Aug 23 '12 at 20:57
  • You are right. I will talk to my lab mates to see if I can go into maintenance tonight. Luckily I checked our backup file system very recently... I assume I should check the Linux drive first, then the RAID filesystem? – Markus Aug 23 '12 at 21:23
  • I saw the same messages on one of our file servers. What eventually happened is that certain files became unavailable. In the console, they would flash red and I would get inode and block errors trying to read them. Good thing you have backups. :) – Alo Aug 23 '12 at 22:31
  • Alright, so I was able to run `fsck` / `xfs_check` from a live CD and found some problems. `xfs_repair` repaired the problems and moved some stuff to `lost+found`. Is there any way of knowing where this data came from? Is there a way to check whether any files are now "missing"? The system booted without a hitch, but I am still curious about that extra data... the largest file there is only about 1.5M. – Markus Aug 24 '12 at 22:58