11

My server is logging frequent segmentation faults to /var/log/kern.log, and they occur in several unrelated programs. So far I've seen them in Perl, PHP and rsync. All installed software comes from up-to-date Debian packages. Here's an excerpt from the log file:

Mar  2 01:07:54 gaz kernel: [ 5316.246303] imapsync[4533]: segfault at 8b ip 00007fb448c98fe6 sp 00007ffff571dd68 error 4 in libperl.so.5.10.1[7fb448bd7000+164000]
Mar  2 01:17:42 gaz kernel: [ 5904.354307] php5-cgi[4441]: segfault at 2bb3dc8 ip 0000000002bb3dc8 sp 00007fffbeeaae48 error 15
Mar  2 02:54:05 gaz kernel: [11687.922316] php5-cgi[4495]: segfault at 2d7acf9 ip 0000000002d7acf9 sp 00007fff60c6eb18 error 15
Mar  2 10:50:08 gaz kernel: [40250.390322] BUG: unable to handle kernel paging request at 00000000024b03f0
Mar  2 10:50:08 gaz kernel: [40250.390341] IP: [<00000000024b03f0>] 0x24b03f0
Mar  2 10:50:08 gaz kernel: [40250.390353] PGD 208c71067 PUD 21c811067 PMD 209329067 PTE 8000000211c88067
Mar  2 10:50:08 gaz kernel: [40250.390365] Oops: 0011 [#1] SMP 
Mar  2 10:50:08 gaz kernel: [40250.390373] last sysfs file: /sys/devices/pci0000:00/0000:00:12.0/host4/target4:0:0/4:0:0:0/block/sdb/stat
Mar  2 10:50:08 gaz kernel: [40250.390386] CPU 1 
Mar  2 10:50:08 gaz kernel: [40250.390392] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative xt_recent xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter ip6_tables xt_DSCP xt_TCPMSS ipt_LOG ipt_REJECT iptable_mangle iptable_filter xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nf_conntrack ip_tables x_tables loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm radeon snd_timer ttm snd drm_kms_helper soundcore drm snd_page_alloc i2c_algo_bit shpchp i2c_piix4 edac_core pcspkr k8temp evdev edac_mce_amd pci_hotplug i2c_core button ext3 jbd mbcache dm_mod powernow_k8 aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ata_generic ahci pata_atiixp ohci_hcd libata r8169 mii thermal ehci_hcd processor thermal_sys scsi_mod usbcore nls_base [last unloaded: scsi_wait_scan]
Mar  2 10:50:08 gaz kernel: [40250.390566] Pid: 11482, comm: munin-limits Not tainted 2.6.32-5-amd64 #1 MS-7368
Mar  2 10:50:08 gaz kernel: [40250.390576] RIP: 0010:[<00000000024b03f0>]  [<00000000024b03f0>] 0x24b03f0
Mar  2 10:50:08 gaz kernel: [40250.390586] RSP: 0018:ffff88021cc8dec0  EFLAGS: 00010286
Mar  2 10:50:08 gaz kernel: [40250.390593] RAX: 000000001ddc1000 RBX: 0000000000000010 RCX: ffffffff810f9904
Mar  2 10:50:08 gaz kernel: [40250.390600] RDX: 0000000000000000 RSI: ffffea0007688200 RDI: 0000000000000286
Mar  2 10:50:08 gaz kernel: [40250.390608] RBP: 00000000ffffffea R08: 0000000000000025 R09: 7865542f30312e35
Mar  2 10:50:08 gaz kernel: [40250.390615] R10: 000000d01cc8ddf8 R11: 0000000000000246 R12: ffff88021cc8def8
Mar  2 10:50:08 gaz kernel: [40250.390622] R13: 0000000002295010 R14: 00000000022c9db0 R15: 0000000002488d78
Mar  2 10:50:08 gaz kernel: [40250.390630] FS:  00007f3b3c8b2700(0000) GS:ffff880008d00000(0000) knlGS:0000000000000000
Mar  2 10:50:08 gaz kernel: [40250.390641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  2 10:50:08 gaz kernel: [40250.390648] CR2: 00000000024b03f0 CR3: 000000021c5d1000 CR4: 00000000000006e0
Mar  2 10:50:08 gaz kernel: [40250.390656] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  2 10:50:08 gaz kernel: [40250.390663] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  2 10:50:08 gaz kernel: [40250.390671] Process munin-limits (pid: 11482, threadinfo ffff88021cc8c000, task ffff88021bf59530)
Mar  2 10:50:08 gaz kernel: [40250.390681] Stack:
Mar  2 10:50:08 gaz kernel: [40250.390687]  ffffffff810f1d4a ffff880208c63228 0000000000000000 00007fffc2dcecc0
Mar  2 10:50:08 gaz kernel: [40250.390697] <0> 00000000024ba2b0 0000000002295010 ffffffff810f1e3d 0000000000000004
Mar  2 10:50:08 gaz kernel: [40250.390712] <0> ffff88021bf59530 ffff88021c4edc00 ffffffff812fe0b6 ffff88021c4edc60
Mar  2 10:50:08 gaz kernel: [40250.390732] Call Trace:
Mar  2 10:50:08 gaz kernel: [40250.390742]  [<ffffffff810f1d4a>] ? vfs_fstatat+0x2c/0x57
Mar  2 10:50:08 gaz kernel: [40250.390750]  [<ffffffff810f1e3d>] ? sys_newstat+0x11/0x30
Mar  2 10:50:08 gaz kernel: [40250.390760]  [<ffffffff812fe0b6>] ? do_page_fault+0x2e0/0x2fc
Mar  2 10:50:08 gaz kernel: [40250.390768]  [<ffffffff812fbf55>] ? page_fault+0x25/0x30
Mar  2 10:50:08 gaz kernel: [40250.390777]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar  2 10:50:08 gaz kernel: [40250.390783] Code:  Bad RIP value.
Mar  2 10:50:08 gaz kernel: [40250.390791] RIP  [<00000000024b03f0>] 0x24b03f0
Mar  2 10:50:08 gaz kernel: [40250.390799]  RSP <ffff88021cc8dec0>
Mar  2 10:50:08 gaz kernel: [40250.390805] CR2: 00000000024b03f0
Mar  2 10:50:08 gaz kernel: [40250.391051] ---[ end trace 1cc1473b539c7f6e ]---
Mar  2 11:42:20 gaz kernel: [43382.242301] php5-cgi[10963]: segfault at d81160 ip 0000000000d81160 sp 00007fff3adcb058 error 15
Mar  2 21:51:14 gaz kernel: [79916.418302] php5-cgi[20089]: segfault at 1c59dc8 ip 0000000001c59dc8 sp 00007fff9b877fb8 error 15
Mar  3 03:45:01 gaz kernel: [101143.334305] munin-update[22519] general protection ip:7f516dce204c sp:7fff6049a978 error:0 in libperl.so.5.10.1[7f516dc7d000+164000]
Mar  3 11:22:37 gaz kernel: [128599.570307] php5-cgi[22888]: segfault at 36485a8 ip 00000000036485a8 sp 00007fff2d56e1c8 error 15
Mar  4 08:32:17 gaz kernel: [204779.842304] php5-cgi[22090]: segfault at 18 ip 0000000000689e5e sp 00007fff677a6a48 error 6 in php5-cgi[400000+6f9000]
Mar  4 10:01:02 gaz kernel: [210104.434706] rsync[22236] general protection ip:7f14a07137f9 sp:7fff88f940b8 error:0 in libc-2.11.2.so[7f14a069d000+158000]
Mar  4 11:32:22 gaz kernel: [215584.262316] BUG: unable to handle kernel paging request at 00000000ffffff9c
Mar  4 11:32:22 gaz kernel: [215584.262331] IP: [<00000000ffffff9c>] 0xffffff9c
Mar  4 11:32:22 gaz kernel: [215584.262343] PGD 0 
Mar  4 11:32:22 gaz kernel: [215584.262350] Oops: 0010 [#2] SMP 
Mar  4 11:32:22 gaz kernel: [215584.262359] last sysfs file: /sys/devices/pci0000:00/0000:00:12.0/host4/target4:0:0/4:0:0:0/block/sdb/stat
Mar  4 11:32:22 gaz kernel: [215584.262371] CPU 1 
Mar  4 11:32:22 gaz kernel: [215584.262378] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative xt_recent xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter ip6_tables xt_DSCP xt_TCPMSS ipt_LOG ipt_REJECT iptable_mangle iptable_filter xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nf_conntrack ip_tables x_tables loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm radeon snd_timer ttm snd drm_kms_helper soundcore drm snd_page_alloc i2c_algo_bit shpchp i2c_piix4 edac_core pcspkr k8temp evdev edac_mce_amd pci_hotplug i2c_core button ext3 jbd mbcache dm_mod powernow_k8 aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ata_generic ahci pata_atiixp ohci_hcd libata r8169 mii thermal ehci_hcd processor thermal_sys scsi_mod usbcore nls_base [last unloaded: scsi_wait_scan]
Mar  4 11:32:22 gaz kernel: [215584.262552] Pid: 1960, comm: proxymap Tainted: G      D    2.6.32-5-amd64 #1 MS-7368
Mar  4 11:32:22 gaz kernel: [215584.262563] RIP: 0010:[<00000000ffffff9c>]  [<00000000ffffff9c>] 0xffffff9c
Mar  4 11:32:22 gaz kernel: [215584.262573] RSP: 0018:ffff880209257e00  EFLAGS: 00010212
Mar  4 11:32:22 gaz kernel: [215584.262580] RAX: ffff8801514eb780 RBX: ffffffff810efb2d RCX: 0000000000000000
Mar  4 11:32:22 gaz kernel: [215584.262590] RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff8801514eb780
Mar  4 11:32:22 gaz kernel: [215584.262600] RBP: 00000000ffffffe9 R08: 0000000000000000 R09: 0000000000000000
Mar  4 11:32:22 gaz kernel: [215584.262611] R10: ffff880209257e78 R11: ffffffff81152c7c R12: 0000000000000001
Mar  4 11:32:22 gaz kernel: [215584.262622] R13: 0000000000008001 R14: 0000000000000024 R15: 00000000ffffff9c
Mar  4 11:32:22 gaz kernel: [215584.262633] FS:  00007fca4de35700(0000) GS:ffff880008d00000(0000) knlGS:0000000000000000
Mar  4 11:32:22 gaz kernel: [215584.262644] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  4 11:32:22 gaz kernel: [215584.262650] CR2: 00000000ffffff9c CR3: 00000001c9cbb000 CR4: 00000000000006e0
Mar  4 11:32:22 gaz kernel: [215584.262661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  4 11:32:22 gaz kernel: [215584.262671] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  4 11:32:22 gaz kernel: [215584.262682] Process proxymap (pid: 1960, threadinfo ffff880209256000, task ffff88021c4b1c40)
Mar  4 11:32:22 gaz kernel: [215584.262693] Stack:
Mar  4 11:32:22 gaz kernel: [215584.262698]  ffffffff810f8566 ffff880209257e78 ffff88021c7bf000 ffff88021c7bf0c8
Mar  4 11:32:22 gaz kernel: [215584.262709] <0> 0000800000000000 ffff88021fc0f000 ffff880209257e78 00000000fffffffe
Mar  4 11:32:22 gaz kernel: [215584.262724] <0> ffffffff810e5881 ffff880209257f48 0000000000000286 ffff88021fc0f000
Mar  4 11:32:22 gaz kernel: [215584.262743] Call Trace:
Mar  4 11:32:22 gaz kernel: [215584.262753]  [<ffffffff810f8566>] ? do_filp_open+0xa7/0x94b
Mar  4 11:32:22 gaz kernel: [215584.262763]  [<ffffffff810e5881>] ? virt_to_head_page+0x9/0x2a
Mar  4 11:32:22 gaz kernel: [215584.262771]  [<ffffffff810f9904>] ? user_path_at+0x52/0x79
Mar  4 11:32:22 gaz kernel: [215584.262779]  [<ffffffff810cfec1>] ? get_unmapped_area+0xd7/0x139
Mar  4 11:32:22 gaz kernel: [215584.262787]  [<ffffffff811019d5>] ? alloc_fd+0x67/0x10c
Mar  4 11:32:22 gaz kernel: [215584.262795]  [<ffffffff810eceaf>] ? do_sys_open+0x55/0xfc
Mar  4 11:32:22 gaz kernel: [215584.262804]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar  4 11:32:22 gaz kernel: [215584.262811] Code:  Bad RIP value.
Mar  4 11:32:22 gaz kernel: [215584.262819] RIP  [<00000000ffffff9c>] 0xffffff9c
Mar  4 11:32:22 gaz kernel: [215584.262828]  RSP <ffff880209257e00>
Mar  4 11:32:22 gaz kernel: [215584.262833] CR2: 00000000ffffff9c
Mar  4 11:32:22 gaz kernel: [215584.263077] ---[ end trace 1cc1473b539c7f6f ]---
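
For anyone reading the `error` field in the segfault lines above: it is the x86 page-fault error code, and the kernel prints it in hexadecimal. The following is a minimal Python sketch for decoding it. The name `decode_pf_error` is made up for illustration and is not part of any existing tool, and the decoding applies only to the `segfault at ...` lines, not to the `general protection` ones.

```python
# Bit meanings of the x86 page-fault error code (defined by the CPU).
# Each entry: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex):
    """Decode the 'error N' field of a segfault line (N is hexadecimal)."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


# The three error values that appear in the segfault lines of the log.
for code in ("4", "6", "15"):
    print(code, "->", ", ".join(decode_pf_error(code)))
```

Read this way, `error 15` (0x15) would be a user-mode instruction fetch from a page that is mapped but not executable, which is consistent with the php5-cgi lines where `ip` equals the faulting address.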

As you can see, there are segfaults, general protection faults and kernel oopses. My first guess was a hardware problem of some sort, so I asked my hosting provider (it's a rented root server) to do a full hardware check. They did, but couldn't find any problem.

I don't know what they checked or how, but their support team is usually quite good. I ran memtester and cpuburn myself and couldn't find any errors either.

Unfortunately I have no reliable way to reproduce these segfaults; they seem to be more or less random. On a hunch I disabled the system's firewall and ran one of the programs that segfaulted regularly (imapsync). It seemed to take longer to segfault than before, so the problem might be related to the network stack, or that could just be coincidence.

Here are the kernel specs:

# uname -a
Linux gaz 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
# cat /etc/debian_version 
6.0
# lsmod
Module                  Size  Used by
cpufreq_userspace       1992  0 
cpufreq_stats           2659  0 
cpufreq_powersave        902  0 
cpufreq_conservative     5162  0 
xt_recent               5977  0 
xt_tcpudp               2319  0 
iptable_nat             4299  0 
nf_nat                 13388  1 iptable_nat
nf_conntrack_ipv4       9833  3 iptable_nat,nf_nat
nf_defrag_ipv4          1139  1 nf_conntrack_ipv4
ip6table_filter         2384  0 
ip6_tables             15075  1 ip6table_filter
xt_DSCP                 1995  0 
xt_TCPMSS               2919  0 
ipt_LOG                 4518  0 
ipt_REJECT              1953  0 
iptable_mangle          2817  0 
iptable_filter          2258  0 
xt_multiport            2267  0 
xt_state                1303  0 
xt_limit                1782  0 
xt_conntrack            2407  0 
nf_conntrack_ftp        5537  0 
nf_conntrack           46535  6 iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state,xt_conntrack,nf_conntrack_ftp
ip_tables              13899  3 iptable_nat,iptable_mangle,iptable_filter
x_tables               12845  13 xt_recent,xt_tcpudp,iptable_nat,ip6_tables,xt_DSCP,xt_TCPMSS,ipt_LOG,ipt_REJECT,xt_multiport,xt_state,xt_limit,xt_conntrack,ip_tables
loop                   11799  0 
radeon                573996  0 
ttm                    39986  1 radeon
drm_kms_helper         20065  1 radeon
snd_hda_codec_atihdmi     2251  1 
drm                   142359  3 radeon,ttm,drm_kms_helper
snd_hda_intel          20019  0 
i2c_algo_bit            4225  1 radeon
pcspkr                  1699  0 
i2c_piix4               8328  0 
snd_hda_codec          54244  2 snd_hda_codec_atihdmi,snd_hda_intel
i2c_core               15712  5 radeon,drm_kms_helper,drm,i2c_algo_bit,i2c_piix4
snd_hwdep               5380  1 snd_hda_codec
snd_pcm                60503  2 snd_hda_intel,snd_hda_codec
snd_timer              15582  1 snd_pcm
snd                    46446  5 snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer
soundcore               4598  1 snd
evdev                   7352  3 
snd_page_alloc          6249  2 snd_hda_intel,snd_pcm
k8temp                  3283  0 
edac_core              29261  0 
edac_mce_amd            6433  0 
shpchp                 26264  0 
pci_hotplug            21203  1 shpchp
button                  4650  0 
ext3                  106518  2 
jbd                    37085  1 ext3
mbcache                 5050  1 ext3
dm_mod                 53754  0 
powernow_k8            10978  1 
aacraid                59779  0 
3w_9xxx                28684  0 
3w_xxxx                20569  0 
raid10                 17809  0 
raid456                44500  0 
async_raid6_recov       5170  1 raid456
async_pq                3479  2 raid456,async_raid6_recov
raid6_pq               77179  2 async_raid6_recov,async_pq
async_xor               2478  3 raid456,async_raid6_recov,async_pq
xor                     4380  1 async_xor
async_memcpy            1198  2 raid456,async_raid6_recov
async_tx                1734  5 raid456,async_raid6_recov,async_pq,async_xor,async_memcpy
raid1                  18431  3 
raid0                   5517  0 
md_mod                 73824  7 raid10,raid456,raid1,raid0
sata_nv                19166  0 
sata_sil                7412  0 
sata_via                7928  0 
sd_mod                 29889  8 
crc_t10dif              1276  1 sd_mod
ata_generic             3047  0 
ahci                   32374  6 
r8169                  29229  0 
mii                     3210  1 r8169
thermal                11674  0 
pata_atiixp             3489  0 
libata                133632  6 sata_nv,sata_sil,sata_via,ata_generic,ahci,pata_atiixp
ohci_hcd               19212  0 
ehci_hcd               31151  0 
processor              29935  1 powernow_k8
thermal_sys            11942  2 thermal,processor
scsi_mod              122149  5 aacraid,3w_9xxx,3w_xxxx,sd_mod,libata
usbcore               122034  3 ohci_hcd,ehci_hcd
nls_base                6377  1 usbcore
# free 
             total       used       free     shared    buffers     cached
Mem:       8166128    1228036    6938092          0     140412     782060
-/+ buffers/cache:     305564    7860564
Swap:      2102456          0    2102456

So, basically my questions are:

  1. How can I diagnose this further?
  2. Is there any data in the log above that could help me to isolate the troublemaker?
  3. Are there any known problems with the above hardware/software I overlooked when googling for it?
  4. Is there a way to prevent the kernel from autoloading modules (I probably don't need all these modules and one of them might be the culprit)?
Andreas Gohr

3 Answers

5

Check your memory!

The most frequent cause of random segfaults like this is bad memory. Grab a memory checker (such as memtest86+) and test it.

MikeyB
  • My machine got 15 segfaults in 4 days. One of them systemd! I ran memtest86 once. All four passes revealed no memory issues. Must be something else... – Chris Smith Jul 18 '19 at 12:20
1

Things to start with: Check how much memory the server has. Check the size of your swap partition. Check other log files (e.g. syslog) for potential sources of information. Check whether there are known problems with this kernel version on your current hardware (or virtualisation system). I'm running Debian 6 with this kernel in a small VMware VM with no problems.
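
As a way of mining the logs, a small script can tally the fault lines in /var/log/kern.log by program and by the mapped object the kernel names. That shows at a glance whether the crashes cluster in one binary or library. A minimal sketch in Python, assuming the log line format shown in the question; `tally_faults` and both regular expressions are illustrative, not an existing tool:

```python
import re
from collections import Counter

# Program name, PID and fault type, e.g. "php5-cgi[4441]: segfault"
# or "rsync[22236] general protection".
HEAD_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] ([^\s\[]+)\[\d+\]:? (segfault|general protection)"
)
# Optional trailing " in <object>[<base>+<size>]".
OBJ_RE = re.compile(r" in (\S+)\[[0-9a-f]+\+[0-9a-f]+\]\s*$")


def tally_faults(lines):
    """Count fault lines per (program, fault type, mapped object)."""
    counts = Counter()
    for line in lines:
        head = HEAD_RE.search(line)
        if not head:
            continue  # not a user-space fault line (e.g. part of an oops)
        obj = OBJ_RE.search(line)
        where = obj.group(1) if obj else "(no mapped object)"
        counts[(head.group(1), head.group(2), where)] += 1
    return counts


# Sample lines copied from the question; on a real system one would pass
# the lines of /var/log/kern.log instead.
SAMPLE = [
    "Mar  2 01:07:54 gaz kernel: [ 5316.246303] imapsync[4533]: segfault at 8b ip 00007fb448c98fe6 sp 00007ffff571dd68 error 4 in libperl.so.5.10.1[7fb448bd7000+164000]",
    "Mar  2 01:17:42 gaz kernel: [ 5904.354307] php5-cgi[4441]: segfault at 2bb3dc8 ip 0000000002bb3dc8 sp 00007fffbeeaae48 error 15",
    "Mar  3 03:45:01 gaz kernel: [101143.334305] munin-update[22519] general protection ip:7f516dce204c sp:7fff6049a978 error:0 in libperl.so.5.10.1[7f516dc7d000+164000]",
    "Mar  4 10:01:02 gaz kernel: [210104.434706] rsync[22236] general protection ip:7f14a07137f9 sp:7fff88f940b8 error:0 in libc-2.11.2.so[7f14a069d000+158000]",
]

for (prog, kind, where), n in tally_faults(SAMPLE).most_common():
    print(f"{n:3d}  {prog:14s} {kind:20s} {where}")
```

If the faults turn out to be spread across unrelated programs and libraries, as they appear to be in the question's log, a single buggy package becomes a less likely explanation.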

AndrewNimmo
  • I added the memory info above (below the lsmod output). I don't use any virtualization; it's a standard Debian install running directly on the hardware (AMD Athlon(tm) 64 X2 Dual Core Processor 6000+). The other logs don't contain anything useful that I could see (except Apache complaining about its FastCGI processes segfaulting). – Andreas Gohr Mar 05 '11 at 12:22
  • 1
    If you added new memory, it could also be that the new module doesn't get along with the existing one, or the other way around. It sounds funny, I know, but sometimes two pieces of hardware just don't like each other. – Radek Nov 29 '12 at 04:06
0

One thing I would check is whether your hosting provider uses so-called "burstable RAM". It is quite common for cheap hosting plans to have some base RAM which can be temporarily expanded. The problem with this temporarily expanded RAM is that you can't rely on it, as it can be taken away mid-computation, resulting in a segfault.

programagor
  • 1
    First of all, there is enough information in the question to conclude that this is not what is happening. Besides, what you describe would be so silly that, without any evidence, I simply won't believe anyone would go through the trouble of implementing something as useless as that. – kasperd Oct 27 '18 at 14:14
  • I wish I was joking, but Burstable RAM is a real thing: http://www.webhostingtalk.com/showthread.php?t=1139796&p=8034962#post8034962 However, after re-reading the question, I can see that this is an unlikely cause. – programagor Oct 27 '18 at 19:25
  • That link says nothing about segmentation faults. – kasperd Oct 27 '18 at 19:46
  • You are correct, Burstable RAM usually results in OOM killing, not segfaults. – programagor Oct 27 '18 at 20:31