7

I have a multi-user CentOS 6.1 database system running an application named ABC. The server is 64-bit, with 8GB RAM and 6 vCPUs (on VMware ESXi 4). We get frequent dumps in dmesg and the system logs detailing kernel page allocation failures.

What do these messages mean in this context? How can we remedy this?

Feb  5 08:10:52 Fruity kernel: ABC: page allocation failure. order:1, mode:0x20
Feb  5 08:10:52 Fruity kernel: Pid: 23588, comm: ABC Not tainted 2.6.32-131.17.1.el6.x86_64 #1
Feb  5 08:10:52 Fruity kernel: Call Trace:
Feb  5 08:10:52 Fruity kernel: <IRQ>  [<ffffffff8112016e>] ? __alloc_pages_nodemask+0x71e/0x8b0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81159a52>] ? kmem_getpages+0x62/0x170
Feb  5 08:10:52 Fruity kernel: [<ffffffff8115a66a>] ? fallback_alloc+0x1ba/0x270
Feb  5 08:10:52 Fruity kernel: [<ffffffff8115a0bf>] ? cache_grow+0x2cf/0x320
Feb  5 08:10:52 Fruity kernel: [<ffffffff8115a3e9>] ? ____cache_alloc_node+0x99/0x160
Feb  5 08:10:52 Fruity kernel: [<ffffffff8115b1ab>] ? kmem_cache_alloc+0x11b/0x190
Feb  5 08:10:52 Fruity kernel: [<ffffffff81411ba8>] ? sk_prot_alloc+0x48/0x1a0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81411e12>] ? sk_clone+0x22/0x2c0
Feb  5 08:10:52 Fruity kernel: [<ffffffff8145caf6>] ? inet_csk_clone+0x16/0xd0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81475be3>] ? tcp_create_openreq_child+0x23/0x450
Feb  5 08:10:52 Fruity kernel: [<ffffffff814735cd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Feb  5 08:10:52 Fruity kernel: [<ffffffff814759a1>] ? tcp_check_req+0x201/0x420
Feb  5 08:10:52 Fruity kernel: [<ffffffff8146b4b6>] ? tcp_rcv_state_process+0x116/0xa30
Feb  5 08:10:52 Fruity kernel: [<ffffffff8105055a>] ? enqueue_entity+0x13a/0x340
Feb  5 08:10:52 Fruity kernel: [<ffffffff81472feb>] ? tcp_v4_do_rcv+0x35b/0x430
Feb  5 08:10:52 Fruity kernel: [<ffffffff81474760>] ? tcp_v4_rcv+0x4e0/0x860
Feb  5 08:10:52 Fruity kernel: [<ffffffff8105dc32>] ? default_wake_function+0x12/0x20
Feb  5 08:10:52 Fruity kernel: [<ffffffff8145247d>] ? ip_local_deliver_finish+0xdd/0x2d0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81452708>] ? ip_local_deliver+0x98/0xa0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81451bcd>] ? ip_rcv_finish+0x12d/0x440
Feb  5 08:10:52 Fruity kernel: [<ffffffff8104fc08>] ? update_curr+0xf8/0x1e0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81452155>] ? ip_rcv+0x275/0x350
Feb  5 08:10:52 Fruity kernel: [<ffffffff8141dccb>] ? __netif_receive_skb+0x39b/0x6b0
Feb  5 08:10:52 Fruity kernel: [<ffffffff810db997>] ? cpu_quiet_msk+0x77/0x130
Feb  5 08:10:52 Fruity kernel: [<ffffffff8141e07a>] ? process_backlog+0x9a/0x100
Feb  5 08:10:52 Fruity kernel: [<ffffffff81422533>] ? net_rx_action+0x103/0x2f0
Feb  5 08:10:52 Fruity kernel: [<ffffffff8106f6e1>] ? __do_softirq+0xc1/0x1d0
Feb  5 08:10:52 Fruity kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Feb  5 08:10:52 Fruity kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Feb  5 08:10:52 Fruity kernel: <EOI>  [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81070028>] ? local_bh_enable_ip+0x98/0xa0
Feb  5 08:10:52 Fruity kernel: [<ffffffff814dd92b>] ? _spin_unlock_bh+0x1b/0x20
Feb  5 08:10:52 Fruity kernel: [<ffffffff8140f46e>] ? release_sock+0xce/0xe0
Feb  5 08:10:52 Fruity kernel: [<ffffffff81483953>] ? inet_stream_connect+0x183/0x2c0
Feb  5 08:10:52 Fruity kernel: [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
Feb  5 08:10:52 Fruity kernel: [<ffffffff8140d007>] ? sys_connect+0xd7/0xf0
Feb  5 08:10:52 Fruity kernel: [<ffffffff8145f652>] ? compat_tcp_setsockopt+0x22/0x30
Feb  5 08:10:52 Fruity kernel: [<ffffffff8140eb9c>] ? compat_sock_common_setsockopt+0x1c/0x30
Feb  5 08:10:52 Fruity kernel: [<ffffffff81437d05>] ? compat_sys_setsockopt+0x85/0x220
Feb  5 08:10:52 Fruity kernel: [<ffffffff81184828>] ? sys_fcntl+0x118/0x530
Feb  5 08:10:52 Fruity kernel: [<ffffffff8143805e>] ? compat_sys_socketcall+0x1be/0x200
Feb  5 08:10:52 Fruity kernel: [<ffffffff810478b0>] ? sysenter_dispatch+0x7/0x2e
Feb  5 08:10:52 Fruity kernel: Mem-Info:
Feb  5 08:10:52 Fruity kernel: Node 0 DMA per-cpu:
Feb  5 08:10:52 Fruity kernel: CPU    0: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: CPU    1: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: CPU    2: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: CPU    3: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: CPU    4: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: CPU    5: hi:    0, btch:   1 usd:   0
Feb  5 08:10:52 Fruity kernel: Node 0 DMA32 per-cpu:
Feb  5 08:10:52 Fruity kernel: CPU    0: hi:  186, btch:  31 usd: 167
Feb  5 08:10:52 Fruity kernel: CPU    1: hi:  186, btch:  31 usd:  44
Feb  5 08:10:52 Fruity kernel: CPU    2: hi:  186, btch:  31 usd:  59
Feb  5 08:10:52 Fruity kernel: CPU    3: hi:  186, btch:  31 usd:  46
Feb  5 08:10:52 Fruity kernel: CPU    4: hi:  186, btch:  31 usd: 157
Feb  5 08:10:52 Fruity kernel: CPU    5: hi:  186, btch:  31 usd:  45
Feb  5 08:10:52 Fruity kernel: Node 0 Normal per-cpu:
Feb  5 08:10:52 Fruity kernel: CPU    0: hi:  186, btch:  31 usd: 182
Feb  5 08:10:52 Fruity kernel: CPU    1: hi:  186, btch:  31 usd:  44
Feb  5 08:10:52 Fruity kernel: CPU    2: hi:  186, btch:  31 usd:  15
Feb  5 08:10:52 Fruity kernel: CPU    3: hi:  186, btch:  31 usd:  88
Feb  5 08:10:52 Fruity kernel: CPU    4: hi:  186, btch:  31 usd: 181
Feb  5 08:10:52 Fruity kernel: CPU    5: hi:  186, btch:  31 usd:  33
Feb  5 08:10:52 Fruity kernel: active_anon:79381 inactive_anon:21406 isolated_anon:0
Feb  5 08:10:52 Fruity kernel: active_file:395766 inactive_file:1432708 isolated_file:0
Feb  5 08:10:52 Fruity kernel: unevictable:0 dirty:297 writeback:0 unstable:0
Feb  5 08:10:52 Fruity kernel: free:31126 slab_reclaimable:25909 slab_unreclaimable:44714
Feb  5 08:10:52 Fruity kernel: mapped:3908 shmem:103 pagetables:4196 bounce:0
Feb  5 08:10:52 Fruity kernel: Node 0 DMA free:15680kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb  5 08:10:52 Fruity kernel: lowmem_reserve[]: 0 3000 8050 8050
Feb  5 08:10:52 Fruity kernel: Node 0 DMA32 free:56332kB min:25140kB low:31424kB high:37708kB active_anon:36800kB inactive_anon:33152kB active_file:631228kB inactive_file:2126792kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:360kB writeback:0kB mapped:4500kB shmem:4kB slab_reclaimable:42108kB slab_unreclaimable:4760kB kernel_stack:256kB pagetables:1228kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  5 08:10:52 Fruity kernel: lowmem_reserve[]: 0 0 5050 5050
Feb  5 08:10:52 Fruity kernel: Node 0 Normal free:52492kB min:42316kB low:52892kB high:63472kB active_anon:280724kB inactive_anon:52472kB active_file:951836kB inactive_file:3603784kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:828kB writeback:0kB mapped:11132kB shmem:408kB slab_reclaimable:61528kB slab_unreclaimable:174096kB kernel_stack:3112kB pagetables:15556kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  5 08:10:52 Fruity kernel: lowmem_reserve[]: 0 0 0 0
Feb  5 08:10:52 Fruity kernel: Node 0 DMA: 4*4kB 2*8kB 2*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15680kB
Feb  5 08:10:52 Fruity kernel: Node 0 DMA32: 12892*4kB 79*8kB 30*16kB 10*32kB 4*64kB 19*128kB 3*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56456kB
Feb  5 08:10:52 Fruity kernel: Node 0 Normal: 12558*4kB 35*8kB 1*16kB 1*32kB 4*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 52864kB
Feb  5 08:10:52 Fruity kernel: 1834295 total pagecache pages
Feb  5 08:10:52 Fruity kernel: 5823 pages in swap cache
Feb  5 08:10:52 Fruity kernel: Swap cache stats: add 112073, delete 106250, find 283106960/283124994
Feb  5 08:10:52 Fruity kernel: Free swap  = 8352448kB
Feb  5 08:10:52 Fruity kernel: Total swap = 8388600kB
Feb  5 08:10:52 Fruity kernel: 2097136 pages RAM
Feb  5 08:10:52 Fruity kernel: 48740 pages reserved
Feb  5 08:10:52 Fruity kernel: 73879 pages shared
Feb  5 08:10:52 Fruity kernel: 1940523 pages non-shared

Edit: This is still happening, even with some of the changes suggested below. The current trace looks like:

Feb 29 04:45:33 Fruity kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 29 04:45:33 Fruity kernel: Pid: 0, comm: swapper Not tainted 2.6.32-131.17.1.el6.x86_64 #1
Feb 29 04:45:33 Fruity kernel: Call Trace:
Feb 29 04:45:33 Fruity kernel: <IRQ>  [<ffffffff8112016e>] ? __alloc_pages_nodemask+0x71e/0x8b0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81159a52>] ? kmem_getpages+0x62/0x170
Feb 29 04:45:33 Fruity kernel: [<ffffffff8115a66a>] ? fallback_alloc+0x1ba/0x270
Feb 29 04:45:33 Fruity kernel: [<ffffffff8115a0bf>] ? cache_grow+0x2cf/0x320
Feb 29 04:45:33 Fruity kernel: [<ffffffff8115a3e9>] ? ____cache_alloc_node+0x99/0x160
Feb 29 04:45:33 Fruity kernel: [<ffffffff8115b1ab>] ? kmem_cache_alloc+0x11b/0x190
Feb 29 04:45:33 Fruity kernel: [<ffffffff81411ba8>] ? sk_prot_alloc+0x48/0x1a0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81411e12>] ? sk_clone+0x22/0x2c0
Feb 29 04:45:33 Fruity kernel: [<ffffffff8145caf6>] ? inet_csk_clone+0x16/0xd0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81475be3>] ? tcp_create_openreq_child+0x23/0x450
Feb 29 04:45:33 Fruity kernel: [<ffffffff814735cd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Feb 29 04:45:33 Fruity kernel: [<ffffffff814759a1>] ? tcp_check_req+0x201/0x420
Feb 29 04:45:33 Fruity kernel: [<ffffffff8146b4b6>] ? tcp_rcv_state_process+0x116/0xa30
Feb 29 04:45:33 Fruity kernel: [<ffffffff81472feb>] ? tcp_v4_do_rcv+0x35b/0x430
Feb 29 04:45:33 Fruity kernel: [<ffffffff81413a1b>] ? consume_skb+0x3b/0x80
Feb 29 04:45:33 Fruity kernel: [<ffffffff81474760>] ? tcp_v4_rcv+0x4e0/0x860
Feb 29 04:45:33 Fruity kernel: [<ffffffff8145247d>] ? ip_local_deliver_finish+0xdd/0x2d0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81452708>] ? ip_local_deliver+0x98/0xa0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81451bcd>] ? ip_rcv_finish+0x12d/0x440
Feb 29 04:45:33 Fruity kernel: [<ffffffff81452155>] ? ip_rcv+0x275/0x350
Feb 29 04:45:33 Fruity kernel: [<ffffffff8141dccb>] ? __netif_receive_skb+0x39b/0x6b0
Feb 29 04:45:33 Fruity kernel: [<ffffffff810a41a4>] ? __smp_call_function_single+0x64/0xe0
Feb 29 04:45:33 Fruity kernel: [<ffffffff8141ffd8>] ? netif_receive_skb+0x58/0x60
Feb 29 04:45:33 Fruity kernel: [<ffffffffa0131853>] ? vmxnet3_poll+0x403/0x9f0 [vmxnet3]
Feb 29 04:45:33 Fruity kernel: [<ffffffffa0036c40>] ? pvscsi_process_completion_ring+0xe0/0x350 [vmw_pvscsi]
Feb 29 04:45:33 Fruity kernel: [<ffffffff81422533>] ? net_rx_action+0x103/0x2f0
Feb 29 04:45:33 Fruity kernel: [<ffffffff8106f6e1>] ? __do_softirq+0xc1/0x1d0
Feb 29 04:45:33 Fruity kernel: [<ffffffff810d6930>] ? handle_IRQ_event+0x60/0x170
Feb 29 04:45:33 Fruity kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Feb 29 04:45:33 Fruity kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Feb 29 04:45:33 Fruity kernel: [<ffffffff8106f4c5>] ? irq_exit+0x85/0x90
Feb 29 04:45:33 Fruity kernel: [<ffffffff814e3195>] ? do_IRQ+0x75/0xf0
Feb 29 04:45:33 Fruity kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
Feb 29 04:45:33 Fruity kernel: <EOI>  [<ffffffff8103628b>] ? native_safe_halt+0xb/0x10
Feb 29 04:45:33 Fruity kernel: [<ffffffff810142ed>] ? default_idle+0x4d/0xb0
Feb 29 04:45:33 Fruity kernel: [<ffffffff81009e86>] ? cpu_idle+0xb6/0x110
Feb 29 04:45:33 Fruity kernel: [<ffffffff814c33da>] ? rest_init+0x7a/0x80
Feb 29 04:45:33 Fruity kernel: [<ffffffff81c1df28>] ? start_kernel+0x41d/0x429
Feb 29 04:45:33 Fruity kernel: [<ffffffff81c1d33a>] ? x86_64_start_reservations+0x125/0x129
Feb 29 04:45:33 Fruity kernel: [<ffffffff81c1d438>] ? x86_64_start_kernel+0xfa/0x109
Feb 29 04:45:33 Fruity kernel: Mem-Info:
Feb 29 04:45:33 Fruity kernel: Node 0 DMA per-cpu:
Feb 29 04:45:33 Fruity kernel: CPU    0: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    1: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    2: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    3: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    4: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    5: hi:    0, btch:   1 usd:   0
Feb 29 04:45:33 Fruity kernel: Node 0 DMA32 per-cpu:
Feb 29 04:45:33 Fruity kernel: CPU    0: hi:  186, btch:  31 usd:  46
Feb 29 04:45:33 Fruity kernel: CPU    1: hi:  186, btch:  31 usd:   1
Feb 29 04:45:33 Fruity kernel: CPU    2: hi:  186, btch:  31 usd:  23
Feb 29 04:45:33 Fruity kernel: CPU    3: hi:  186, btch:  31 usd:  10
Feb 29 04:45:33 Fruity kernel: CPU    4: hi:  186, btch:  31 usd:  38
Feb 29 04:45:33 Fruity kernel: CPU    5: hi:  186, btch:  31 usd:   2
Feb 29 04:45:33 Fruity kernel: Node 0 Normal per-cpu:
Feb 29 04:45:33 Fruity kernel: CPU    0: hi:  186, btch:  31 usd:  65
Feb 29 04:45:33 Fruity kernel: CPU    1: hi:  186, btch:  31 usd:   0
Feb 29 04:45:33 Fruity kernel: CPU    2: hi:  186, btch:  31 usd:  14
Feb 29 04:45:33 Fruity kernel: CPU    3: hi:  186, btch:  31 usd:   2
Feb 29 04:45:33 Fruity kernel: CPU    4: hi:  186, btch:  31 usd:  29
Feb 29 04:45:33 Fruity kernel: CPU    5: hi:  186, btch:  31 usd:  50
Feb 29 04:45:33 Fruity kernel: active_anon:118532 inactive_anon:29343 isolated_anon:0
Feb 29 04:45:33 Fruity kernel: active_file:870242 inactive_file:899801 isolated_file:0
Feb 29 04:45:33 Fruity kernel: unevictable:0 dirty:5135 writeback:0 unstable:0
Feb 29 04:45:33 Fruity kernel: free:33179 slab_reclaimable:34315 slab_unreclaimable:45350
Feb 29 04:45:33 Fruity kernel: mapped:3464 shmem:133 pagetables:4997 bounce:0
Feb 29 04:45:33 Fruity kernel: Node 0 DMA free:15680kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 29 04:45:33 Fruity kernel: lowmem_reserve[]: 0 3000 8050 8050
Feb 29 04:45:33 Fruity kernel: Node 0 DMA32 free:64200kB min:25140kB low:31424kB high:37708kB active_anon:59816kB inactive_anon:47980kB active_file:1319196kB inactive_file:1374832kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:7680kB writeback:0kB mapped:3004kB shmem:40kB slab_reclaimable:62060kB slab_unreclaimable:5368kB kernel_stack:160kB pagetables:900kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 29 04:45:33 Fruity kernel: lowmem_reserve[]: 0 0 5050 5050
Feb 29 04:45:33 Fruity kernel: Node 0 Normal free:52836kB min:42316kB low:52892kB high:63472kB active_anon:414312kB inactive_anon:69392kB active_file:2161772kB inactive_file:2224372kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:12860kB writeback:0kB mapped:10852kB shmem:492kB slab_reclaimable:75200kB slab_unreclaimable:176032kB kernel_stack:3384kB pagetables:19088kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 29 04:45:33 Fruity kernel: lowmem_reserve[]: 0 0 0 0
Feb 29 04:45:33 Fruity kernel: Node 0 DMA: 4*4kB 2*8kB 2*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15680kB
Feb 29 04:45:33 Fruity kernel: Node 0 DMA32: 15988*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63952kB
Feb 29 04:45:33 Fruity kernel: Node 0 Normal: 13209*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 52836kB
Feb 29 04:45:33 Fruity kernel: 1776603 total pagecache pages
Feb 29 04:45:33 Fruity kernel: 6398 pages in swap cache
Feb 29 04:45:33 Fruity kernel: Swap cache stats: add 163231, delete 156833, find 403959091/403986630
Feb 29 04:45:33 Fruity kernel: Free swap  = 8339552kB
Feb 29 04:45:33 Fruity kernel: Total swap = 8388600kB
Feb 29 04:45:33 Fruity kernel: 2097136 pages RAM
Feb 29 04:45:33 Fruity kernel: 48740 pages reserved
Feb 29 04:45:33 Fruity kernel: 198220 pages shared
Feb 29 04:45:33 Fruity kernel: 1833933 pages non-shared
ewwhite
  • Is this a 32-bit system? This page `http://www.cyberciti.biz/faq/linux-page-allocation-failure-erro/` says it's a memory fragmentation issue. It seems to me that it shouldn't happen on x86_64. – AndreasM Feb 10 '12 at 16:02
  • This is a 64-bit virtualized server with 8GB RAM. – ewwhite Feb 10 '12 at 16:04
  • Does the host report any problems? – AndreasM Feb 10 '12 at 16:05
  • No ESXi host errors reported. This is entirely contained within the VM. – ewwhite Feb 10 '12 at 16:34
  • Do you know what the app ABC does? It seems to call setsockopt, maybe with a crazy buffer value. – AndreasM Feb 10 '12 at 16:36
  • App ABC is a terminal-based application. It's the entry point into the system. There's one instance per user. – ewwhite Feb 10 '12 at 16:58
  • Sorry, no idea. Things you could try: update to CentOS 6.2 and update the vmware-tools in the guest. Maybe the memory driver has a problem. – AndreasM Feb 10 '12 at 17:02
  • The VM had 8GB RAM installed, 7GB used, 6GB cached. No swapping. – ewwhite Feb 14 '12 at 12:45
  • It seems you're getting hit by this: http://kerneltrap.org/mailarchive/linux-kernel/2007/11/5/386882 (0x20 is GFP_ATOMIC). See the other posts on this thread. What's the MTU on the ethernet interface? – AndreasM Feb 14 '12 at 12:56
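
To make the fragmentation reading concrete: order:1 means the kernel wanted 2 contiguous 4 kB pages (one 8 kB block), and mode 0x20 (GFP_ATOMIC) means it could not sleep and reclaim to get them. The buddy lines at the end of each Mem-Info dump show how many free blocks exist at each size. Below is a minimal sketch that counts free blocks at or above a given order; `free_blocks_at_order` is a made-up helper name, and the two sample lines are copied from the Normal zone in the traces above.

```shell
#!/bin/sh
# Count free blocks of at least a given order in a "Node 0 <zone>:" buddy
# line from the kernel's Mem-Info dump. Order n means 2^n contiguous 4 kB
# pages, so order 1 is one 8 kB block.
# free_blocks_at_order is a hypothetical helper name, not a kernel tool.
free_blocks_at_order() {
    min_order=$1
    awk -v min="$min_order" '
    {
        total = 0
        for (i = 1; i <= NF; i++) {
            # Fields look like "13209*4kB": block count, then block size.
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                split($i, part, "*")
                size_kb = part[2] + 0      # "8kB" converts to 8
                if (size_kb >= 4 * 2 ^ min) total += part[1]
            }
        }
        print total
    }'
}

# Normal-zone buddy lines copied from the two traces in the question.
feb05='Node 0 Normal: 12558*4kB 35*8kB 1*16kB 1*32kB 4*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 52864kB'
feb29='Node 0 Normal: 13209*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 52836kB'

echo "$feb05" | free_blocks_at_order 1
echo "$feb29" | free_blocks_at_order 1
```

In the Feb 29 trace neither DMA32 nor Normal has a single free block of 8 kB or larger, so an atomic order-1 request has nowhere to go even though about 50 MB is free in 4 kB pieces. A nonzero count (as on Feb 5) does not guarantee success either, since the allocator also checks watermarks before handing out higher-order blocks.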

3 Answers

3

Thinking out loud here, but have you considered increasing the vm.min_free_kbytes value using sysctl?

Something like:

sysctl vm.min_free_kbytes=16384 

(P.S. I'm not 100% sure what it's supposed to be on CentOS; it's most likely found under /proc/sys/vm/min_free_kbytes.)
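
Before changing anything it helps to see the current value. A minimal sketch, assuming the standard /proc path; `show_min_free` is a made-up helper name, and the optional path argument exists only so it can be pointed at a test file. The commands for changing the value are left as comments because they need root and a value you have chosen yourself.

```shell
#!/bin/sh
# Print vm.min_free_kbytes in kB and whole MB.
# show_min_free is a hypothetical helper; the path argument is optional.
show_min_free() {
    path=${1:-/proc/sys/vm/min_free_kbytes}
    kb=$(cat "$path") || return 1
    echo "vm.min_free_kbytes = ${kb} kB ($((kb / 1024)) MB)"
}

if [ -r /proc/sys/vm/min_free_kbytes ]; then
    show_min_free
fi

# To change it (as root), the usual routes are:
#   sysctl -w vm.min_free_kbytes=<value>        # lasts until the next reboot
#   echo 'vm.min_free_kbytes = <value>' >> /etc/sysctl.conf && sysctl -p
```

Raising the value makes the kernel start reclaiming earlier, which may leave more room for atomic allocations, but it also takes that memory away from the page cache, so it is worth increasing in steps and watching the logs.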

Cold T
  • [root@Fruity ~]# cat /proc/sys/vm/min_free_kbytes == 67584 – ewwhite Feb 14 '12 at 13:11
  • 66MB does sound more than reasonable, but if heavy fragmentation occurs it might still be too low. As this is a VM environment, have you updated the host and the respective VMware tools? You might also want to monitor the memory usage of other processes, not just the application. Last but not least, I'd resort to upgrading the kernel. – Cold T Feb 14 '12 at 14:08
  • This is the default setting in CentOS 6. – ewwhite Feb 14 '12 at 14:52
  • Up the value to at least 256MB. I have no idea what the program ABC does, but if it needs large amounts of memory rapidly, that would certainly cause the page allocation failures. The default value is derived from the machine's total memory. You might also be wondering what min_free_kbytes has to do with page allocations: in the simplest terms, the kernel uses this value to calculate the minimum number of memory pages to keep free. – Cold T Feb 14 '12 at 15:32
  • Program ABC needs ~20MB per user instance. Let's say the system maxes out at 80 users. The VMware tools are current, and the kernel is up-to-date for this revision of CentOS, although I should update to CentOS 6.2. ABC is not a RAM hog at all. Most of the used memory is filesystem cache. – ewwhite Feb 14 '12 at 16:10
3

I've been seeing a lot of those as well, especially on my mirror server running Apache. On that server, changing the SLAB allocator to SLUB helped to mitigate the issue altogether.

On another machine with a large-MTU interface, I'm still getting allocation failures in a similar path, but this time at order 5. I haven't found a solution for that one yet.

Another thing that partially helps, or rather reduces the frequency a little, is frequent memory compaction (echo 1 > /proc/sys/vm/compact_memory, run every minute from cron).
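
A minimal sketch of what that cron job could look like, assuming the kernel exposes the compaction knob at all (the file only exists on kernels built with compaction support). `compact_now`, the `--run` flag, and the script path in the cron comment are hypothetical names for illustration.

```shell
#!/bin/sh
# Trigger one memory compaction pass if the running kernel exposes the knob.
# compact_now is a hypothetical helper; the path argument is optional.
compact_now() {
    knob=${1:-/proc/sys/vm/compact_memory}
    if [ -w "$knob" ]; then
        echo 1 > "$knob"
    else
        echo "cannot write $knob (no compaction support, or not root)" >&2
        return 1
    fi
}

# Only act when asked to, so sourcing or test-running the file is harmless.
if [ "${1:-}" = "--run" ]; then
    compact_now
fi

# Example /etc/cron.d entry (script path is an assumption):
#   * * * * * root /usr/local/sbin/compact-memory.sh --run
```

Wrapping the write in a check means the job fails loudly on a kernel without the knob instead of silently doing nothing every minute.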

Another thing worth looking at is how your application works with memory, i.e. how it allocates and frees it. If there are frequent allocations and deallocations, it may be worth trying some kind of memory pool.

Last but not least, it's worth trying to enable or disable (transparent) hugepages.
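
To see which mode is active before toggling it, something like the following may help. It is a sketch under assumptions: `thp_mode` is a made-up helper name, and it checks both the `redhat_`-prefixed sysfs path used by RHEL/CentOS 6 kernels and the mainline path, since which one exists depends on the kernel.

```shell
#!/bin/sh
# Report the active transparent hugepage mode. The kernel marks the active
# choice with square brackets, e.g. "[always] madvise never".
# thp_mode is a hypothetical helper name.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$1"
}

# RHEL/CentOS 6 kernels use the redhat_ prefixed path; mainline kernels
# use the plain one. Print whichever is present.
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_mode "$f")"
    fi
done

# To switch mode as root (not persistent across reboots), for example:
#   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
```

Changing the mode at runtime makes it easy to compare the failure rate in the logs with hugepages on and off before making anything permanent.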

Fox
1

The issue here was out-of-date VMware guest drivers (vmware-tools) combined with a newer OS under load. This is something that gets revised as ESXi updates are released. Out-of-the-box point releases of VMware are showing this issue; updated versions are not.

Of course, there's the question of how to cleanly update your VMware installation...

ewwhite