
Background

A Spring Boot Java application is deployed in a Kubernetes cluster and gets killed several times per day.

I'm using openjdk:8u181-jre for my Java apps.

Kubernetes version: v1.11.5

Node os: CentOS 7.4 x64

JAVA_OPTS is set according to this post about letting the Java application read the cgroup limits: https://developers.redhat.com/blog/2017/03/14/java-inside-docker/

env:
- name: JAVA_OPTS
  value: " -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Xms512M"
resources:
  requests:
    memory: "4096Mi"
    cpu: "1"
  limits:
    memory: "4096Mi"
    cpu: "1"

The nodes in the cluster have 16 GiB of memory, and the pod requests (and is limited to) 4 GiB.
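To confirm what the JVM actually derives from these flags, a quick check (just a sketch; the pod and container names below are placeholders) is to run java with -XX:+PrintFlagsFinal inside the running container. With a 4 GiB limit and MaxRAMFraction=2, MaxHeapSize should come out at roughly 2 GiB:

kubectl exec my-app-pod -c my-app -- \
  java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
  -XX:MaxRAMFraction=2 -XX:+PrintFlagsFinal -version \
  | grep -i maxheapsize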

Error

But the application gets OOM-killed from time to time.

The system events:

Jan 16 23:29:58 localhost kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
Jan 16 23:29:58 localhost kernel: java cpuset=docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope mems_allowed=0
Jan 16 23:29:58 localhost kernel: CPU: 7 PID: 19904 Comm: java Tainted: G           OE  ------------ T 3.10.0-693.2.2.el7.x86_64 #1
Jan 16 23:29:58 localhost kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Jan 16 23:29:58 localhost kernel: ffff880362700000 000000008b5adefc ffff88034078bc90 ffffffff816a3db1
Jan 16 23:29:58 localhost kernel: ffff88034078bd20 ffffffff8169f1a6 ffff8803b1642680 0000000000000001
Jan 16 23:29:58 localhost kernel: 0000000000000000 ffff880407eeaad0 ffff88034078bcd0 0000000000000046
Jan 16 23:29:58 localhost kernel: Call Trace:
Jan 16 23:29:58 localhost kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Jan 16 23:29:58 localhost kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Jan 16 23:29:58 localhost kernel: [<ffffffff81185ee6>] ? find_lock_task_mm+0x56/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Jan 16 23:29:58 localhost kernel: [<ffffffff811f52a6>] mem_cgroup_oom_synchronize+0x546/0x570
Jan 16 23:29:58 localhost kernel: [<ffffffff811f4720>] ? mem_cgroup_charge_common+0xc0/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186c24>] pagefault_out_of_memory+0x14/0x90
Jan 16 23:29:58 localhost kernel: [<ffffffff8169d56e>] mm_fault_error+0x68/0x12b
Jan 16 23:29:58 localhost kernel: [<ffffffff816b0231>] __do_page_fault+0x391/0x450
Jan 16 23:29:58 localhost kernel: [<ffffffff810295da>] ? __switch_to+0x15a/0x510
Jan 16 23:29:58 localhost kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Jan 16 23:29:58 localhost kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Jan 16 23:29:58 localhost kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Jan 16 23:29:58 localhost kernel: Task in /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope killed as a result of limit of /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice
Jan 16 23:29:58 localhost kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 7722
Jan 16 23:29:58 localhost kernel: memory+swap: usage 4194304kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-58ff049ead2b1713e8a6c736b4637b64f8b6b5c9d1232101792b4d1e8cf03d6a.scope: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope: cache:32KB rss:4194232KB rss_huge:3786752KB mapped_file:8KB swap:0KB inactive_anon:0KB active_anon:4194232KB inactive_file:0KB active_file:32KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jan 16 23:29:58 localhost kernel: [19357]     0 19357      254        1       4        0          -998 pause
Jan 16 23:29:58 localhost kernel: [19485]     0 19485     1071      161       7        0          -998 sh
Jan 16 23:29:58 localhost kernel: [19497]     0 19497  2008713  1051013    2203        0          -998 java
Jan 16 23:29:58 localhost kernel: Memory cgroup out of memory: Kill process 31404 (java) score 6 or sacrifice child
Jan 16 23:29:58 localhost kernel: Killed process 19497 (java) total-vm:8034852kB, anon-rss:4188424kB, file-rss:15628kB, shmem-rss:0kB

I'm quite confused: the heap size should be limited to roughly 2 GiB, since MaxRAMFraction is set to 2, yet the container is still killed. :(
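One way I could dig further into non-heap usage (just a sketch; jcmd ships with the JDK, so the openjdk:8u181-jre image may need the full JDK or added tools) is to enable Native Memory Tracking and compare the JVM's own accounting against the cgroup limit:

# Append this to JAVA_OPTS so the JVM tracks native allocations (small overhead):
# -XX:NativeMemoryTracking=summary
# Then, against the running pod (pod/container names are placeholders):
kubectl exec my-app-pod -c my-app -- jcmd
# Replace 12345 with the java PID printed by the previous command:
kubectl exec my-app-pod -c my-app -- jcmd 12345 VM.native_memory summary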

Could you please help me find the right way to dig into this error?

leoleozhu
  • Can you try the old method with -Xms? Or upgrade the Java version and try the new MaxRAMPercentage parameter. – Sunil Bhoi Jan 17 '19 at 07:00
  • I was using the old method and it also got killed. Do you think it's possible that other memory usage, e.g. a stack leak, is the cause? I'm not seeing too many threads, though. – leoleozhu Jan 18 '19 at 01:30

1 Answer


The article you mention was updated twice with some new information. There are many similar blog posts out there, and tools like the java-buildpack-memory-calculator may still be helpful. But the general conclusion is that Java 10 and later are finally better suited for running in containers.
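For example (a sketch, not from the article): on JDK 10+, or on 8u191 and later where the container support was backported, the experimental cgroup flags can be replaced with percentage-based sizing:

# UseContainerSupport is on by default in JDK 10+ and available in 8u191+;
# MaxRAMPercentage replaces MaxRAMFraction. With a 4 GiB container limit this
# yields a max heap of roughly 2 GiB.
JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0"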

webwurst