
We are running RedHat 3.4.6 (32-bit) on VMware ESX 3.5 (64-bit) with 6GB RAM. A few Java processes (including JBoss) are running in the background.

The problem is that the Java processes consume a lot of memory, and sometimes they are killed by the OOM killer. When the OOM killer is about to act, free physical memory is very low (100MB-200MB), but the swap is not used (99% free). Sometimes this causes a kernel panic too.

  • So why isn't the swap used?
  • How can I investigate this kernel panic?
  • Is using 6GB of memory on 32-bit Red Hat wise?

Thanks

Lydon Ch

2 Answers


Personally I would never use PAE (over 4G of RAM on a 32-bit system). You'll get much better mileage running an actual 64-bit kernel and system.

The OOM killer should only trigger when a malloc is about to fail, not when you have lots of swap available.

The 32-bit kernel is likely to be part of the cause. PAE uses different memory zones and it may be that one zone isn't allowed to malloc from another.

Have you modified your swappiness (how readily the kernel will use swap)? You can check it with cat /proc/sys/vm/swappiness.
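To compare the swappiness setting with how much swap is actually in use, something like the following might help. It is only a rough sketch: the function name swap_free_pct is made up for illustration, and it assumes the usual /proc/meminfo layout with SwapTotal and SwapFree lines in kB.

```shell
#!/bin/sh
# swap_free_pct [FILE]: print the percentage of swap that is free, using the
# SwapTotal and SwapFree lines (values in kB) of a /proc/meminfo-style file.
# The function name is hypothetical; FILE defaults to /proc/meminfo.
swap_free_pct() {
    awk '/^SwapTotal:/ {t = $2}
         /^SwapFree:/  {f = $2}
         END { if (t > 0) printf "%d\n", f * 100 / t; else print "n/a" }' \
        "${1:-/proc/meminfo}"
}

# Show the current swappiness value if the kernel exposes it.
if [ -r /proc/sys/vm/swappiness ]; then
    echo "swappiness: $(cat /proc/sys/vm/swappiness)"
fi

# Show how much of the swap is still free.
if [ -r /proc/meminfo ]; then
    echo "swap free (%): $(swap_free_pct)"
fi
```

Running it a few times while the Java processes grow should show whether swap usage moves at all before the OOM killer steps in.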

You might also investigate tuning vm.dirty_ratio or vm.lower_zone_protection = 100.
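Before changing either tunable, it may be worth checking that the running kernel actually exposes it and what the current value is. A minimal sketch, assuming the standard mapping of dotted sysctl keys to files under /proc/sys (the helper name sysctl_path is hypothetical, and the mapping breaks for keys that contain a dot inside a component, which is not the case for these vm.* keys):

```shell
#!/bin/sh
# sysctl_path KEY: map a dotted sysctl key to its path under /proc/sys.
# Hypothetical helper; only valid for keys without dots inside a component.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Print each tunable's current value, or note that this kernel lacks it.
for key in vm.dirty_ratio vm.lower_zone_protection; do
    p=$(sysctl_path "$key")
    if [ -r "$p" ]; then
        echo "$key = $(cat "$p")"
    else
        echo "$key: not present on this kernel"
    fi
done
```

If a key is reported as not present, setting it with sysctl -w will fail on that kernel, so there is no point adding it to /etc/sysctl.conf either.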

Have you captured the kernel panic? (a serial console is often a good way to do this)

You can also try to preempt the OOM-Killer with your own process monitoring software. (take a look at Monit)
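If you would rather start with something hand-rolled than set up Monit, a cron-driven check along these lines could give an early warning. This is not Monit configuration, just a sketch: the function names are made up, the 204800 kB threshold is an assumption based on the 100MB-200MB figure in the question, and it sums MemFree, Buffers and Cached because older kernels have no MemAvailable line.

```shell
#!/bin/sh
# reclaimable_kb [FILE]: MemFree + Buffers + Cached, in kB, taken from a
# /proc/meminfo-style file. (Hypothetical helper name.)
reclaimable_kb() {
    awk '/^(MemFree|Buffers|Cached):/ {sum += $2} END {print sum + 0}' \
        "${1:-/proc/meminfo}"
}

# low_memory THRESHOLD_KB [FILE]: succeed when the sum is below the threshold.
low_memory() {
    [ "$(reclaimable_kb "${2:-/proc/meminfo}")" -lt "$1" ]
}

# Single check, intended to be run from cron. The threshold is an assumption.
THRESHOLD_KB=204800
if [ -r /proc/meminfo ] && low_memory "$THRESHOLD_KB"; then
    echo "WARNING: less than ${THRESHOLD_KB} kB free or reclaimable"
    # Site-specific action would go here, e.g. a graceful restart of one
    # of the Java processes before the OOM killer picks a victim.
fi
```

What to do when the warning fires is up to you; restarting the biggest Java process gracefully is usually less painful than letting the OOM killer choose.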

Best of luck

Joel K
  • I increased the swappiness to 100, and the swap is still not used. I haven't modified vm.dirty_ratio. I will try to capture the kernel panic. – Lydon Ch Jan 21 '10 at 06:59

From http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002704

RHEL4 virtual machines running Oracle/Java randomly kill processes by OOM killer

Details

OOM killer kills applications even though ESX is not under memory load. The command top shows a lot of memory is being cached and swap is hardly being used.

Solution

When the size of the data to be copied exceeds the size of physical memory, oom-killer starts randomly killing processes.

This can be fixed by running:

sysctl -w vm.lower_zone_protection=100

When lower_zone_protection is set to 100, it increases the free page threshold by 100, thereby starting page reclamation earlier and preventing NFS (Network File System) from getting far behind the kernel's memory demands. This causes page reclamation to happen sooner, thus providing more 'protection' for the zones. This issue is identified in RHEL by Redhat and they have provided a workaround for this in the following articles:

Lydon Ch