Too many major garbage collections: Add heap space or add another VM?

Question

We are not yet experiencing any application errors, but our monitoring tools indicate that our application is running at the limits of its resources. Should we first add more heap or add an additional VM?

We have an application running on WebLogic/JRockit in a managed cluster.

We have AppDynamics monitoring this application and it shows that major garbage collections are happening frequently (every 1-2 minutes on average!!!). When a major garbage collection runs it does recover space and the lower range on heap usage is reasonably low, even after the system has been up for a while (weeks/months). Additionally, we ran the AppDynamics collections leak detection against production and it found no leaks. (We couldn't run the custom monitoring because it's not supported with JRockit.) But overall it seems clear that there are no major leaks, just that the system requires more resources than it currently has.

We have two non-production environments also running this application with reduced resources and reduced load (dev and test). The test environment has 2/3rds the number of VMs and 1/2 the heap per VM. We ran some load tests against this environment, but the results were not very helpful. While we can recreate the number of users using automated scripts, the data in our test environment is very different--queries are returning orders of magnitude less data, etc. (Creating a better load testing environment is certainly on the ToDo list, but unlikely to actually happen any time soon for reasons of bureaucracy.) Even with everything we could throw at it, the test environment did not break a sweat.

Two options:

A) Add more heap. It seems like this would help for sure, but getting this done will require lots of paperwork (would require adding more memory to the physical servers, which means server restarts involving lots of other applications, etc.). Also, I have no idea how much more memory to add and we cannot just "test in prod".

B) Add another VM (or two) for this application. This would be fairly easy, we have space on another physical server, so we could get it done fairly quickly. But I am not sure it would help much, and if it doesn't help then going back to option A later would be even harder.

Specific questions:

1) Is either one of the above options obviously better (and why)?
2) If neither is obviously better, what tests, etc. would I do to decide which is better?
3) How should I decide and justify how many more resources to add (heap or VMs)? (Bonus points here if it involves the tools we already have available.)

Updates:

3 JVMs in a cluster, each JVM is on a separate VM.
They are behind an Apache load balancer, each server gets roughly equal load.
Each JVM has 1 GB heap.
No FMW.

Please describe the topology of your domain and the resources allocated to the master and managed nodes, as well as whether you use FMW or not. It seems to me that adding some more managed nodes to the cluster would be the easiest way to go, not knowing much about the type of application you run, etc... — dawud, Dec 30 '16 at 18:30
In terms of deciding, the main question is whether adding a VM will reduce memory usage on the other VMs. — DerfK, Dec 30 '16 at 19:56
What steps or tests would you recommend to decide if adding a VM will reduce memory usage on other VMs? — user3067860, Dec 30 '16 at 20:03
If the cpu load isn't overwhelming just add more heap, adding more VM's won't make much difference unless you can spread them to different machines and/or you are already limiting the workforce in each managed server through WorkManagers. Note: 1GB isn't that much heap, just double it and move on? :) — ezra-s, Jan 10 '17 at 10:03
We went with "both" and that will hopefully be happening very soon. With luck there will be a slight delay between adding the extra memory and adding the extra servers, so I may be able to tell if just the extra memory is effective, for future reference. — user3067860, Jan 11 '17 at 21:29

score 0 · Answer 1 · answered Dec 30 '16 at 20:04

Assuming that the application has been thoroughly profiled and no memory leaks exist (as seems to be the case), you have to work with the premise that the objects being created in the heap are due to the normal activity of the application.

Setting aside code optimisations and further fine-tuning of the heap based on the size and lifecycle of the objects being created (which in turn depends on the specific JVM you use), there's not much room for improvement other than adding more managed nodes to your domain.

This can be easily achieved using a tool already present in every WebLogic installation, namely WLST.

How to add managed nodes and their respective node managers to an existing cluster using WLST is well documented.

score 0 · Accepted Answer · answered Feb 01 '17 at 22:40

We ended up doing both (adding more heap space from 1GB to 1.5GB and adding more managed nodes from 3 nodes to 5).

The heap was increased about an hour before the new nodes were added and was, by itself, enough to significantly reduce the number of garbage collections and time spent in garbage collection.

Adding more nodes caused only a minor improvement, but it's difficult to determine whether it really wasn't very helpful or whether there just wasn't much room for improvement after increasing the heap.
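
To put numbers on a before/after comparison like this one, two simple metrics are collections per minute and GC overhead (the fraction of wall-clock time spent in GC pauses). The sketch below assumes the pause durations for a fixed observation window have already been exported from the monitoring tool or from GC logs. The function name and the sample figures are invented for illustration and are not measurements from this system.

```python
def gc_summary(pause_seconds, window_seconds):
    """Return (collections per minute, fraction of wall time spent in GC).

    pause_seconds  -- list of individual GC pause durations, in seconds
    window_seconds -- length of the observation window, in seconds
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    collections_per_minute = len(pause_seconds) * 60.0 / window_seconds
    gc_overhead = sum(pause_seconds) / float(window_seconds)
    return collections_per_minute, gc_overhead


# Invented sample data: one hour observed before and after a heap increase.
before = gc_summary([0.5] * 40, 3600)
after = gc_summary([0.6] * 10, 3600)

for label, (rate, overhead) in (("before", before), ("after", after)):
    print("%s: %.2f collections/min, %.2f%% of time in GC"
          % (label, rate, overhead * 100))
```

Comparing the same window length under similar load keeps the two results comparable; a heap increase usually makes individual collections less frequent, though each one may take slightly longer.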
