1

My team and I have been struggling to keep a clustered ColdFusion application stable for the better part of the last six months, with little result. We are turning to SF in the hope of finding some JRun experts or fresh ideas, because we can't seem to figure it out.

The setup:
Two ColdFusion 7.0.2 instances clustered with JRun 4 (w/ the latest update) on IIS 6 under Windows Server 2003. Two quad core CPUs, 8GB RAM.

The issue:
Every now and again, usually about once a week, one of the instances will stop handling requests completely. There is no activity on it whatsoever and we have to restart it.

What we know:
Every time this happens, JRun's error log is full of java.lang.OutOfMemoryError: unable to create new native thread.

After reading the JRun documentation from Macromedia/Adobe and many confusing blog posts, we've more or less narrowed it down to incorrect/unoptimized JRun thread pool settings in the instance's jrun.xml.

Relevant part of our jrun.xml:

<service class="jrun.servlet.jrpp.JRunProxyService" name="ProxyService">
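    <!-- Per the JRun docs: activeHandlerThreads caps how many handler threads
         may be actively servicing requests at once; maxHandlerThreads caps the
         total size of the thread pool. -->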
    <attribute name="activeHandlerThreads">500</attribute>
    <attribute name="backlog">500</attribute>
    <attribute name="deactivated">false</attribute>
    <attribute name="interface">*</attribute>
    <attribute name="maxHandlerThreads">1000</attribute>
    <attribute name="minHandlerThreads">1</attribute>
    <attribute name="port">51003</attribute>
    <attribute name="threadWaitTimeout">300</attribute>
    <attribute name="timeout">300</attribute>
{snip}  
</service>

I enabled JRun's metrics logging last week to collect data related to threads. This is a summary of the data after letting it log for a week.

Average values:

{jrpp.listenTh}       1
{jrpp.idleTh}         9
{jrpp.delayTh}        0
{jrpp.busyTh}         0
{jrpp.totalTh}       10
{jrpp.delayRq}        0
{jrpp.droppedRq}      0
{jrpp.handledRq}      4
{jrpp.handledMs}   6036
{jrpp.delayMs}        0
{freeMemory}      48667
{totalMemory}    403598
{sessions}          737
{sessionsInMem}     737

Maximum values:

{jrpp.listenTh}       10
{jrpp.idleTh}         94
{jrpp.delayTh}         1
{jrpp.busyTh}         39
{jrpp.totalTh}       100
{jrpp.delayRq}         0
{jrpp.droppedRq}       0
{jrpp.handledRq}      87
{jrpp.handledMs}  508845
{jrpp.delayMs}         0
{freeMemory}      169313
{totalMemory}     578432
{sessions}          2297
{sessionsInMem}     2297
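
In case it matters, metrics logging was enabled in the instance's jrun.xml, roughly along these lines (a sketch of the LoggerService attributes from memory; the interval and format string shown here are illustrative, not our exact values):

<!-- inside the LoggerService section of jrun.xml -->
<attribute name="metricsEnabled">true</attribute>
<attribute name="metricsLogFrequency">60</attribute>
<attribute name="metricsFormat">busy/total threads: {jrpp.busyTh}/{jrpp.totalTh} sessions: {sessions} totalMemory: {totalMemory} freeMemory: {freeMemory}</attribute>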

Any ideas as to what we could try now?

Cheers!


EDIT #1 -> Things I forgot to mention: Windows Server 2003 Enterprise w/ JVM 1.4.2 (for JRun)

The maximum possible heap size is around 1.4GB, yeah. We used to have leaks but we fixed them; now the application uses around 400MB, rarely more. The max heap size is set to 1200MB, so we aren't reaching it. When we did have leaks, the JVM would just blow up and the instance would restart itself. That isn't happening now; it simply stops handling incoming requests.

We were thinking it has to do with threads, following this blog post: http://www.talkingtree.com/blog/index.cfm/2005/3/11/NewNativeThread

The Java exception being thrown is of type OutOfMemoryError, but it's not actually saying that we ran out of heap space, just that it couldn't create new threads. The exception type is a bit misleading.

Basically the blog is saying that 500 might be too high for activeHandlerThreads, but my metrics seem to show that we get nowhere near that, which is confusing us.

jfrobishow
  • 71
  • 10

2 Answers

3

Well, let's look at some bigger picture issues before getting into JRun configuration details.

If you're getting java.lang.OutOfMemoryError exceptions in the JRun error log, well, you're out of memory. No upvote for that, please ;-). You didn't say whether you were running 32- or 64-bit Windows, but you did say that you have 8 GB of RAM, so that will have some impact on an answer. Whether or not you're running a 32- or 64-bit JVM (and what version) will also impact things. So those are a few answers that will help us get to the bottom of this.

Regardless, your application IS running out of memory. It's running out of memory for one or more of these reasons:

  1. Your application is leaking memory. Some object your application uses is continually referenced and therefore never eligible for garbage collection; or worse, some object created anew on every request is referenced by another object in perpetuity and is therefore never eligible for garbage collection. Correct J2EE session handling can be particularly tricky in this regard.
  2. The amount of required memory to handle each concurrent request (at the configured concurrent request level) exceeds the amount of memory available in the JVM heap. For instance, you have a heap size of 1 GB and each request can use up to 10 MB. Your app server is tuned to allow 150 concurrent requests. (Simplistic numbers, I know). In that case, you would definitely be running out of memory if you experienced 100 or more concurrent requests under load (if each request used the maximum amount of memory necessary to fulfill the request).
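
To put quick numbers on reason 2 using that example: 100 concurrent requests x 10 MB per request = 1,000 MB, which is essentially the entire 1 GB heap gone before anything else gets to allocate.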

Other things to keep in mind: on 32-bit Windows, a 32-bit JVM can only allocate approximately 1.4 GB of memory. I don't recall off the top of my head if a 32-bit JVM on 64-bit Windows has a limitation less than the theoretical 4 GB max for any 32-bit process.

UPDATED

I read the blog post linked via TalkingTree and the other post linked within that post as well. I haven't run into this exact case, but I did have the following observation: the JRUN metrics logging may not record the "max values" you cited in a period of peak thread usage. I think it logs metrics at a fixed, recurring interval. That's good for showing you smooth, average performance characteristics of your application, but it may not capture JRUN's state right before your error condition begins to occur.

Without knowing about the internal workings of JRun's thread management, I still say that it really is out of memory. Perhaps it's not out of memory because your app needed to allocate memory on the JVM heap and none was available, but rather because JRun tried to create another thread to handle an incoming request and the memory necessary to support another thread wasn't available. In other words, threads aren't free: each one needs its own stack, allocated outside the Java heap, so they require memory as well.

Your options seem to be as follows:

  1. Reduce the amount of memory your application uses in each request, or-
  2. Experimentally reduce the value of the thread tuning parameters in JRun's configuration to make more threads queue up for processing instead of becoming runnable at the same time (see the sketch after this list), or-
  3. Reduce the number of simultaneous requests in the ColdFusion administrator (Request Tuning page, field "Maximum number of simultaneous Template requests")
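
For option 2, here is a rough sketch of what a more conservative ProxyService section could look like (the values are purely illustrative; the right numbers have to come out of load testing, not this answer):

<service class="jrun.servlet.jrpp.JRunProxyService" name="ProxyService">
    <!-- Illustrative values only: keep the number of concurrently active
         handler threads well below the pool maximum so excess requests queue -->
    <attribute name="activeHandlerThreads">100</attribute>
    <attribute name="maxHandlerThreads">200</attribute>
    <attribute name="minHandlerThreads">1</attribute>
{snip}
</service>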

Regardless of the option you pursue, I think a valid fix here is going to be experimental in nature. You're going to have to make a change and see what effect it has on the application. You have a load testing environment, right?

Clint Miller
  • 1,141
  • 1
  • 11
  • 19
  • Both of your answers led me in what seems to be the right way. We've pretty much narrowed it down to insufficient memory for the thread stacks themselves. Looks like there is no space left for another thread's stack even though there is plenty on the heap. This would explain why it doesn't recover from the error. We will play around with our jvm.config and optimize the JVM settings. Experimentation is the way to go as you said. Unfortunately I won't have time to test this before the bounty expires so I'll accept your answer for now and will post results once we get time to test your solutions. – jfrobishow Dec 09 '09 at 22:14
  • Long long time after, finally an update. Reducing the request size is what fixed it :). We ended up lowering the heap size anyway cause our application didn't require all of it. Been stable for a while now, no issues. Thanks guys! – jfrobishow Jul 05 '10 at 18:03
  • Glad you found the perfect combination of tuned parameters that worked for you. – Clint Miller Jul 06 '10 at 14:04
0

Try to reduce the maximum heap size. Each thread requires native resources (on top of Java's own bookkeeping). The usable virtual address space is 2GB, and 1.2GB of it is reserved for the heap. Part of the remaining 800MB is used for code (the text segments of the java executable and all required DLLs), then there are native allocations required by the JRE and its dependencies... and then the threads: by default, 1MB of address space is reserved for each thread's stack (though only a page is actually committed), so 100 threads = 100MB just for the stacks. Now add a bit of extra space between the various pieces, some fragmentation... OOM ;-)
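
If the numbers work out that way for this setup, shrinking the heap (and possibly the per-thread stack) leaves more of the 2GB address space for native allocations and thread stacks. A hypothetical jvm.config tweak, purely as an illustration of the kind of change to experiment with (the flag values are made up, not a recommendation):

# jvm.config sketch: a smaller heap and smaller per-thread stack leave more
# of the 2GB virtual address space for native allocations and thread stacks
java.args=-server -Xms512m -Xmx512m -Xss256k
# (keep whatever other arguments are already on the java.args line)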

Luca Tettamanti
  • 846
  • 8
  • 11