I'm not the system administrator for this server, but I'm trying to help the sys admin group find a fix quickly. For this reason, I don't have access to all of the server configuration files.

With that out of the way, here's my question:

This is regarding a Java EE application on a Solaris machine with Sun Java System Web Server 6.1 and Sun Java System Application Server 8.1. The web server is acting as a proxy for requests going to the application server. My understanding is that it's also set up as a load balancer, though it only points to one application instance so it seems there's nothing to balance.

When making requests through the proxy to the application, we're intermittently seeing a purple/blue page that says the following:

Due to a temporary error the request could not be serviced.

The problem could be because:
    - The server is busy.
    - The server is temporarily unavailable.

 You may choose to resubmit the request, but be aware that the request might 
 have already been processed.  Depending on the type of request, you may not
 want it to be processed twice.  Please click here to re-submit.

A few things to note:

  1. As I mentioned, we see this screen intermittently, roughly one request out of every 100-500.
  2. When we do see the screen, it is returned without delay. In other words, it doesn't seem to be timeout related.
  3. Refreshing the page causes the actual application page to display. In other words, it doesn't seem that there was a temporary server outage in the 1-2 seconds between the error page loading, the refresh, and the real page loading.
  4. I don't think the network is an issue since the web and application servers are on the same host.
  5. The web server logs have the following entries when this error page appears:
     [02/Feb/2009:15:37:32] warning (19614): reports: lb.runtime: ROUT1014: Non-idempotent request /applicationContext cannot be retried.
     [02/Feb/2009:15:37:32] info (19614): reports: lb.runtime: RNTM3003 : Error servicing the request : selected server could not service

What could be causing this error page to come up?

Thanks, Jeff

Update:

Here's the load balancer configuration:

<!DOCTYPE loadbalancer PUBLIC "-//Sun Microsystems Inc.//DTD Sun ONE Application Server 7.1//EN" "sun-loadbalancer_1_1.dtd">
<loadbalancer>
    <cluster name="cluster1">
        <instance  name="instance1" enabled="true" disable-timeout-in-minutes="60" listeners="http://host.domain.com:32000"/>
        <web-module context-root="/applicationContext" enabled="true" disable-timeout-in-minutes="60" error-url="sun-http-lberror.html" />
        <health-checker url="/applicationContext" interval-in-seconds="30" timeout-in-seconds="10" />
    </cluster>
    <cluster name="other_cluster">
         <instance  name="other_host" enabled="true" disable-timeout-in-minutes="60" listeners="http://host2.domain.com:80000"/>
         <web-module context-root="/otherContext" enabled="true" disable-timeout-in-minutes="60" error-url="./sun-http-lberror.html" />
         <health-checker url="/otherContext" interval-in-seconds="30" timeout-in-seconds="10" />
    </cluster>
    <property name="reload-poll-interval-in-seconds" value="60"/>
    <property name="response-timeout-in-seconds" value="600"/>
    <property name="https-routing" value="false"/>
    <property name="require-monitor-data" value="false"/>
</loadbalancer>
jlpp
  • Sys admin opened support ticket with Sun yesterday. Not sure how long it will take Sun to research. – jlpp May 19 '09 at 13:18
  • Hey jlpp, did you get a solution on this from Sun? – Luke May 29 '09 at 20:55
  • Sorry for the delay. Nothing from Sun yet, to my knowledge. It's possible that the system admin has gotten something and hasn't relayed it to me. I'll find out soon. – jlpp Jul 15 '09 at 22:48

2 Answers

If the proxy is set up to load balance as you say, and there is only one server it can point to, it may be that the load balancer sometimes deems the target application server too busy and gives you that error.

Can you correlate the errors you are seeing with any sort of load on the server? Is there a way to take the load balancing out of the equation and test? Can you view or change the load balancer settings to see if it has very conservative thresholds?

WerkkreW
  • No correlation between errors and load. I'll see if the admins can take the load balancer out of the picture. I'll also try to get the full config file. Thanks. – jlpp May 14 '09 at 00:36
  • Added load balancer config to question text. Asked sys admins to try removing load balancer. – jlpp May 14 '09 at 16:00
  • Sys admin tells me that the load balancer is needed for SiteMinder authentication so we can't remove that. – jlpp May 14 '09 at 18:47

I would suspect the health check feature is running and deciding that the backend server is unavailable. Maybe try increasing the timeout values in the health-checker configuration, or disabling it completely. Since there's only one application server to proxy to, this won't really cause any lost functionality.
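
As a rough sketch of the first option, the health-checker line for cluster1 in the question's config could be relaxed along these lines. The element and attribute names are taken from the posted config; the values 60 and 30 are illustrative assumptions only, not tested recommendations, so adjust them to whatever the app server's real response times suggest.

```xml
<!-- Sketch only: same element as in the posted config, with a longer
     interval and timeout. The numbers 60 and 30 are assumed values. -->
<health-checker url="/applicationContext" interval-in-seconds="60" timeout-in-seconds="30" />
```

This line would replace the existing health-checker entry inside the cluster1 element. The config also sets reload-poll-interval-in-seconds to 60, which suggests the file is re-read periodically, but whether an edit takes effect without a restart is something to confirm with the sys admins.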

Luke
  • Thanks Luke. I've asked the sys admin to disable the health checker. Will report back with results. – jlpp May 18 '09 at 12:56
  • Disabling the health checker *seems* to have led to a situation where, if the app server is genuinely unresponsive, the load balancer/proxy/web server will eventually notice that requests aren't returning and tag the server as unresponsive. It then starts returning the blue/purple error screens. The problem is that when the app server becomes responsive again, the load balancer still returns the error page. That's the behavior we're seeing now. – jlpp May 19 '09 at 13:16
  • Shoot, well then I'm fresh out of ideas. I see that you've opened a support ticket though - I look forward to seeing what Sun comes back with. – Luke May 19 '09 at 13:50