I am tuning the production servers for our portal. We have 4 servers, 2 web and 2 app, with a firewall in front of and behind the web servers (so yes, there is a firewall between the app and web servers). The issue started with the firewall dropping idle connections between the app servers and the web servers. I tried a lot of solutions, and now the issue seems to have moved. Originally, connections dropped by the firewall were left stuck and broken on the app side; this happened when the portal was under low load, and to fix it I had to restart all the app servers. Now I have an issue on high-load days instead, and the urgent fix was simply a quick restart of the Apache web servers. How can I solve this issue?
I made the changes with the help of the JBoss load balancing configuration generator: http://lbconfig.appspot.com/?lb=mod_jk&mjv=1.2.28&nca=64&ncj=64&nai=2&nji=2&njips=6&f=true&c=false&lr=false&lrl=&mpm=Prefork
I monitor connections on both tiers using the netstat command, together with the Google Analytics Real-Time overview. With ~40 visitors, 3 days after the last restart, I got the following stats:
Web side (2 servers; the numbers here are per server, not the total):
ESTABLISHED ~700 - 750
TIME_WAIT: 100-200 (it jumps around from second to second: 150, then 200, then 170, then 120, and so on)
App side (here I counted all connections; most of them are ESTABLISHED, with a few CLOSE_WAIT, 0-5 each time I check):
S1 (4 instances running) : 900-950
S2 (5 instances running) : 1000-1100
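For reference, this is roughly how the per-state counts above can be reproduced. It is a minimal sketch, assuming the connection state is the last column of each netstat line (true for `netstat -an` on Solaris and Linux as far as I know); the function name and the sample lines are made up for illustration, they are not real output from my servers.

```python
from collections import Counter

# TCP state names as printed by netstat (Solaris and Linux spellings).
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTEN", "CLOSING", "LAST_ACK",
    "SYN_SENT", "SYN_RCVD", "SYN_RECV", "FIN_WAIT_1", "FIN_WAIT_2",
    "FIN_WAIT1", "FIN_WAIT2", "IDLE", "BOUND",
}

def count_states(netstat_output: str) -> Counter:
    """Count connections per TCP state in the text printed by `netstat -an`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Header and separator lines do not end in a state name, so they are skipped.
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Hypothetical sample in Solaris layout (addresses are invented).
SAMPLE = """\
TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.80                 *.*                0      0 49152      0 LISTEN
10.0.0.10.80         10.0.1.5.51234       64240      0 64240      0 ESTABLISHED
10.0.0.10.41022      10.0.2.20.8009       49640      0 49640      0 ESTABLISHED
10.0.0.10.41023      10.0.2.21.8009       49640      0 49640      0 ESTABLISHED
10.0.0.10.80         10.0.1.9.40110       64240      0 64240      0 TIME_WAIT
10.0.0.10.41030      10.0.2.20.8009       49640      0 49640      0 CLOSE_WAIT
"""

counts = count_states(SAMPLE)
for state, n in sorted(counts.items()):
    print(f"{state:12} {n}")
```

On a real server the same function can be fed the captured output of `netstat -an` instead of `SAMPLE`.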
Server details:
- On the 2 web servers: Apache 2.2.14 / mod_jk 1.2.37
- On the 2 app servers: clustered Glassfish 2.1.1 with ajp13 (6 instances on each server)
- All servers: Solaris SPARC, 64 V-CPUs, 32GB RAM.
My configuration is mostly what the generator gave me (see the link above):
httpd.conf:
KeepAlive On
ServerLimit 12800
StartServers 5
MinSpareServers 5
MaxSpareServers 20
MaxClients 12800
MaxRequestsPerChild 5000
ExtendedStatus Off
worker.properties:
worker.maintain=30
worker.template.type=ajp13
worker.template.session_cookie=JSESSIONID
worker.template.lbfactor=1
worker.template.ping_timeout=10000
worker.template.connection_pool_timeout=10
worker.template.socket_keepalive=True
worker.template.socket_timeout=600
worker.template.connect_timeout=10000
worker.template.prepost_timeout=10000
worker.template.connection_ping_interval=20
worker.template.ping_mode=A
worker.template.socket_connect_timeout=600000
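One thing that makes these values hard to compare is that mod_jk mixes units. The sketch below normalizes them to seconds. The unit for each directive is my reading of the mod_jk workers.properties reference (seconds for `connection_pool_timeout`, `socket_timeout` and `connection_ping_interval`, milliseconds for the others), so treat that mapping as an assumption to verify, not as authoritative.

```python
# Divisor that converts each setting to seconds: 1 = already seconds, 1000 = milliseconds.
# The unit per directive is an assumption taken from my reading of the mod_jk docs.
SETTINGS = {
    "ping_timeout":             (10000, 1000),
    "connection_pool_timeout":  (10, 1),
    "socket_timeout":           (600, 1),
    "connect_timeout":          (10000, 1000),
    "prepost_timeout":          (10000, 1000),
    "connection_ping_interval": (20, 1),
    "socket_connect_timeout":   (600000, 1000),
}

in_seconds = {name: value / divisor for name, (value, divisor) in SETTINGS.items()}

for name, secs in in_seconds.items():
    print(f"worker.template.{name:26} {secs:7.1f} s")
```

Read this way, idle pooled connections are closed after 10 seconds, while a blocked socket read or a TCP connect attempt can wait up to 600 seconds.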
On the Glassfish side the time-outs are 10 seconds. In the cluster configuration I have:
HTTP service property :
- connectionTimeout= 10000
Request Processing:
- Thread Count: 2133
- Initial Thread Count : 20
- Thread Increment : 10
Keep Alive (enabled):
- Thread Count: 400
- Max Connections 256
- Time out : 10 seconds
Connection Pool:
- Max Pending Count 4096 connections
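As a rough sanity check on these numbers, here is the worst-case connection arithmetic as I understand it. It is a sketch under several assumptions: that under the prefork MPM each httpd child keeps at most one AJP connection per worker (the mod_jk default `connection_pool_size` for prefork, as far as I know), that both web servers balance across all 12 Glassfish instances, and that the thread count of 2133 applies to each instance. All variable names are my own.

```python
# Values taken from the configuration above.
max_clients = 12800            # MaxClients per web server (httpd.conf)
web_servers = 2
app_servers = 2
instances_per_app_server = 6
threads_per_instance = 2133    # Glassfish request processing thread count

# Assumption: one pooled AJP connection per httpd child per worker (prefork).
pool_size_per_child = 1

workers = app_servers * instances_per_app_server

# Connections one web server could open towards the app tier in the worst case.
worst_case_out_per_web_server = max_clients * pool_size_per_child * workers

# Connections one Glassfish instance could receive from both web servers.
worst_case_in_per_instance = max_clients * pool_size_per_child * web_servers

print("workers:", workers)
print("worst case outbound per web server:", worst_case_out_per_web_server)
print("worst case inbound per instance:", worst_case_in_per_instance)
print("request threads per instance:", threads_per_instance)
```

If those assumptions hold, a single instance could in theory be offered far more AJP connections than it has request threads, which is part of why I am unsure whether the established-connection counts I see are safe.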
So:
- Is my configuration correct?
- How can I reduce the high number of established connections, or is it safe? I don't want Apache downtime again if the load gets high again.