We run two separate Tomcat 7 servers behind a load balancer, with Apache 2.2 working as a proxy on the same host. Each server runs about 70 microservices, which are accessed from a remote Liferay system and from another server cluster containing business-logic services.
The problem: roughly every one to two days, the Apache that accepts requests from the load balancer and passes them on to Tomcat suddenly seems unable to connect, and therefore keeps spawning workers until it reaches its limit. Tomcat is then completely unreachable.
The error message in httpd/error.log looks like this (repeated many times):
[.... 2018] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
We also observed several hundred TCP connections in the CLOSE_WAIT state. Strangely, there are no error messages in catalina.out. CPU and memory usage on the servers are within safe limits.
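For anyone who wants to check the same thing on their own host, here is a minimal sketch of how connection states can be tallied from `netstat -ant`-style output. The function name `count_tcp_states` and the sample text are made up for illustration; the sample is not output from our servers, and the column layout assumes Linux `netstat`.

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connection states in `netstat -ant`-style text (Linux layout assumed)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; the state is the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# Invented sample for illustration only, not real output from our hosts.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8009            0.0.0.0:*               LISTEN
tcp        1      0 127.0.0.1:8009          127.0.0.1:51234         CLOSE_WAIT
tcp        1      0 127.0.0.1:8009          127.0.0.1:51240         CLOSE_WAIT
tcp        0      0 127.0.0.1:51300         127.0.0.1:8009          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

In practice the text would come from running `netstat -ant` on the affected host while the problem is occurring, so the CLOSE_WAIT count can be compared with a healthy period.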
We suspect the problem is related to one of the microservices not handling connections correctly: in the thread dump we saw many threads that were using the HttpClient in the WAITING state.
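A rough sketch of how such threads can be counted in a `jstack`-style thread dump is shown here. The helper name `count_states_with_frame`, the marker string, and the sample dump (including the `com.example.client.HttpClient` frames) are hypothetical and only illustrate the idea; they are not taken from our actual dump.

```python
import re
from collections import Counter

# Matches the state line that jstack prints under each thread header.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")

def count_states_with_frame(dump: str, marker: str = "HttpClient") -> Counter:
    """Count thread states for threads whose stack frames contain `marker`."""
    counts = Counter()
    # jstack separates thread entries with blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        match = STATE_RE.search(block)
        frames = [l for l in block.splitlines() if l.strip().startswith("at ")]
        if match and any(marker in frame for frame in frames):
            counts[match.group(1)] += 1
    return counts

# Invented sample dump; class names and thread names are placeholders.
SAMPLE_DUMP = """\
"http-bio-8080-exec-1" daemon prio=10 tid=0x01 nid=0x11 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at com.example.client.HttpClient.execute(HttpClient.java:42)

"http-bio-8080-exec-2" daemon prio=10 tid=0x02 nid=0x12 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at com.example.client.HttpClient.execute(HttpClient.java:42)

"http-bio-8080-exec-3" daemon prio=10 tid=0x03 nid=0x13 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)

"ajp-bio-8009-exec-1" daemon prio=10 tid=0x04 nid=0x14 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
"""

if __name__ == "__main__":
    print(count_states_with_frame(SAMPLE_DUMP))
```

Comparing the per-state counts across several dumps taken a few seconds apart would show whether the same threads stay stuck or whether they are only briefly waiting.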
Still, the question remains whether this could also be a configuration problem in Tomcat or Apache, or whether the configuration can be changed to make the server more resilient.
Any suggestions? Relevant configuration information below:
tomcat server.xml
<Connector port="8080"
protocol="HTTP/1.1"
connectionTimeout="20000"
compression="on"
noCompressionUserAgents="gozilla, traviata"
compressableMimeType="text/html,text/xml,text/css,text/javascript,text/plain"
redirectPort="8443"
URIEncoding="UTF-8"
maxThreads="200"
acceptCount="800"
maxHttpHeaderSize="8192"
maxKeepAliveRequests="-1" />
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
apache httpd.conf
Timeout 60
RLimitCPU max
RLimitMEM max
KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 5
<IfModule prefork.c>
StartServers 16
MinSpareServers 8
MaxSpareServers 20
ServerLimit 350
MaxClients 350
MaxRequestsPerChild 4000
</IfModule>
apache vhosts.conf
ProxyPass / ajp://localhost:8009/ keepalive=on retry=0 timeout=60 ping=120 connectiontimeout=120