Info:

Number of VMs = 4, each running an instance of Tomcat 8.5.* in a cluster.

Apps = 5 WAR applications: 2 UI applications and 3 web services.

Java version = Java 1.8.*

Configuration = 2 LTMs and 2 Apache web servers. One LTM sits in front of the Tomcat cluster and handles web service requests (200K+ a day on weekdays). The other LTM sits in front of the Apache web servers, which in turn sit in front of the Tomcat instances, and handles the UI application requests (10K+ a day).

JVM parameters: all defaults, plus -Xms3072m -Xmx3072m

Tomcat configuration:

    <Connector port="xxxx"
               protocol="HTTP/1.1"
               connectionTimeout="3000"
               enableLookups="false"
               redirectPort="yyyy"
               maxThreads="80" />

    <Connector port="yyyy"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               redirectPort="yyyy"
               secure="true"
               scheme="https"
               clientAuth="false"
               sslProtocol="TLS"
               sslEnabledProtocols="SSLv2Hello,TLSv1,TLSv1.1,TLSv1.2"
               SSLEnabled="true"
               maxThreads="70"
               maxKeepAliveRequests="100"
               keepAliveTimeout="5000"
               connectionTimeout="10000"
               keystoreFile="....."
               keyPass="..."
               keystorePass="..."
               keyAlias="....."
               truststoreFile="..."
               truststorePass="..."
               ciphers="......." />

    <Connector port="zzzz"
               scheme="https"
               protocol="AJP/1.3"
               redirectPort="yyyy" />

Issue: We have to recycle Tomcat once a week, and we do it over the weekend. If it is not recycled, then on the 7th or 8th day of uptime minor GC time climbs to anywhere from 5 to 30 seconds. If it is still not restarted, every major GC takes at least a minute, causing several failed transactions. When we check the VM status on all 4 nodes during this time, we see a lot of swapping. Memory utilization stays under 55% the whole time, and CPU utilization stays below 25%. Surprisingly, this happens during the weekend, when there is little to no load.

We have never seen any OOM errors, so heap tuning appears to be a non-issue (I may be wrong). We have the same configuration in a production simulation environment, where the load is lighter than on the production servers, and that environment shows no such swapping or GC problems. Any insight or advice would be a great help. Please let me know if any other info is needed.
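One way to check whether it is the Tomcat JVM itself that ends up in swap is to read the VmSwap field that Linux exposes in /proc/<pid>/status. The sketch below is only an illustration: it assumes a Linux host, the helper names are hypothetical, and the sample text contains made-up values rather than measurements from these servers.

```python
import re


def parse_vm_swap_kb(status_text):
    """Return the VmSwap value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_swap_kb(pid):
    """Read VmSwap for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_swap_kb(status_file.read())


# Hypothetical sample, shaped like the real file but with invented numbers.
sample = "Name:\tjava\nVmRSS:\t 1200000 kB\nVmSwap:\t 2048000 kB\n"
print(parse_vm_swap_kb(sample))
```

Calling read_vm_swap_kb with the Tomcat PID before and after the weekend would show whether the swapped-out pages belong to the JVM or to some other process.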

prashma

1 Answer

Swapping is due to inactive memory pages: objects that have not been used for a while but are not subject to GC. Try decreasing the system swappiness to 10.

Also check code cache utilization before recycling Tomcat; it may cause slowness over time.

Try changing the GC to G1 (G1GC) and check; it should have fewer STW pauses.
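To tell a slow collector apart from one that is stalled waiting on swapped-out pages, it can help to compare wall-clock and CPU time per collection. The sketch below assumes GC logging is enabled with -XX:+PrintGCDetails on Java 8, which appends a "[Times: user=... sys=..., real=... secs]" entry to each collection. The function name, the thresholds, and the sample lines are hypothetical. A pause whose real time is far above user + sys spent most of its time waiting rather than collecting, which is consistent with paging but does not prove it.

```python
import re

TIMES_RE = re.compile(
    r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]"
)


def stalled_pauses(log_lines, min_real=1.0, ratio=2.0):
    """Return (real, cpu) pairs for pauses where wall time dwarfs CPU time."""
    hits = []
    for line in log_lines:
        match = TIMES_RE.search(line)
        if not match:
            continue  # not a line carrying a Times entry
        user, sys_time, real = (float(v) for v in match.groups())
        cpu = user + sys_time
        if real >= min_real and real > ratio * cpu:
            hits.append((real, cpu))
    return hits


# Hypothetical log fragments, not taken from the servers in question.
sample_lines = [
    "[GC (Allocation Failure) ... [Times: user=0.20 sys=0.01, real=0.05 secs]",
    "[GC (Allocation Failure) ... [Times: user=0.30 sys=0.90, real=12.40 secs]",
]
print(stalled_pauses(sample_lines))
```

If the long weekend pauses show up in this list while weekday pauses do not, changing the collector is unlikely to help on its own.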

hoshoh
  • I have swappiness at 1. Code cache utilization is around 40%. I don't think changing the GC to G1GC would do anything, because while the system is not swapping the GCs are completely fine. Minor GC takes less than 100 ms, and Full GC happens only once and takes less than 1 sec. As I understand it, the GC is going haywire because the heap is being swapped. – prashma May 08 '18 at 17:09
  • While I was still researching this I noticed one thing: swap utilization and memory utilization go crazy around every Saturday evening PST (midnight UTC), and that is when our cron job for log rotation starts. It compresses and archives logs that are over 7 GB in size. Do you think this might be eating the memory? – prashma May 08 '18 at 17:11
  • It is not related to memory swapping, because if it were, the OOM killer would kick in and kill Java. It is possible that the CPU is the issue, since compression is a CPU-intensive task. Check CPU usage during the GC peak and try lowering the priority of the log rotation task. – hoshoh May 08 '18 at 18:26
  • I understand compression could be a CPU-intensive task, but could the system cache the data read by the tar command? And since it takes a few minutes to compress all the logs, could the system start swapping during that process? Sorry, I am not arguing, just brainstorming to see if that is a possibility. – prashma May 08 '18 at 21:43
  • The system will use only 1% of swap, since swappiness is set to 1, which means swapping should not be a problem (please post swap usage and swap space). The system will favor hot file cache over LRU-inactive anonymous pages, which could be the JVM heap. I need the syslogs to verify. – hoshoh May 08 '18 at 21:52
  • Correct me if I am wrong, but I believe a swappiness value of 1 means that the system will only swap once 99% of memory is exhausted. I have 9 GB of swap in the system, and during the weekend as much as 5 GB of swap is being used. – prashma May 08 '18 at 23:01
  • No, not really. The system will swap even when there is a lot of free memory. Swappiness controls the percentage of swap to be used; in your case no more than 900 MB of swap should be used. You need to check vm.swappiness in /etc/sysctl.conf; 5 GB is not 1%. – hoshoh May 08 '18 at 23:10
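The swap figures discussed in these comments can be checked directly from /proc/meminfo. Note that the Linux kernel documentation describes vm.swappiness as a relative weighting between reclaiming anonymous pages and reclaiming page cache, not as a cap on swap usage or a memory-exhaustion threshold, so the value alone does not predict how much swap will be used. The sketch below is a minimal illustration: the function names are hypothetical, the swap numbers mirror the 9 GB total and 5 GB used quoted above, and the Cached value is invented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_summary(info):
    """Summarize swap usage and page cache size from parsed meminfo values."""
    total = info["SwapTotal"]
    used = total - info["SwapFree"]
    pct = round(100.0 * used / total, 1) if total else 0.0
    return {
        "swap_used_kb": used,
        "swap_used_pct": pct,
        "page_cache_kb": info.get("Cached", 0),
    }


# 9 GB swap with 5 GB in use, as quoted in the thread; Cached is made up.
sample = (
    "MemTotal:       16777216 kB\n"
    "Cached:          7340032 kB\n"
    "SwapTotal:       9437184 kB\n"
    "SwapFree:        4194304 kB\n"
)
print(swap_summary(parse_meminfo(sample)))
```

Sampling this before, during, and after the Saturday log rotation would show whether the page cache grows while swap usage climbs, which is the hypothesis raised about the tar job.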