
This is Tomcat 6.0.18, Java 1.7.0_03 (32-bit), and SLES11 SP2 (64-bit). As for kernel information:

$ uname -a
Linux server-1 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux

We were running a load and longevity test on three servers. On each of the three machines, Tomcat exited within one second of 2^32 milliseconds (about 49.7 days) after it started. On each machine, two threads produced stack traces before the JVM exited (Tomcat itself calls System.exit(1) when it gets the SocketTimeoutException, which is why the JVM exits).
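For reference, the arithmetic behind the "49+ days" figure can be checked with a few lines of Java. This is only a sanity check of the number; the class and method names are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

public class UptimeWindow {
    // 2^32 milliseconds: the interval after which each Tomcat instance exited.
    static final long TWO_POW_32_MS = 1L << 32;

    // Whole days contained in 2^32 milliseconds.
    static long wholeDays() {
        return TimeUnit.MILLISECONDS.toDays(TWO_POW_32_MS);
    }

    public static void main(String[] args) {
        long ms = TWO_POW_32_MS;
        System.out.printf("%d ms = %d days, %d h, %d min, %d s%n",
                ms,
                wholeDays(),
                TimeUnit.MILLISECONDS.toHours(ms) % 24,
                TimeUnit.MILLISECONDS.toMinutes(ms) % 60,
                TimeUnit.MILLISECONDS.toSeconds(ms) % 60);
        // prints "4294967296 ms = 49 days, 17 h, 2 min, 47 s"
    }
}
```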

One thread is the one that (by default) listens on port 8005 for the shutdown command (verified that by looking at Tomcat source):

Jun 22, 2012 9:10:15 AM org.apache.catalina.core.StandardServer await
SEVERE: StandardServer.await: accept: 
java.net.SocketTimeoutException: Accept timed out
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
      at java.net.ServerSocket.implAccept(Unknown Source)
      at java.net.ServerSocket.accept(Unknown Source)
      at org.apache.catalina.core.StandardServer.await(StandardServer.java:389)
      at org.apache.catalina.startup.Catalina.await(Catalina.java:642)
      at org.apache.catalina.startup.Catalina.start(Catalina.java:602)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
      at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

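The other thread is (we believe, though we didn't check the Tomcat source to verify) the one that accepts incoming connector connections. Note that the org.apache.jk classes in the trace belong to the AJP connector (port 8009 by default) rather than the HTTP connector on port 8080: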

Jun 22, 2012 9:10:15 AM org.apache.jk.common.ChannelSocket acceptConnections
WARNING: Exception executing accept
java.net.SocketTimeoutException: Accept timed out
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
      at java.net.ServerSocket.implAccept(Unknown Source)
      at java.net.ServerSocket.accept(Unknown Source)
      at org.apache.jk.common.ChannelSocket.accept(ChannelSocket.java:307)
      at org.apache.jk.common.ChannelSocket.acceptConnections(ChannelSocket.java:661)
      at org.apache.jk.common.ChannelSocket$SocketAcceptor.runIt(ChannelSocket.java:872)
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
      at java.lang.Thread.run(Unknown Source)

Tomcat isn't doing anything wild. In the first case it's just a while (true) loop that gets a Socket by calling ServerSocket.accept() and the accept() call bombs out.
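To make the failure mode concrete, here is a minimal sketch of an accept loop that treats a SocketTimeoutException as "nothing arrived yet" and retries. This is not Tomcat's code, and I am not claiming Tomcat should be patched this way; `acceptIgnoringTimeouts`, `demo`, and the retry cap are hypothetical names and choices for illustration. The demo sets a short SO_TIMEOUT on purpose so the timeouts occur quickly.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class TolerantAcceptLoop {

    // Hypothetical helper: keeps calling accept() and treats a timeout as
    // "nothing arrived yet" rather than as a fatal error.
    static Socket acceptIgnoringTimeouts(ServerSocket server, int maxTimeouts,
                                         AtomicInteger timeoutsSeen) throws IOException {
        while (true) {
            try {
                return server.accept();
            } catch (SocketTimeoutException e) {
                // Give up eventually so a broken setup cannot spin forever.
                if (timeoutsSeen.incrementAndGet() >= maxTimeouts) {
                    throw e;
                }
            }
        }
    }

    // Self-contained demo: a short SO_TIMEOUT forces a few timeouts before a
    // local client connects. Returns how many timeouts were absorbed.
    static int demo() {
        final InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            server.setSoTimeout(100); // milliseconds
            final int port = server.getLocalPort();
            Thread client = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(350);
                        new Socket(loopback, port).close();
                    } catch (Exception ignored) {
                        // If the client fails, the retry cap ends the demo.
                    }
                }
            });
            client.start();
            AtomicInteger timeouts = new AtomicInteger();
            acceptIgnoringTimeouts(server, 100, timeouts).close();
            client.join();
            return timeouts.get();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timeouts absorbed before a connection: " + demo());
    }
}
```

The cap on retries is a design choice for the sketch: it keeps a misconfigured listener from looping silently forever while still surviving an occasional unexpected timeout.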

Any ideas why this is happening and what I could try to look at/for to figure out how to prevent it in the future?

Note that although Tomcat had been running for 2^32 milliseconds, the system had already been up for some time when Tomcat was started. Of course, that doesn't rule out the involvement of some per-process variable that was created when Tomcat started.

QuantumMechanic
  • It appears [Tomcat 6.0.18](http://archive.apache.org/dist/tomcat/tomcat-6/) was released in 2008, well before Java 1.7. There are some socket [incompatibilities between Java 1.6 and 1.7](http://www.oracle.com/technetwork/java/javase/compatibility-417013.html), but nothing jumps out to explain this. – George3 Jun 29 '12 at 01:32

2 Answers


I've recently seen this problem as well and it appears to be isolated to a change made in the 32-bit Oracle JVM between Java 6 and 7. On Linux, running the 32-bit Java 7 VM with strace shows the following system call when ServerSocket.accept() is invoked without having set SO_TIMEOUT:

32369 poll([{fd=5, events=POLLIN|POLLERR}], 1, 4294967295 <unfinished ...>

The call to poll() passes a timeout value of 4294967295 milliseconds (2^32 - 1), rather than the expected negative value that would indicate an infinite timeout. After roughly 49.7 days the poll times out, ServerSocket.accept() throws a SocketTimeoutException, and Tomcat's bootstrap code shuts the server down. That particular piece of Tomcat never expects ServerSocket.accept() to throw a SocketTimeoutException.
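The value in the strace output is suggestive: 4294967295 is the same 32-bit pattern as -1 when read as an unsigned number. The snippet below only demonstrates that equivalence; it does not show where in the JVM or the system call path the value ends up being treated as a large positive timeout, which I have not pinned down. The class and method names are made up for illustration.

```java
public class PollTimeoutBits {
    // Reads the 32 bits of a signed int as an unsigned value.
    static long asUnsigned32(int value) {
        return value & 0xFFFFFFFFL;
    }

    // Truncates a long to its low 32 bits and reads them as a signed int.
    static int asSigned32(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        // -1 is the conventional "wait forever" timeout for poll().
        System.out.println(asUnsigned32(-1));        // prints 4294967295
        System.out.println(asSigned32(4294967295L)); // prints -1
    }
}
```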

It is easier to reproduce this problem if the call to poll() can be manipulated so that you don't have to wait 2^32 milliseconds. On Linux this can be done by interposing on poll(): set the LD_PRELOAD environment variable to load a shared library that provides an overriding version of poll. Example code that shows the idea can be found at https://github.com/vi/timeskew. Unfortunately, it does not override poll, but it can be readily extended to do so.
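If you want something smaller than Tomcat to put under strace, a probe along these lines should be enough. It is a sketch, not the program I originally traced: `AcceptProbe` and `defaultSoTimeout` are hypothetical names, and the program simply calls ServerSocket.accept() without ever setting SO_TIMEOUT.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;

public class AcceptProbe {
    // Returns the SO_TIMEOUT of a freshly created ServerSocket.
    // 0 means accept() is supposed to block indefinitely.
    static int defaultSoTimeout() {
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getSoTimeout();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            System.out.println("port=" + server.getLocalPort()
                    + " SO_TIMEOUT=" + server.getSoTimeout());
            long start = System.currentTimeMillis();
            try {
                server.accept().close(); // blocks until a client connects
                System.out.println("accepted a connection");
            } catch (SocketTimeoutException e) {
                // Should never happen with SO_TIMEOUT left at 0.
                System.out.println("accept() timed out after "
                        + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }
}
```

Running it as `strace -f -e trace=poll java AcceptProbe` shows the timeout argument that reaches poll() on your particular JVM.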

mike

I was experiencing the same problem using Tomcat 7.0.35 with Java 1.7.0_10 (32-bit) on 64-bit Debian Linux.

In my case, switching to an updated 64-bit JDK solved the problem.

stefan
  • Updating to 64-bit Java fixed the problem for us as well. So I assume that somewhere in the C/C++ guts of Java it's using an `int` instead of a `long` to track something. – QuantumMechanic Jan 29 '14 at 16:24