
We have an application running on WebSphere (unfortunately 6.1, which is no longer supported and has not yet been migrated to a later version in production) which becomes entirely unresponsive because of hung threads.

As far as I can tell we entirely exhaust one of the thread pools.

I have activated hung thread detection and I get a core/thread dump when hung threads are detected.
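
(For reference, hung thread detection is driven by the thread monitor custom properties on the application server; the property names below are the standard ones, and the values are the documented defaults as far as I recall, not necessarily what we run with:)

com.ibm.websphere.threadmonitor.interval=180
com.ibm.websphere.threadmonitor.threshold=600
com.ibm.websphere.threadmonitor.false.alarm.threshold=100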

The server can run for several days without problems but has crashed twice this week.

When I load the core/thread dump into "IBM Thread and Monitor Dump Analyzer for Java", it tells me that there are a certain number of hung threads (2 this time, 11 last time), multiple threads "waiting on condition" (usually around 40), and some running threads. I believe one of the thread pools is about that size (50).

Now what I see in there are threads waiting for locks, threads holding locks, and threads in wait.

Most of them show a stack trace which always ends like this:

at java/lang/Object.wait(Native Method) 
at java/lang/Object.wait(Object.java:231)

Now, how can I track this down to a server configuration problem, an application issue, a WebSphere problem, or something else? How is this supposed to help me track down the problem when almost everything in there refers to IBM code?
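
One thing I have been considering, purely as a diagnostic aid: dumping lock-ownership information from inside the JVM with ThreadMXBean, roughly as sketched below (the class is hypothetical, not existing code of ours), so I can see immediately which thread owns the monitor everyone else is queued on. A javacore taken with kill -3 contains the same monitor information, so this is only a convenience.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper: prints each thread, its state, and (if it is
// blocked or waiting on a monitor) which thread currently owns that monitor.
public class LockOwnerDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.getAllThreadIds();
        ThreadInfo[] infos = mx.getThreadInfo(ids, Integer.MAX_VALUE); // full stacks
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo ti = infos[i];
            if (ti == null) continue; // thread may have terminated in the meantime
            StringBuffer line = new StringBuffer(ti.getThreadName());
            line.append(" [").append(ti.getThreadState()).append("]");
            if (ti.getLockName() != null) {
                line.append(" on ").append(ti.getLockName());
                if (ti.getLockOwnerName() != null) {
                    line.append(" held by ").append(ti.getLockOwnerName());
                }
            }
            System.out.println(line);
        }
    }
}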

I cannot ask IBM for help as 6.1 is now an unsupported version of WebSphere, and while work has been done to make the application run under WebSphere 7, we are not yet ready to switch to it in production.

These are the stack traces of the threads which are blocked:

HAManager.thread.pool : 0
Blocked
Waiting for Monitor Lock on java/lang/Object@12648F78/12648F84

at java/lang/Object.wait(Native Method) 
at java/lang/Object.wait(Object.java:231) 
at com/ibm/ws/util/BoundedBuffer.waitGet_(BoundedBuffer.java:203) 
at com/ibm/ws/util/BoundedBuffer.take(BoundedBuffer.java:564) 
at com/ibm/ws/util/ThreadPool.getTask(ThreadPool.java:869) 
at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1617)

WebContainer : 5
Blocked
Waiting for Monitor Lock on com/ibm/ejs/ras/TraceLogger@11265878/11265884

at com/ibm/ejs/ras/TraceLogger.doLog(TraceLogger.java:299) 
at com/ibm/ejs/ras/TraceLogger.processEvent(TraceLogger.java:286) 
at com/ibm/ws/logging/WsHandlerWrapper.publish(WsHandlerWrapper.java:43) 
at java/util/logging/Logger.log(Logger.java:526) 
at com/ibm/ejs/ras/Tr.logToJSR47Logger(Tr.java:1625) 
at com/ibm/ejs/ras/Tr.fireEvent(Tr.java:1587) 
at com/ibm/ejs/ras/Tr.fireTraceEvent(Tr.java:1509) 
at com/ibm/ejs/ras/Tr.debug(Tr.java:610) 
at com/ibm/ws/rsadapter/spi/WSRdbManagedConnectionImpl.inGlobalTransaction(WSRdbManagedConnectionImpl.java:1045) 
at com/ibm/ws/rsadapter/jdbc/WSJdbcConnection.commit(WSJdbcConnection.java:961)
...

The threads "waiting on condition" are a mix of:

at java/lang/Thread.sleep(Native Method) 
at java/lang/Thread.sleep(Thread.java:941)
...

and lots of

at java/lang/Object.wait(Native Method) 
at java/lang/Object.wait(Object.java:231)
...

I have a pretty peculiar one; why is it waiting on this?

at java/util/Calendar.setFieldsComputed(Calendar.java:1402) 
at java/util/Calendar.complete(Calendar.java:1349) 
at java/util/Calendar.get(Calendar.java:1120) 
at java/text/SimpleDateFormat.subFormat(SimpleDateFormat.java:930) 
at java/text/SimpleDateFormat.format(SimpleDateFormat.java:837) 
at java/text/SimpleDateFormat.format(SimpleDateFormat.java:809) 
at com/ibm/ws/logging/TraceLogFormatter.formatTimeAndThread(TraceLogFormatter.java:125) 
at com/ibm/ws/logging/TraceLogFormatter.formatHeaderBasic(TraceLogFormatter.java:223) 
at com/ibm/ws/logging/TraceLogFormatter.formatBasic(TraceLogFormatter.java:439) 
at com/ibm/ejs/ras/TraceLogger.doLog(TraceLogger.java:295) 
at com/ibm/ejs/ras/TraceLogger.processEvent(TraceLogger.java:286) 
at com/ibm/ws/logging/WsHandlerWrapper.publish(WsHandlerWrapper.java:43)
...

I also have this:

WebContainer : 6

Owns Monitor Lock on com/ibm/ejs/ras/TraceLogger@11265878/11265884

at sun/io/CharToByteSingleByte.convert(CharToByteSingleByte.java:329) 
at com/ibm/misc/IOConverter.convert(IOConverter.java:111) 
at sun/io/CharToByteConverter.convertAny(CharToByteConverter.java:191) 
at sun/nio/cs/StreamEncoder$ConverterSE.implWrite(StreamEncoder.java:259) 
at sun/nio/cs/StreamEncoder.write(StreamEncoder.java:186) 
at java/io/OutputStreamWriter.write(OutputStreamWriter.java:214) 
at java/io/BufferedWriter.flushBuffer(BufferedWriter.java:132) 
at java/io/PrintStream.write(PrintStream.java:492) 
at java/io/PrintStream.print(PrintStream.java:636) 
at java/io/PrintStream.println(PrintStream.java:773) 
at com/ibm/ejs/ras/TraceLogger.doLog(TraceLogger.java:300) 
at com/ibm/ejs/ras/TraceLogger.processEvent(TraceLogger.java:286) 
at com/ibm/ws/logging/WsHandlerWrapper.publish(WsHandlerWrapper.java:43) 
at java/util/logging/Logger.log(Logger.java:526) 
at com/ibm/ejs/ras/Tr.logToJSR47Logger(Tr.java:1625) 
at com/ibm/ejs/ras/Tr.fireEvent(Tr.java:1587) 
at com/ibm/ejs/ras/Tr.fireTraceEvent(Tr.java:1509) 
at com/ibm/ejs/ras/Tr.entry(Tr.java:776) 
at com/ibm/ws/rsadapter/jdbc/WSJdbcConnection.prepareStatement(WSJdbcConnection.java:2056) 
at com/ibm/ws/rsadapter/jdbc/WSJdbcConnection.prepareStatement(WSJdbcConnection.java:2039) 
...

and finally this:

WebContainer : 8

at java/lang/StringBuffer.append(StringBuffer.java:239) 
at com/ibm/ws/logging/TraceLogFormatter.formatTimeAndThread(TraceLogFormatter.java:137) 
at com/ibm/ws/logging/TraceLogFormatter.formatHeaderBasic(TraceLogFormatter.java:223) 
at com/ibm/ws/logging/TraceLogFormatter.formatBasic(TraceLogFormatter.java:439) 
at com/ibm/ejs/ras/TraceLogger.doLog(TraceLogger.java:295) 
at com/ibm/ejs/ras/TraceLogger.processEvent(TraceLogger.java:286) 
at com/ibm/ws/logging/WsHandlerWrapper.publish(WsHandlerWrapper.java:43) 
at java/util/logging/Logger.log(Logger.java:526) 
at com/ibm/ejs/ras/Tr.logToJSR47Logger(Tr.java:1625) 
at com/ibm/ejs/ras/Tr.fireEvent(Tr.java:1587) 
at com/ibm/ejs/ras/Tr.fireTraceEvent(Tr.java:1509) 
at com/ibm/ejs/ras/Tr.exit(Tr.java:796) 
at com/ibm/ejs/j2c/PoolManager.processPoolRequestStats(PoolManager.java:6879) 
at com/ibm/ejs/j2c/PoolManager.reserve(PoolManager.java:2026) 
at com/ibm/ejs/j2c/ConnectionManager.allocateMCWrapper(ConnectionManager.java:940) 
at com/ibm/ejs/j2c/ConnectionManager.allocateConnection(ConnectionManager.java:614) 
at com/ibm/ws/rsadapter/jdbc/WSJdbcDataSource.getConnection(WSJdbcDataSource.java:449) 
at com/ibm/ws/rsadapter/jdbc/WSJdbcDataSource.getConnection(WSJdbcDataSource.java:418) 
...
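
Putting these together: WebContainer : 5 is waiting for the TraceLogger monitor while WebContainer : 6 owns it and is itself busy writing output, so trace output looks effectively serialized across the pool. A toy sketch of that pattern (purely illustrative; the class below is mine and has nothing to do with WebSphere's actual logging code):

import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustration only: a handler whose publish() is synchronized serializes every
// thread that logs through it. If the thread currently inside publish() is slow
// (formatting, console/disk I/O), the other threads pile up on the monitor,
// which is the shape of what the dump shows.
public class SerializedLoggingDemo {
    static class SlowHandler extends Handler {
        public synchronized void publish(LogRecord record) {
            try {
                Thread.sleep(50); // stand-in for slow formatting/I/O done under the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        public void flush() {}
        public void close() {}
    }

    public static void main(String[] args) {
        final Logger log = Logger.getLogger("demo");
        log.setUseParentHandlers(false);
        log.addHandler(new SlowHandler());
        for (int i = 0; i < 40; i++) { // roughly the size of our WebContainer pool
            new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        log.info("chatty trace entry"); // every call contends on the same handler
                    }
                }
            }, "WebContainer : " + i).start();
        }
    }
}

A thread dump taken while this toy program runs shows one thread inside publish() and the rest blocked on the handler's monitor, which looks very much like the WebContainer threads above.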
  • Can you list the stack trace from the hung threads? – trikelef Aug 25 '14 at 06:58
  • I added a summary of them on the original message, thank you for your help! – Puzzled Aug 26 '14 at 14:32
  • Can you check that you have not enabled any trace on this specific WebSphere? – trikelef Aug 26 '14 at 14:35
  • We do have tracing enabled... We have this: *=off: com.ibm.ws.webservices.trace.MessageTrace=finest, plus various request metrics to produce the stats we are asked to produce on these servers. Using this logging was imposed on us... – Puzzled Aug 26 '14 at 16:25
  • I checked another dump we have and this time what stands out are calls to the class loader and JSF/Faces-related calls, no logging calls to speak of... Logging-related issues seemed promising and I even found documented issues about them (like OOM), but the other dump seems to point another way... We did have OOM issues in the past (never tracked down, even with IBM's help) with something that looked like an 800-something MB cookie-related structure... – Puzzled Aug 26 '14 at 19:53
  • We had another crash about an hour ago, nothing stands out in that dump... There is only one thread that appears to do logging and nothing seems to implicate the class loader this time... – Puzzled Aug 26 '14 at 21:47
