
I have a Java/WebSphere application that is doing XA transactions with a SQL Server 2008 instance. In some environments, everything works as expected. In two of our environments, transactions will intermittently fail.

Some info about the environment:

  • Application server is a Linux VM running WebSphere 8.5.5.3. It is using version 4.0 of the SQL Server JDBC driver. The data source is configured for XA.
  • Database server is a Windows VM running SQL Server 2008.
  • Whether things work seems to depend on the database server. If I configure the application server in the working environment to use the database in the broken environment, things stop working. The inverse is also true: if I configure the application server in the broken environment to use the DB server in the working environment, the application works.

On the application server, I see stack traces like this when the transaction fails:

com.microsoft.sqlserver.jdbc.SQLServerException: Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:314)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteUpdate(WSJdbcPreparedStatement.java:1187)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:804)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.executeUpdate(ResultSetReturnImpl.java:186)

When I look at the logs for MSDTC on the database server, I see this:

time=12/22/2015-10:47:48.611    eventid=RM_ENLISTED_IN_TRANSACTION                  tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        resource manager #1001 enlisted as transaction enlistment #1. RM guid = '280f3497-9cc1-4689-b612-7a08cce82e2b'
time=12/22/2015-10:47:57.612    eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED      tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        transaction timeout expired
time=12/22/2015-10:47:57.612    eventid=TRANSACTION_ABORTING                        tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        transaction is aborting
time=12/22/2015-10:47:57.612    eventid=RM_ISSUED_ABORT                             tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        abort request issued to resource manager #1001 for transaction enlistment #1
time=12/22/2015-10:47:57.612    eventid=RM_ACKNOWLEDGED_ABORT                       tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        received acknowledgement of abort request from the resource manager #1001 for transaction enlistment #1
time=12/22/2015-10:47:57.612    eventid=TRANSACTION_ABORTED                         tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        transaction has been aborted

I consistently see an ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED event 9 seconds after an RM_ENLISTED_IN_TRANSACTION event. The problem is that the MSDTC transaction timeout on the database server is configured for 60 seconds. I have tried changing this timeout, and it does not affect the behaviour at all. I don't see any transaction timeout setting in WebSphere that matches the 9-second interval either. Where is this timeout coming from, and how can I change it?
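To check the enlist-to-abort gap across many transactions rather than by eye, the trace can be parsed per tx_guid. This is a minimal Java sketch, assuming the MSDTC trace has already been exported as text lines in the layout shown above (time=, eventid= and tx_guid= fields separated by whitespace); the class and method names are hypothetical. For the transaction above it reports 9001 ms.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: measures enlist-to-timeout gaps in an exported MSDTC trace.
class MsdtcTraceGap {
    // Timestamp layout as it appears in the trace, e.g. "12/22/2015-10:47:48.611".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("M/d/yyyy-HH:mm:ss.SSS");

    // Assumed line layout: time=<ts> eventid=<event> tx_guid=<guid> <free text>.
    private static final Pattern LINE =
            Pattern.compile("time=(\\S+)\\s+eventid=(\\S+)\\s+tx_guid=(\\S+)");

    /** Milliseconds from RM_ENLISTED_IN_TRANSACTION to ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED, keyed by tx_guid. */
    static Map<String, Long> enlistToTimeoutMillis(List<String> lines) {
        Map<String, LocalDateTime> enlisted = new HashMap<>();
        Map<String, Long> gaps = new HashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // skip lines that do not match the assumed layout
            }
            LocalDateTime ts = LocalDateTime.parse(m.group(1), TS);
            String event = m.group(2);
            String guid = m.group(3);
            if (event.equals("RM_ENLISTED_IN_TRANSACTION")) {
                enlisted.putIfAbsent(guid, ts); // the first enlistment starts the clock
            } else if (event.equals("ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED") && enlisted.containsKey(guid)) {
                gaps.put(guid, Duration.between(enlisted.get(guid), ts).toMillis());
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "time=12/22/2015-10:47:48.611    eventid=RM_ENLISTED_IN_TRANSACTION                  tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        resource manager #1001 enlisted",
                "time=12/22/2015-10:47:57.612    eventid=ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED      tx_guid=690a94a2-a060-4eb9-8966-ef25b0fa001b        transaction timeout expired");
        System.out.println(enlistToTimeoutMillis(trace));
    }
}
```

If every failing transaction shows roughly the same gap, the timeout is being imposed from somewhere rather than coming from slow statements.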

Jason B

1 Answer


We figured out our issue. We had another, non-WebSphere application that was using Atomikos for JTA transactions. By default, Atomikos sets a 10-second timeout on its transactions unless you set the com.atomikos.icatch.default_jta_timeout property, and it applies this timeout to every XAResource instance that participates in the transaction. For some reason, this caused the 10-second timeout to be applied globally to all transactions using that instance of MSDTC. It is also worth noting that WebSphere's transaction manager does NOT call setTransactionTimeout on the participating XAResource instances, which may be part of the reason another application was able to affect these transactions. Restarting MSDTC seems to clear that global timeout until an Atomikos transaction runs again.
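For context, the per-resource timeout is part of the standard javax.transaction.xa.XAResource API: a transaction manager may call setTransactionTimeout(int seconds) on each enlisted resource, typically before start(). The Java sketch below is not Atomikos or WebSphere code; OneBranchTm, RecordingXAResource and SimpleXid are hypothetical stand-ins that only record the XA calls, to contrast a manager that sets a timeout (as described for Atomikos) with one that does not (as observed for WebSphere).

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical one-branch "transaction manager"; only the order of XA calls matters here.
class OneBranchTm {
    /** A null timeout models a manager that never calls setTransactionTimeout. */
    static void runOnePhase(XAResource res, Xid xid, Integer timeoutSeconds) {
        try {
            if (timeoutSeconds != null) {
                res.setTransactionTimeout(timeoutSeconds); // set before start(), as TMs typically do
            }
            res.start(xid, XAResource.TMNOFLAGS);
            // ... application SQL would run on the XA connection here ...
            res.end(xid, XAResource.TMSUCCESS);
            res.commit(xid, true); // one-phase commit: only one resource is enlisted
        } catch (XAException e) {
            throw new IllegalStateException("XA error code " + e.errorCode, e);
        }
    }

    public static void main(String[] args) {
        RecordingXAResource withTimeout = new RecordingXAResource();
        runOnePhase(withTimeout, new SimpleXid(new byte[] {1}, new byte[] {1}), 10); // Atomikos default: 10 s
        RecordingXAResource withoutTimeout = new RecordingXAResource();
        runOnePhase(withoutTimeout, new SimpleXid(new byte[] {2}, new byte[] {1}), null); // as observed for WebSphere
        System.out.println(withTimeout.calls);
        System.out.println(withoutTimeout.calls);
    }
}

// Hypothetical fake that records calls instead of talking to a database.
class RecordingXAResource implements XAResource {
    final List<String> calls = new ArrayList<>();
    private int timeoutSeconds = 0; // 0 means "never set" in this fake

    @Override public boolean setTransactionTimeout(int seconds) {
        calls.add("setTransactionTimeout(" + seconds + ")");
        timeoutSeconds = seconds;
        return true; // true = timeout accepted
    }
    @Override public int getTransactionTimeout() { return timeoutSeconds; }
    @Override public void start(Xid xid, int flags) { calls.add("start"); }
    @Override public void end(Xid xid, int flags) { calls.add("end"); }
    @Override public int prepare(Xid xid) { calls.add("prepare"); return XA_OK; }
    @Override public void commit(Xid xid, boolean onePhase) { calls.add("commit"); }
    @Override public void rollback(Xid xid) { calls.add("rollback"); }
    @Override public void forget(Xid xid) { calls.add("forget"); }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
}

// Minimal Xid; the format id is an arbitrary value for this sketch.
class SimpleXid implements Xid {
    private final byte[] gtrid;
    private final byte[] bqual;
    SimpleXid(byte[] gtrid, byte[] bqual) { this.gtrid = gtrid.clone(); this.bqual = bqual.clone(); }
    @Override public int getFormatId() { return 4660; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }
}
```

Running main prints [setTransactionTimeout(10), start, end, commit] followed by [start, end, commit]. The sketch only shows where the timeout enters the XA protocol; it says nothing about how MSDTC scopes the value afterwards.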

Other JTA transaction managers, such as Bitronix, might have the same issue when accessing SQL Server over XA. The problem may also be specific to version 4.0 of the SQL Server JDBC driver and may be fixed in later versions. However, I have not been able to test either of these statements.

If you have the same problem, you should be able to verify it with these steps:

  1. Restart MSDTC
  2. Run transactions in your WebSphere application. They should never fail due to timeouts. Try multiple transactions - enough to feel comfortable that things work.
  3. Run a transaction with your Atomikos application.
  4. Run a transaction with WebSphere again. If the transaction exceeds 10 seconds, it should now fail and be aborted by MSDTC.
  5. Restart MSDTC
  6. Run a transaction with WebSphere again. It should no longer experience timeout errors even if it goes beyond 10 seconds.
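The hypothesis behind these steps can be written down as a toy model, which makes the expected outcome of each step explicit. This Java sketch is an assumption, NOT how MSDTC works internally: it only encodes the behaviour reported above (an explicitly set timeout lingers for clients that never set one, until MSDTC restarts). StickyTimeoutModel and its methods are hypothetical, and the 12-second duration is an arbitrary example longer than 10 seconds.

```java
// Toy model of the behaviour reported in this answer; not MSDTC's real implementation.
class StickyTimeoutModel {
    private final int configuredTimeoutSeconds;    // the MSDTC setting (60 s in the question)
    private Integer leftoverTimeoutSeconds = null; // assumption: last explicitly set timeout, cleared on restart

    StickyTimeoutModel(int configuredTimeoutSeconds) {
        this.configuredTimeoutSeconds = configuredTimeoutSeconds;
    }

    /** Steps 1 and 5: restarting MSDTC clears the leftover timeout. */
    void restart() {
        leftoverTimeoutSeconds = null;
    }

    /** Step 3: an Atomikos-style client that sets a per-resource timeout. */
    void runClientThatSetsTimeout(int timeoutSeconds) {
        leftoverTimeoutSeconds = timeoutSeconds;
    }

    /** Steps 2, 4 and 6: a WebSphere-style client that never sets one; true means it committed. */
    boolean runClientWithoutTimeout(int durationSeconds) {
        int effective = (leftoverTimeoutSeconds != null) ? leftoverTimeoutSeconds : configuredTimeoutSeconds;
        return durationSeconds <= effective;
    }

    public static void main(String[] args) {
        StickyTimeoutModel dtc = new StickyTimeoutModel(60);
        dtc.restart();                                   // step 1
        boolean step2 = dtc.runClientWithoutTimeout(12); // a 12 s WebSphere transaction
        dtc.runClientThatSetsTimeout(10);                // step 3, Atomikos default of 10 s
        boolean step4 = dtc.runClientWithoutTimeout(12);
        dtc.restart();                                   // step 5
        boolean step6 = dtc.runClientWithoutTimeout(12);
        System.out.println(step2 + " " + step4 + " " + step6); // true false true
    }
}
```

If a real system does not follow the true, false, true pattern across steps 2, 4 and 6, the cause is probably something other than a lingering Atomikos timeout.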
Jason B