We have a serious problem with how sessions are distributed through the load balancer when one of the failed nodes recovers.
I will briefly describe how the two sides interact:
On one side, there is an application running on WebSphere Application Server (hereinafter simply WAS)
On the other side, there is a second application whose details are not known (hereinafter application 2)
Between them are 2 WebSphere MQ servers
The WAS applications connect to WebSphere MQ through a balancer
Application 2 connects directly to the MQ servers (because it can distribute connections by itself)
How the problem occurs:
- One of the MQ servers crashes. At this point, the balancer routes all sessions to the remaining MQ server. Application 2 also continues to work with only one MQ server. So far there is no problem.
- Some time passes and the failed MQ server is brought back up. Application 2 instantly restores its connections to the second MQ server. The balancer, however, keeps all WAS sessions on one node, because this is JMS and the existing connection pool is quite sufficient (it is probably worth saying here that the application works only through a queue connection factory, without activation specifications). As a result, the WAS servers do not read the messages that application 2 places on the recovered MQ server until the connection pool becomes insufficient and a new session is opened, which the balancer then routes to the second MQ server. This period can be very long, and during it about 50% of the messages time out (on the application 2 side).
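To make the sequence above concrete, here is a small toy model of it. It is only a sketch: the class and function names are made up for illustration, and the least-connections routing policy is an assumption, since the real balancer policy is not stated here. The point it shows is that a balancer only decides where new connections go, so a pool that never opens a new connection never reaches the recovered node.

```python
from collections import Counter


class Balancer:
    """Toy balancer (hypothetical): routes each NEW connection to the
    live node that currently has the fewest connections."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.alive = set(nodes)
        self.load = {n: 0 for n in nodes}

    def connect(self):
        live = [n for n in self.nodes if n in self.alive]
        node = min(live, key=lambda n: self.load[n])
        self.load[node] += 1
        return node

    def fail(self, node):
        self.alive.discard(node)
        self.load[node] = 0  # every connection to the node is lost

    def recover(self, node):
        self.alive.add(node)


def simulate_failover(pool_size=4):
    """Replay crash and recovery; return connection counts per node at each step."""
    balancer = Balancer(["MQ1", "MQ2"])
    pool = [balancer.connect() for _ in range(pool_size)]
    snapshots = {"start": Counter(pool)}

    # MQ2 crashes: broken connections are reopened through the balancer.
    balancer.fail("MQ2")
    pool = [balancer.connect() if node == "MQ2" else node for node in pool]
    snapshots["after_failure"] = Counter(pool)

    # MQ2 recovers: the pool is healthy, so nothing is reopened.
    balancer.recover("MQ2")
    snapshots["after_recovery"] = Counter(pool)

    # Only when the pool has to grow does a connection reach MQ2.
    pool.append(balancer.connect())
    snapshots["after_pool_growth"] = Counter(pool)
    return snapshots


if __name__ == "__main__":
    for step, counts in simulate_failover().items():
        print(step, dict(counts))
```

In this model the count for MQ2 stays at zero after recovery and only changes once the pool grows, which matches the behavior described above.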
Hence the question.
Is it possible to organize the process so that, when the second MQ server is restored, either part of the sessions on the balancer are moved to the second MQ server, or a new session is opened (immediately) from the WAS side (for example, would switching to activation specifications help)?
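To illustrate the first option, this is roughly the behavior I am after, sketched as a toy model. All names here are hypothetical, and the least-connections policy is again an assumption. The idea is that pooled connections are closed and reopened through the balancer once they reach a maximum age, similar in spirit to an aged timeout on a connection pool. Whether that can be configured cleanly for this setup is exactly what I do not know.

```python
class LeastConnBalancer:
    """Toy balancer (hypothetical): new connections go to the least-loaded live node."""

    def __init__(self, nodes):
        self.order = list(nodes)
        self.alive = set(nodes)
        self.load = {n: 0 for n in nodes}

    def open(self):
        live = [n for n in self.order if n in self.alive]
        node = min(live, key=lambda n: self.load[n])
        self.load[node] += 1
        return node

    def close(self, node):
        self.load[node] -= 1


class AgedPool:
    """Fixed-size pool that recycles any connection older than max_age."""

    def __init__(self, balancer, size, max_age):
        self.balancer = balancer
        self.max_age = max_age
        # Each entry is [node, time_opened]; all connections open at time 0.
        self.conns = [[balancer.open(), 0] for _ in range(size)]

    def tick(self, now):
        # Close and reopen every connection that has reached max_age.
        for conn in self.conns:
            if now - conn[1] >= self.max_age:
                self.balancer.close(conn[0])
                conn[0] = self.balancer.open()
                conn[1] = now

    def nodes(self):
        return [conn[0] for conn in self.conns]


def simulate_aged_pool():
    balancer = LeastConnBalancer(["MQ1", "MQ2"])
    balancer.alive.discard("MQ2")           # MQ2 is down when the pool is built
    pool = AgedPool(balancer, size=4, max_age=10)
    balancer.alive.add("MQ2")               # MQ2 comes back

    pool.tick(now=5)                        # nothing is old enough yet
    before = pool.nodes()
    pool.tick(now=10)                       # every connection is recycled
    after = pool.nodes()
    return before, after


if __name__ == "__main__":
    before, after = simulate_aged_pool()
    print("before recycling:", before)
    print("after recycling: ", after)
```

In this sketch all connections expire at the same moment, which a real pool would probably want to avoid by staggering the recycling.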