
I am trying to configure a GlassFish cluster following the official HA guide.

The app is a standard JSF app (with PrimeFaces). At first I thought the problem was related to JSF itself, but as I dug deeper I realized it is more likely in the cluster configuration.

In fact, if the cluster contains only one node, everything works fine. As soon as I add another node, a new JSESSIONID is created on every request, even though the logs show nothing wrong.

These log entries confirm that the instances can see each other:

Instance 1:

[2020-04-16T09:11:43.201+0000] [glassfish 5.1] [INFO] [view.window.view.change] [ShoalLogger] [tid: _ThreadID=37 _ThreadName=GMS ViewWindowThread Group-my-cluster] [timeMillis: 1587028303201] [levelValue: 800] [[
  GMS1092: GMS View Change Received for group: my-cluster : Members in view for ADD_EVENT(before change analysis) are :
1: MemberId: instance1, MemberType: CORE, Address: 10.0.20.9:9090:230.30.1.1:9090:my-cluster:instance1
2: MemberId: instance2, MemberType: CORE, Address: 10.0.10.14:9090:230.30.1.1:9090:my-cluster:instance2
3: MemberId: server, MemberType: SPECTATOR, Address: 10.0.10.4:9090:230.30.1.1:9090:my-cluster:server
]]

Instance 2:

[2020-04-16T09:11:43.136+0000] [glassfish 5.1] [INFO] [view.window.view.change] [ShoalLogger] [tid: _ThreadID=45 _ThreadName=GMS ViewWindowThread Group-my-cluster] [timeMillis: 1587028303136] [levelValue: 800] [[
  GMS1092: GMS View Change Received for group: my-cluster : Members in view for ADD_EVENT(before change analysis) are :
1: MemberId: instance1, MemberType: CORE, Address: 10.0.20.9:9090:230.30.1.1:9090:my-cluster:instance1
2: MemberId: instance2, MemberType: CORE, Address: 10.0.10.14:9090:230.30.1.1:9090:my-cluster:instance2
3: MemberId: server, MemberType: SPECTATOR, Address: 10.0.10.4:9090:230.30.1.1:9090:my-cluster:server
]]

The DAS log also shows that the overall situation looks fine:

[2020-04-16T12:52:59.360+0000] [glassfish 5.1] [FINER] [] [ShoalLogger] [tid: _ThreadID=55 _ThreadName=GMS InDoubtPeerDetector Thread for Group-my-cluster] [timeMillis: 1587041579360] [levelValue: 400] [CLASSNAME: com.sun.enterprise.mgmt.HealthMonitor$InDoubtPeerDetector] [METHODNAME: processCacheUpdate] [[
  processCacheUpdate : instance2 's state is aliveandready]]
......
[2020-04-16T12:52:59.359+0000] [glassfish 5.1] [FINER] [] [ShoalLogger] [tid: _ThreadID=55 _ThreadName=GMS InDoubtPeerDetector Thread for Group-my-cluster] [timeMillis: 1587041579359] [levelValue: 400] [CLASSNAME: com.sun.enterprise.mgmt.HealthMonitor$InDoubtPeerDetector] [METHODNAME: processCacheUpdate] [[
  processCacheUpdate : instance1 's state is aliveandready]]

The web.xml contains the <distributable/> element, the app is deployed with --availabilityenabled true, and I have added <property name="relaxCacheVersionSemantics" value="true"/> to glassfish-web.xml.

Finally, the session cookie is set correctly; I have verified it in the browser inspector. These are the cookie properties:

<cookie-properties>
    <property name="cookieDomain" value=".mydomain.com" />
    <property name="cookiePath" value="/myapp" />
</cookie-properties>
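Seeing the cookie as set in the inspector does not by itself prove that the browser sends it back. Under the RFC 6265 matching rules, a cookie with Domain .mydomain.com and Path /myapp is only returned for request hosts inside mydomain.com and paths under /myapp. If requests reach the app through some other host name (for example a load balancer's default AWS DNS name) or another context path, the browser would send no JSESSIONID and the server would likely start a new session. The sketch below is a simplified, hypothetical helper (not part of GlassFish) for checking the URLs actually in use against those rules; the host names in it are placeholders.

```java
import java.util.Locale;

// Hypothetical helper, not part of GlassFish: a simplified version of the
// RFC 6265 rules a browser applies before sending a cookie back.
public class CookieMatch {

    // RFC 6265 section 5.1.3 domain-match. A leading dot in the Domain
    // attribute is ignored (section 5.2.3), so ".mydomain.com" also covers
    // "mydomain.com". Simplification: IP-address hosts are not special-cased.
    static boolean domainMatches(String cookieDomain, String requestHost) {
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        if (domain.startsWith(".")) {
            domain = domain.substring(1);
        }
        String host = requestHost.toLowerCase(Locale.ROOT);
        return host.equals(domain) || host.endsWith("." + domain);
    }

    // RFC 6265 section 5.1.4 path-match: "/myapp" matches "/myapp" and
    // "/myapp/...", but not "/myapplication" or "/".
    static boolean pathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (requestPath.startsWith(cookiePath)) {
            return cookiePath.endsWith("/")
                || requestPath.charAt(cookiePath.length()) == '/';
        }
        return false;
    }

    public static void main(String[] args) {
        String domain = ".mydomain.com"; // cookieDomain from <cookie-properties>
        String path = "/myapp";          // cookiePath from <cookie-properties>
        // Placeholder host/path pairs; replace with the URLs actually used.
        String[][] requests = {
            {"www.mydomain.com", "/myapp/index.xhtml"},
            {"my-alb-123456.eu-west-1.elb.amazonaws.com", "/myapp/index.xhtml"},
            {"www.mydomain.com", "/index.xhtml"},
        };
        for (String[] r : requests) {
            boolean sent = domainMatches(domain, r[0]) && pathMatches(path, r[1]);
            System.out.printf("%s%s -> cookie sent: %b%n", r[0], r[1], sent);
        }
    }
}
```

Any URL that prints "cookie sent: false" would be one where every request looks like a brand-new visitor to the server.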

I have spent almost a week trying to understand what's going on, with no luck. All the articles and blog posts I have read describe the same resolution steps, which I have already applied. I have also raised logging to the maximum level, but there is no trace of an error or anything similar.

One key factor is that the cluster runs on Amazon AWS. Since I am not sure multicast is fully supported there, I switched GMS discovery from multicast to TCP using GMS_DISCOVERY_LIST. Apparently this setting works, since the instances can see each other.

I have tried both the AWS Elastic Load Balancer and an Apache HTTP load balancer, with the same effect. Enabling sticky sessions on the ALB doesn't help either: because the balancer sees a different JSESSIONID each time, it routes every request to a different node.
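To take the browser and the load balancer's stickiness out of the picture, the behaviour can be reproduced with a small client that acts like a browser: it sends back the JSESSIONID it last received and reports whether the server issues a new one. This is a minimal sketch assuming Java 11+ (java.net.http); the class name and default URL are hypothetical placeholders. Because it sends no load-balancer stickiness cookie, consecutive requests may land on different instances, which is exactly the case session replication is supposed to handle.

```java
import java.net.HttpCookie;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical diagnostic client (Java 11+): returns the last JSESSIONID it
// received, like a browser, and reports when the server issues a new one.
public class SessionIdProbe {

    // Extracts the JSESSIONID value from Set-Cookie header values, if any.
    static Optional<String> jsessionId(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                if ("JSESSIONID".equals(cookie.getName())) {
                    return Optional.of(cookie.getValue());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: pass the load-balanced application URL as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "https://www.mydomain.com/myapp/");
        HttpClient client = HttpClient.newHttpClient();
        List<String> seen = new ArrayList<>();
        String current = null;
        for (int i = 1; i <= 5; i++) {
            HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
            if (current != null) {
                // Send the session cookie back, as a browser would.
                builder.header("Cookie", "JSESSIONID=" + current);
            }
            HttpResponse<Void> response =
                client.send(builder.build(), HttpResponse.BodyHandlers.discarding());
            Optional<String> issued = jsessionId(response.headers().allValues("set-cookie"));
            if (issued.isPresent()) {
                current = issued.get();
            }
            seen.add(current);
            System.out.printf("request %d: status %d, JSESSIONID %s, new cookie issued: %b%n",
                i, response.statusCode(), current, issued.isPresent());
        }
        // With a working setup, one would expect a single distinct value here.
        long distinct = seen.stream().filter(Objects::nonNull).distinct().count();
        System.out.println("distinct JSESSIONID values: " + distinct);
    }
}
```

If the JSESSIONID still changes on every response even though the previous one was sent back, the instance receiving the request is not picking up the replicated session; if it stays stable, the difference probably lies in what the browser sends.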

I am trying to find a way to inspect the session mechanism, but I am not sure which specific logging I need to enable. Simply increasing javax logging makes the log unreadable.
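The server.log excerpts above share one record layout: bracketed timestamp, product, level, message id (possibly empty) and logger name, followed by a message that can span several lines and ends with "]]". One way to make a very verbose log readable is to keep only the records from the loggers of interest. This is a minimal sketch assuming that layout; the class name is hypothetical, and which loggers carry session or replication messages is an assumption to check against the names that actually appear in the log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter for GlassFish server.log records, based on the layout
// visible in the excerpts: [timestamp] [product] [level] [message id] [logger] ...
public class LogFilter {

    private static final Pattern HEADER = Pattern.compile(
        "^\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]");

    // Groups physical lines into records; a record starts with "[" followed
    // by a digit (the timestamp), continuation lines are appended to it.
    static List<String> records(List<String> lines) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean start = line.length() > 1 && line.charAt(0) == '['
                && Character.isDigit(line.charAt(1));
            if (start) {
                if (current != null) {
                    result.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            result.add(current.toString());
        }
        return result;
    }

    // Returns the logger name (fifth bracketed field), or null if absent.
    static String loggerOf(String record) {
        Matcher m = HEADER.matcher(record);
        return m.find() ? m.group(5) : null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java LogFilter <server.log> <logger-prefix>...");
            return;
        }
        List<String> prefixes = List.of(args).subList(1, args.length);
        for (String record : records(Files.readAllLines(Path.of(args[0])))) {
            String logger = loggerOf(record);
            if (logger != null && prefixes.stream().anyMatch(logger::startsWith)) {
                System.out.println(record);
            }
        }
    }
}
```

For example, `java LogFilter.java server.log ShoalLogger` would keep only the Shoal/GMS records like the ones quoted above; further prefixes can be added once the relevant logger names are known.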

Leonardo

1 Answer


The only solution was quite drastic: I removed GlassFish and installed the latest version of Payara Server, and the problem disappeared.

Leonardo