
I am trying to work out whether the behaviour I am observing is correct or whether WildFly is leaking file descriptors.

During our standard performance testing after upgrading from WildFly 11 to 14, we ran into an issue with too many open files. After digging into it a bit more, it looks like it is actually the number of pipes WildFly has open that keeps increasing.

To help reproduce the problem I have created a simple JSF 2.2 application that contains a large image (100 MB, to simplify testing). I am retrieving the image using the standard JSF resource URL:

/contextroot/javax.faces.resource/css/images/big-image.png.xhtml

I have also tried adding OmniFaces and using its unmapped resource handler URL:

/contextroot/javax.faces.resource/css/images/big-image.png

Adding OmniFaces did not change the behaviour I am seeing; I only mention it because we initially thought it might be a contributing factor.

Behaviour I am seeing:
WildFly starts, and jstack reports that it has two threads matching default task-*, which corresponds to the default value of task-core-threads.

If I send in 5 concurrent requests for my large image, 3 new default task-* threads are spawned to serve the requests, and 3 new Linux pipes are also created.

If I stop sending requests and wait for 2 minutes (the default value for task-keepalive), 3 of the threads are removed. The pipes remain open.

Periodically - I believe about every 4.5 minutes - some kind of cleanup occurs and the pipes left over from the step above are removed.

However, if one of the original 2 worker threads is removed (e.g. task-1, task-3 and task-4 are removed, leaving task-2 and task-5), the pipe associated with task-1 is never cleaned up.

Over time these pipes add up and as far as I can tell they are never removed. Is this a leak somewhere, and if so where? JSF? WildFly? Undertow?

Things I have tried:
WildFly 14, 17 and 18
With and without OmniFaces (2.7 and 3.3)
Changing the min and max worker threads to be the same - this prevents the handles from building up, but I'd rather not go down this route (see the sketch after this list for the options I mean)
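For clarity, the "min and max threads" above are the worker's core and maximum task-thread counts. The sketch below shows what keeping them equal means, using the XNIO API directly rather than the io subsystem configuration (illustration only: the option names are my best guess at how the attributes map, and the value of 16 is arbitrary):

```java
import java.io.IOException;
import org.xnio.OptionMap;
import org.xnio.Options;
import org.xnio.Xnio;
import org.xnio.XnioWorker;

// Illustration only: with core == max task threads the worker pool never shrinks,
// so worker threads (and whatever descriptors they hold) are never reaped.
public class StaticWorkerPool {
    public static void main(String[] args) throws IOException {
        XnioWorker worker = Xnio.getInstance().createWorker(OptionMap.builder()
                .set(Options.WORKER_TASK_CORE_THREADS, 16)  // arbitrary value for the sketch
                .set(Options.WORKER_TASK_MAX_THREADS, 16)   // same as core -> static pool
                .getMap());
        worker.shutdown();
    }
}
```

With core equal to max, the pool never shrinks, so worker threads (and their associated pipes) are never reaped.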

2 Answers


I'm facing this kind of leak, too. Handles are "lost" in triples of two pipes and one epoll selector (@Gareth: can you confirm this? Take a look at /proc/$PID/fd for pipes and anonymous inodes). This suggests they are created by Java NIO channels.
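To check this on a running instance, you can tally what the entries under /proc/$PID/fd point at; pipes show up as pipe:[inode] and epoll selectors as anon_inode:[eventpoll]. A quick sketch (pass the WildFly PID as the first argument and run it as the same user):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Quick sketch: summarise what the file descriptors of a process point at.
// Pipes appear as "pipe:[inode]", epoll selectors as "anon_inode:[eventpoll]".
public class FdSummary {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", args[0], "fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue;                                 // fd closed while we were reading
                }
                String kind = target.startsWith("pipe:") ? "pipe"
                        : target.startsWith("anon_inode:") ? target   // e.g. anon_inode:[eventpoll]
                        : target.startsWith("socket:") ? "socket"
                        : "file/other";
                counts.merge(kind, 1, Integer::sum);
            }
        }
        counts.forEach((kind, n) -> System.out.println(kind + " -> " + n));
    }
}
```

If the triples described above are the cause, the pipe count should be roughly twice the anon_inode:[eventpoll] count.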

I discovered that the handles are released (at least) by invoking a full GC (@Gareth: can you confirm this?). I'm using a well-tuned Java 8 JVM with G1GC enabled, and as a welcome result a full GC happens very rarely. But as a negative consequence, thousands of these file-handle triples pile up in the meantime.

Because the handles are releasable, it's not a real leak but rather the effect of a Soft/Weak/Phantom reference.

We reached the assigned OS limit (the JVM with WildFly runs inside an LX container) twice last week. Therefore, as a first workaround for production, I wrote a watchdog that invokes a full GC via jcmd whenever the number of pipe handles rises above a limit.
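The watchdog boils down to something like the following (a minimal sketch of the idea, not the actual script: it assumes the WildFly PID is passed as the first argument, that jcmd is on the PATH and runs as the same user, and that the threshold of 2000 is arbitrary):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Minimal sketch of the watchdog idea: count the pipe descriptors of the
// WildFly process and trigger a full GC via jcmd once they exceed a threshold.
public class PipeWatchdog {
    public static void main(String[] args) throws IOException, InterruptedException {
        String pid = args[0];          // WildFly PID (assumption: passed as argument)
        long threshold = 2000;         // arbitrary limit, tune to your fd ulimit
        long pipes;
        try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
            pipes = fds.filter(fd -> {
                try {
                    return Files.readSymbolicLink(fd).toString().startsWith("pipe:");
                } catch (IOException e) {
                    return false;      // fd vanished while we were looking at it
                }
            }).count();
        }
        if (pipes > threshold) {
            // Ask the target JVM for a full GC so the finalizable selector holders
            // are collected and their pipes/epoll descriptors get closed.
            new ProcessBuilder("jcmd", pid, "GC.run").inheritIO().start().waitFor();
        }
    }
}
```

jcmd <pid> GC.run requests a full collection in the target JVM, after which the holders (and their pipes) can be reclaimed.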

This is observed on a (load-balanced) pair of WildFly 13 instances running more than 20 applications. It doesn't seem to be related to a specific application, because it also happens (on both WildFly instances of the pair) if I disable individual applications from load balancing on one of them.

It doesn't show up on other pairs of our WildFly instances, but those run a different set of applications with other use cases. There is more memory churn and more pressure on the heap there; maybe that triggers the timely release of the objects holding the file handles in another way.

By taking a look at a heap dump with the Memory Analyzer Tool, I was able to find a comparably high and equal number of instances of sun.nio.ch.EPollArrayWrapper and sun.nio.ch.EPollSelectorImpl, with inbound references to org.xnio.nio.NioXnio$FinalizableSelectorHolder.
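The name of that holder explains why a full GC releases the handles: the selector, which owns the pipe and epoll descriptors, is only closed when the holder is finalized. Roughly (a simplified sketch for illustration, not the actual XNIO source), the pattern looks like this:

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Simplified illustration of a finalizer-based selector holder: the selector
// (and with it the underlying pipe/epoll file descriptors) is only closed when
// the garbage collector finalizes the holder object.
class FinalizableSelectorHolder {
    final Selector selector;

    FinalizableSelectorHolder(Selector selector) {
        this.selector = selector;
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            selector.close(); // releases the wakeup pipe and the epoll descriptor
        } catch (IOException ignored) {
            // nothing useful to do during finalization
        } finally {
            super.finalize();
        }
    }
}
```

Until a collection actually reclaims such a holder and runs its finalizer, the descriptors stay open, which matches the slow build-up described above.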

  • Hi @guido, I basically reached the same conclusion as you. The JBoss XNIO implementation seems to tie resource clean-up to garbage collection, which is never a great idea. See here: https://github.com/xnio/xnio/blob/3.x/nio-impl/src/main/java/org/xnio/nio/NioXnio.java for the exact spot where they do it. They rely on the finalize method being called to do the clean-up. My ultimate solution was to set WildFly's min and max worker threads to the same (high) value so that the number of threads is static. At least this way the number of open pipes is constant – Gareth Wilson Nov 22 '19 at 00:41
  • @GarethWilson: I found this on the JBoss forum, too, but I can't log in there at the moment due to the sign-in migration to Red Hat, so I created an account here to answer. Can you tell me the configuration options that correspond to these min/max settings, and a rough formula to calculate the number of file handles from them - is it "3 x threads"? Does this really stop the growth? Currently I limit "task-max-threads" => "256" and, from the default, "io-threads" => "80". But the watchdog still fires a couple of times per hour at a total pipe limit of 2500. – Guido Jäkel Nov 28 '19 at 08:09
  • The two settings that I changed were task-max-threads and core-pool-size. The core-pool-size is the minimum number of threads that should be kept in the pool. By setting these values to the same number, the thread pool should remain static – Gareth Wilson Dec 03 '19 at 22:00
  • Thanks you! I guess I'll put a nightly `jcmd jboss GC.run` in crontab. – lapo Oct 14 '21 at 07:40

I started encountering this issue after migrating the WildFly runtime from Java 8 to Java 11.

Java 9 changed the default GC algorithm to G1, so moving from Java 8 to 11 also switches collectors. In G1, old-generation objects are collected very selectively and only once a certain heap-occupancy threshold is passed. If you have few objects that get promoted to the old generation and help reach this threshold, the org.xnio.nio.NioXnio$FinalizableSelectorHolder instances can pile up there for very long periods, retaining the open file descriptors.

In my case, switching garbage collection to concurrent mark and sweep solved the problem, though I'm fairly certain G1 can be tuned to collect the old generation more aggressively; -XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent are the switches to play with. Another approach might be reducing the size of the old generation (-XX:NewRatio) or of the entire heap.