
We run a cluster of Java Spring server apps on AWS EC2 instances running CentOS 7. We have health monitors on them, and occasionally an alarm goes off and we find that the Java process has quietly disappeared. We can find nothing in any of the logs, either our own or the system logs. We have an outer "catch Throwable" around our own code that logs what it catches, but we run Tomcat, which has many of its own threads. We've added extra logging to try to capture the moment when the process disappears, but so far that has yielded no information.

I've looked over this question: How to find out why a Java process died without a trace in Linux. I see nothing helpful there.

We currently can't involve the launcher of these processes in a solution. It's a long story. Trust me that we've tried to go down that road.

Any suggestions? I'm wondering whether I should wrap the Java process in an outer parent process that monitors the Java child and logs exactly how it terminates, including any signal that kills it. Is there an off-the-shelf solution for this that I haven't found yet? Any ideas would be greatly appreciated.
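To make the wrapper idea concrete, here is a minimal sketch of what I have in mind, written in Python only because it is short. The names `signal_name`, `describe_exit` and `run_and_report` are hypothetical helpers of my own, not part of any existing tool. It assumes a Linux host, and it can only report how the child terminated, not which process sent a signal. I'm also assuming that an exit status above 128 may follow the common 128+N convention for signals (for example 143 for SIGTERM), which is a heuristic rather than a guarantee.

```python
import logging
import signal
import subprocess
import sys

log = logging.getLogger("child-watch")


def signal_name(signum):
    """Return the symbolic name for a signal number, or 'unknown'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown"


def describe_exit(returncode):
    """Turn a Popen return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, Popen reports death by signal N as a return code of -N.
        signum = -returncode
        return f"killed by signal {signum} ({signal_name(signum)})"
    if returncode > 128:
        # Heuristic only: many programs exit with 128+N after handling signal N.
        signum = returncode - 128
        return (f"exited with status {returncode} (may correspond to signal "
                f"{signum}, {signal_name(signum)}, under the 128+N convention)")
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Start the child, wait for it to terminate, and log how it ended."""
    child = subprocess.Popen(argv)
    log.info("started pid %d: %s", child.pid, argv)
    returncode = child.wait()
    log.warning("pid %d %s", child.pid, describe_exit(returncode))
    return returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(name)s %(message)s")
    code = run_and_report(sys.argv[1:])
    # Mirror the child's fate in our own exit status where possible.
    sys.exit(code if code >= 0 else 128 - code)
```

It would be started as something like `python3 watch.py java -jar app.jar`, where both the script name and the jar name are placeholders. A SIGKILL, for instance, would then show up in the wrapper's log as "killed by signal 9 (SIGKILL)" even though the JVM itself gets no chance to log anything.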

CryptoFool
  • How exactly are you starting these apps? – Michael Hampton Aug 23 '21 at 19:51
  • We're using Chef Habitat, but we're in the middle of switching to something else and we don't want to touch its setup. It was so bad at process management, even though it purports to do it, that we disabled all of its process management features. I don't want to go there. I can stop the official running server and then run my own version manually or via another process manager if necessary. I don't know if such a setup would exhibit the same problem. If not, we'd at least be more suspicious of Habitat itself. – CryptoFool Aug 23 '21 at 20:05
  • I have considered looking into what systemd can do for me. At first glance, that seemed complicated and not necessarily helpful. I know there are other process managers out there. I'm hoping to find one meant for debugging and/or troubleshooting situations like mine. I'm a programmer, not a sysadmin, so I'm pretty new to all of this. – CryptoFool Aug 23 '21 at 20:08
  • https://docs.spring.io/spring-boot/docs/current/reference/html/deployment.html#deployment.installing.nix-services.system-d – Michael Hampton Aug 23 '21 at 20:43
  • @MichaelHampton - thanks, but my question isn't how to install a service under systemd. I know how to do that. The question is whether letting systemd manage the lifetime of my app/service can give me some indication of why my app died that I'm not going to get from existing sources. If it can, what configuration do I need to get the most possible information out of systemd when my app disappears? – CryptoFool Aug 23 '21 at 21:55
