I'm running tomcat 9.0.26 on Ubuntu 18.04. Suddenly I noticed that tomcat is down. Never happened before without me shutting it down. I looked at the trace and saw the following (this seems to start the shutdown process):
29-Jan-2020 07:18:27.927 INFO [Thread-4] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["https-jsse-nio2-8443"]
29-Jan-2020 07:18:27.936 INFO [Thread-4] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8009"]
29-Jan-2020 07:18:27.946 INFO [Thread-4] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
Starting here, it's part of the shutdown process itself, not the cause, though maybe related:
29-Jan-2020 07:18:28.145 WARNING [Thread-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [app] appears to have started a thread named [grpc-default-worker-ELG-1-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
this part of the above appears to have started a thread named [grpc-default-worker-ELG-1-1]
changes between exceptions: grpc-default-worker-ELG-1-2
Gax-2
Gax-3
and so on...
After that I see several lines like the following:
29-Jan-2020 07:18:28.154 SEVERE [Thread-4] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [app] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@117138f9]) and a value of type [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap] (value [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap@2b40bbea]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
and lastly:
29-Jan-2020 07:18:28.262 INFO [Thread-4] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["https-jsse-nio2-8443"]
29-Jan-2020 07:18:28.305 INFO [Thread-4] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8009"]
29-Jan-2020 07:18:28.316 INFO [Thread-4] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["https-jsse-nio2-8443"]
29-Jan-2020 07:18:28.323 INFO [Thread-4] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8009"]
NOTE: Picked up JDK_JAVA_OPTIONS: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
I'd appreciate any hints on where or what to look for. The shutting down without previous notice makes me wonder what happened there. Thanks
Update: I found this in /var/log/apache2 log, at the same time as the shutdown:
[Wed Jan 29 07:18:29.141852 2020] [mpm_prefork:notice] [pid 568] AH00169: caught SIGTERM, shutting down
[Wed Jan 29 07:38:12.843389 2020] [suexec:notice] [pid 513] AH01232: suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Wed Jan 29 07:38:13.060250 2020] [mpm_prefork:notice] [pid 539] AH00163: Apache/2.4.29 (Ubuntu) mod_fcgid/2.3.9 OpenSSL/1.1.1 configured -- resuming normal operations
[Wed Jan 29 07:38:13.060350 2020] [core:notice] [pid 539] AH00094: Command line: '/usr/sbin/apache2'
In mail.log I see the following (last line's date time is suspicious), postfix is down on purpose):
Jan 29 07:38:08 app postgrey[271]: Process Backgrounded
Jan 29 07:38:08 app postgrey[271]: 2020/01/29-07:38:08 postgrey (type Net::Server::Multiplex) starting! pid(271)
Jan 29 07:38:08 app postgrey[271]: Resolved [localhost]:10023 to [::1]:10023, IPv6
Jan 29 07:38:08 app postgrey[271]: Resolved [localhost]:10023 to [127.0.0.1]:10023, IPv4
Jan 29 07:38:08 app postgrey[271]: Binding to TCP port 10023 on host ::1 with IPv6
Jan 29 07:38:08 app postgrey[271]: Binding to TCP port 10023 on host 127.0.0.1 with IPv4
Jan 29 07:38:08 app postgrey[271]: Setting gid to "122 122"
Jan 29 07:38:09 app postgrey[271]: Setting uid to "118"
Jan 29 07:38:11 app dovecot: master: Dovecot v2.2.33.2 (d6601f4ec) starting up for imap, pop3 (core dumps disabled)
Jan 29 07:38:12 app opendkim[502]: OpenDKIM Filter v2.11.0 starting (args: -P /var/run/opendkim/opendkim.pid -p inet:8891@127.0.0.1)
this is the only time and place that I see a reference to postgrey and I have no idea what it is.
Update 2: this is where things started falling apart:
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#37793 (wWW.MUARaPeNimBunGiLiR.DeSA.Id): query (cache) 'wWW.MUARaPeNimBunGiLiR.DeSA.Id/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#43632 (WWW.mUarapEnImbunGIlIr.dEsA.iD): query (cache) 'WWW.mUarapEnImbunGIlIr.dEsA.iD/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#50618 (www.MuarAPEnImBUNgIliR.DEsA.id): query (cache) 'www.MuarAPEnImBUNgIliR.DEsA.id/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#40648 (WWW.muAraPeniMbungiLir.desA.iD): query (cache) 'WWW.muAraPeniMbungiLir.desA.iD/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#59108 (Www.mUARaPeniMbuNGiLIR.deSA.iD): query (cache) 'Www.mUARaPeniMbuNGiLIR.deSA.iD/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#42231 (wWW.mUARapenIMBUNgIlIr.DESA.ID): query (cache) 'wWW.mUARapenIMBUNgIlIr.DESA.ID/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#43692 (www.MUarAPeNimbUNGIliR.DeSA.ID): query (cache) 'www.MUarAPeNimbUNGIliR.DeSA.ID/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#47881 (Www.mUAraPeNIMBungiLiR.dESA.id): query (cache) 'Www.mUAraPeNIMBungiLiR.dESA.id/A/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#36046 (www.mUaraPeNimBungIliR.Desa.id): query (cache) 'www.mUaraPeNimBungIliR.Desa.id/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#49623 (wWw.MuArapenimbUNgILIR.dESA.iD): query (cache) 'wWw.MuArapenimbUNgILIR.dESA.iD/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#42436 (Www.muARaPeniMBuNGIliR.DeSa.id): query (cache) 'Www.muARaPeniMBuNGIliR.DeSa.id/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#41600 (WWw.mUARAPeNiMbunGiLIr.DEsA.iD): query (cache) 'WWw.mUARAPeNiMbunGiLIr.DEsA.iD/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#40947 (WWW.mUAraPeNIMbUNgiLir.DeSA.iD): query (cache) 'WWW.mUAraPeNIMbUNgiLir.DeSA.iD/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#47416 (WWW.MUARApENiMBunGilIR.dESA.Id): query (cache) 'WWW.MUARApENiMBunGilIR.dESA.Id/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#50269 (Www.mUarAPENImBuNgilir.dESA.id): query (cache) 'Www.mUarAPENImBuNgilir.dESA.id/AAAA/IN' denied
Jan 29 07:13:56 app named[461]: client @0x7f6cec025fa0 35.172.217.55#34727 (WWw.MUArApEniMBUngIliR.DesA.id): query (cache) 'WWw.MUArApEniMBUngIliR.DesA.id/AAAA/IN' denied
then a lot of logs similar to this: Jan 29 07:17:14 app systemd[1]: Started Session c5470 of user app.
Jan 29 07:18:26 app systemd[1]: Removed slice system-serial\x2dgetty.slice.
then a lot of logs similar to this: Jan 29 07:18:26 app systemd[1]: Stopped Session c2888 of user app.
Jan 29 07:18:27 app systemd[875]: Closed GnuPG cryptographic agent and passphrase cache.
Jan 29 07:18:27 app systemd[875]: Reached target Shutdown.
Jan 29 07:18:27 app systemd[875]: Starting Exit the Session...
Jan 29 07:18:27 app systemd[1]: Stopped Session c3432 of user app.
Jan 29 07:18:27 app systemd[875]: Received SIGRTMIN+24 from PID 13436 (kill).
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '1' to 'kernel/kptr_restrict', ignoring: Permission denied
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '1' to 'fs/protected_hardlinks', ignoring: Permission denied
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '1' to 'fs/protected_symlinks', ignoring: Permission denied
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '176' to 'kernel/sysrq', ignoring: Permission denied
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '1' to 'net/ipv4/tcp_syncookies', ignoring: No such file or directory
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '1' to 'kernel/yama/ptrace_scope', ignoring: Permission denied
Jan 29 07:38:06 app keyboard-setup.sh[67]: Couldn't get a file descriptor referring to the console
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write '65536' to 'vm/mmap_min_addr', ignoring: Operation not permitted
Jan 29 07:38:06 app keyboard-setup.sh[67]: Couldn't get a file descriptor referring to the console
Jan 29 07:38:06 app systemd-sysctl[85]: Couldn't write 'fq_codel' to 'net/core/default_qdisc', ignoring: No such file or directory
Jan 29 07:38:06 app keyboard-setup.sh[67]: Couldn't get a file descriptor referring to the console
Jan 29 07:38:06 app keyboard-setup.sh[67]: message repeated 4 times: [ Couldn't get a file descriptor referring to the console]
Jan 29 07:38:06 app keyboard-setup.sh[67]: setupcon: We are not on the console, the console is left unconfigured.
Why do I see 100s of Stopped Session <session> of user...
? and I mean 100s! is it open sessions? is it suppose to be like that? (overload?)
Also this line Jan 29 07:18:27 app systemd[875]: Received SIGRTMIN+24 from PID 13436 (kill).
is VERY suspicious. Will I ever know what was PID 13436?