How do you diagnose RabbitMQ crash issues on Ubuntu 16?

When I run sudo service rabbitmq-server status it reports:

● rabbitmq-server.service - RabbitMQ Messaging Server
   Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2018-03-21 19:44:18 UTC; 19min ago
  Process: 1100 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=killed, signal=TERM)
  Process: 1099 ExecStart=/usr/sbin/rabbitmq-server (code=killed, signal=TERM)
 Main PID: 1099 (code=killed, signal=TERM)

implying it's crashed or failed to start. However, when I run htop, I see dozens of erlang and beam.smp processes, which are launched by Rabbit.

Furthermore, when I go to restart Rabbit with sudo service rabbitmq-server restart it hangs for about five minutes and then finally returns with:

Job for rabbitmq-server.service failed because a timeout was exceeded. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.

When I run journalctl -xe I see a ton of messages like:

Mar 21 20:07:48 server1 postfix/error[3719]: 280524B3A: to=<root@mydomain.com>, orig_to=<root>, relay=none, delay=101268, delays=101268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspende
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2D046FAC: from=<>, size=3126, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2D8AD474F: from=<root@mydomain.com>, size=751, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3712]: 2ED9D499A: to=<root@mydomain.com>, orig_to=<root>, relay=none, delay=155868, delays=155868/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspende
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2EBCF3D40: from=<>, size=3128, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3706]: 2D8AD474F: to=<root@mydomain.com>, orig_to=<root>, relay=none, delay=38268, delays=38268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended:
Mar 21 20:07:48 server1 postfix/error[3716]: 2D046FAC: to=<root@mydomain.com>, relay=none, delay=76240, delays=76240/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to porta
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2C9DE3945: from=<>, size=3134, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/qmgr[1784]: 2AA2A48B3: from=<root@mydomain.com>, size=751, nrcpt=1 (queue active)
Mar 21 20:07:48 server1 postfix/error[3717]: 2C9DE3945: to=<root@mydomain.com>, relay=none, delay=399644, delays=399644/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to po
Mar 21 20:07:48 server1 postfix/error[3701]: 2EBCF3D40: to=<root@mydomain.com>, relay=none, delay=181242, delays=181242/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to po
Mar 21 20:07:48 server1 postfix/error[3712]: 2AA2A48B3: to=<root@mydomain.com>, orig_to=<root>, relay=none, delay=59268, delays=59268/0/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended:

Am I correct in concluding Rabbit is trying to send a ton of email, is being blocked, and is subsequently crashing? Why is this?

Martin Schröder
Cerin

2 Answers

I fixed it with:

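# Stop any leftover broker processes that systemd no longer tracks.
sudo killall rabbitmq-server
sudo killall beam.smp
# WARNING: this deletes all broker state. The glob is expanded inside a root shell.
sudo sh -c 'rm -rf /var/lib/rabbitmq/mnesia/*'
sudo service rabbitmq-server start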

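Deleting the Mnesia directory wipes the broker's stored state (users, vhosts, queues, and persistent messages), so I also had to re-add my user configurations. Otherwise, that brought it back up.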
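
Since removing the Mnesia directory is destructive, it may be worth copying it aside first. The following is a minimal sketch, not part of the original fix: the function name backup_mnesia and the timestamped destination name are made up for illustration, and the default path is the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: copy the Mnesia directory aside before wiping it.
# Usage: backup_mnesia [source_dir] [destination_dir]
backup_mnesia() {
    src="${1:-/var/lib/rabbitmq/mnesia}"
    # Default destination is an assumption: a timestamped sibling directory.
    dest="${2:-/var/lib/rabbitmq/mnesia.bak.$(date +%Y%m%d%H%M%S)}"
    # -a preserves ownership, permissions, and timestamps.
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}
```

It would need to run from a root shell after the leftover processes are stopped and before the rm step. It prints the backup location on success.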

Cerin

That does not appear to be a "crash" so much as a graceful shutdown due to a problem. The status output shows Result: timeout and signal=TERM, so apparently the service timed out while starting and systemd terminated it. I am assuming this is because it could not connect to the remote messaging server. The postfix entries you have posted indicate that it tried to send an email notification of the failure, and their deferred status probably also means that the postfix mail server isn't configured to relay messages outside the box.
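
One way to check these assumptions is to read RabbitMQ's own log entries instead of the whole journal, which here is dominated by postfix. A minimal sketch follows. The function names are invented for illustration, the unit name is taken from the status output in the question, and /var/log/rabbitmq is assumed to be the log directory (the usual location for the Debian/Ubuntu package, but worth verifying on your system).

```shell
#!/bin/sh
# Hypothetical helper: show only the RabbitMQ unit's journal entries.
# Usage: rabbit_journal [max_lines]
rabbit_journal() {
    # -u restricts output to one unit, -n caps the line count,
    # --no-pager prints straight to the terminal.
    journalctl -u rabbitmq-server.service --no-pager -n "${1:-200}"
}

# Hypothetical helper: list the broker's log files, newest first.
# RABBIT_LOG_DIR is an illustrative override, not a RabbitMQ setting.
rabbit_log_files() {
    ls -t "${RABBIT_LOG_DIR:-/var/log/rabbitmq}"
}
```

Calling rabbit_journal 100 (with sudo privileges if your user cannot read the system journal) should show why the start-up stalled, if the broker logged anything before systemd terminated it.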

TheCompWiz