
I have tried every conceivable combination in an attempt to cluster RabbitMQ in an AWS environment. To recap:

  1. Shut down and removed the Erlang and RabbitMQ installations on my local Ubuntu 14 box
  2. Tried the auto-configuration modules around the web
  3. The version that comes installed by default on Ubuntu 14 won't cut it.
  4. The Erlang cookies match (demonstrated below)

The hostname mismatch is the only puzzle. Each node thinks its hostname is 'q1' or 'q2', respectively. When I try to set the container's hostname to the host's private DNS name (so it can connect to the other node), the RabbitMQ instance in the container crashes. Note below how hostname prints q2, even though I shelled in via an Amazon private DNS name:

root@q2:~# hostname
q2
root@q2:~# exit
christian@q2:~$ logout
Connection to ip-10-0-3-101.us-west-2.compute.internal closed.
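For context: unless RABBITMQ_NODENAME is set or long names are enabled with RABBITMQ_USE_LONGNAME, RabbitMQ names a node rabbit@ followed by the short hostname. That is why a container started with --hostname q1 calls itself rabbit@q1 no matter what the EC2 private DNS name is, and why, as far as I can tell, a request for rabbit@ip-10-0-3-56.us-west-2.compute.internal names a node the q1 container does not consider to be itself. The sketch below only mimics that derivation; default_node_name is a hypothetical helper, not part of RabbitMQ.

```python
import socket


def default_node_name(hostname: str, prefix: str = "rabbit") -> str:
    """Mimic RabbitMQ's default node name: <prefix>@<short hostname>.

    Assumption: RABBITMQ_NODENAME is unset and long names are disabled,
    so only the part of the hostname before the first dot is used.
    """
    short = hostname.split(".", 1)[0]
    return f"{prefix}@{short}"


if __name__ == "__main__":
    # Inside a container started with --hostname q2 this prints rabbit@q2,
    # even though the EC2 host is reached via an ip-10-0-3-... DNS name.
    print(default_node_name(socket.gethostname()))
```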

I am using the latest RabbitMQ Docker image, started like this:

docker run -d --restart always --hostname q1 --name rabbitmq -p 4369:4369 -p 15671:15671 -p 25672:25672 -p 15672:15672 -p 5672:5672 -e RABBITMQ_HIPE_COMPILE=1 -e RABBITMQ_ERLANG_COOKIE='ilikecookies' rabbitmq:3-management

The service starts up just fine

root@q1:~# curl -I localhost:15672
HTTP/1.1 200 OK
Content-Length: 1419
Content-Type: text/html
Date: Fri, 20 Jan 2017 22:46:12 GMT
last-modified: Fri, 20 Jan 2017 22:38:45 GMT
Server: MochiWeb/1.0 (Any of you quaids got a smint?)

And here is the cookie from host q1

root@q1:~# docker exec -it rabbitmq /bin/bash
root@q1:/# cat /var/lib/rabbitmq/.erlang.cookie                                                                               
ilikecookies
root@q1:/# 

Now I attempt to cluster it from host q2, with q1 as the master:

root@q2:~# docker exec -it rabbitmq /bin/bash
root@q2:/# rabbitmqctl stop_app
Stopping node rabbit@q2 ...
root@q2:/# rabbitmqctl join_cluster rabbit@ip-10-0-3-56.us-west-2.compute.internal
Clustering node rabbit@q2 with 'rabbit@ip-10-0-3-56.us-west-2.compute.internal' ...
Error: unable to connect to nodes ['rabbit@ip-10-0-3-56.us-west-2.compute.internal']: nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@ip-10-0-3-56.us-west-2.compute.internal']

rabbit@ip-10-0-3-56.us-west-2.compute.internal:
  * connected to epmd (port 4369) on ip-10-0-3-56.us-west-2.compute.internal
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?
  * suggestion: is the Erlang distribution using TLS?

current node details:
- node name: 'rabbitmq-cli-41@q2'
- home dir: /var/lib/rabbitmq
- cookie hash: quN0y0GUm2Zxv8VYc2eX9A==

root@q2:/# cat /var/lib/rabbitmq/.erlang.cookie
ilikecookies
root@q2:/# 
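The diagnostics print a cookie hash rather than the cookie itself. My understanding is that rabbitmqctl computes it as the Base64-encoded MD5 digest of the cookie string; assuming that is right, this sketch reproduces the value so the hash reported on q1 and on q2 can be compared without pasting the secret around. cookie_hash is a hypothetical helper, and the hash in the output above belongs to the CLI node (rabbitmq-cli-41@q2), so it is worth checking on both hosts.

```python
import base64
import hashlib


def cookie_hash(cookie: str) -> str:
    """Assumed format of the 'cookie hash' diagnostic: base64(md5(cookie))."""
    digest = hashlib.md5(cookie.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Compare this against the 'cookie hash:' line printed on each node.
    print(cookie_hash("ilikecookies"))
```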

The question is: HOW do you get these things to cluster? What's the missing ingredient? Searching the web for the error message turns up nothing useful. Does anyone have experience with this?

UPDATE: AWS security group rule for these instances:

Type: Custom TCP Rule | Protocol: TCP | Port range: 1024 - 65535 | Source: 0.0.0.0/0
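The 1024 - 65535 range already covers epmd (4369), inter-node distribution (25672), AMQP (5672) and the management UI (15672), and the diagnostics confirm the TCP connection succeeded, so the network path looks fine and the failure is in the Erlang handshake itself. To double-check reachability from the other node anyway, a small TCP probe like this works; port_open and check_node are hypothetical helpers, and the target host is taken from the command line.

```python
import socket
import sys

# Ports published by the docker run command in the question.
CLUSTER_PORTS = {
    4369: "epmd (Erlang port mapper)",
    25672: "Erlang distribution / inter-node",
    5672: "AMQP clients",
    15672: "management UI/API",
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_node(host: str) -> dict:
    """Map each cluster port to whether it accepts TCP connections on host."""
    return {port: port_open(host, port) for port in CLUSTER_PORTS}


if __name__ == "__main__":
    # e.g. python probe.py 10.0.3.56 (q1's private IP from the question)
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for port, ok in check_node(target).items():
        print(f"{port:>5}  {CLUSTER_PORTS[port]:<34} {'open' if ok else 'closed'}")
```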
  • "TCP connection succeeded but Erlang distribution failed". There are a few suggestions there. Have you considered those? – Tim Jan 21 '17 at 02:19
  • This project automates the clustering process in AWS ECS. It should make the process less painful and more resilient to failures. https://github.com/malawson/rabbitmq-ecs-autoclustering – arnaud lawson Jul 10 '17 at 02:07

1 Answer

OK, I got it!

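The hostnames of each node have to line up inside the container: each container must be able to resolve the other node's short hostname (q1, q2) to an IP address it can reach.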

On the host machine (q2), I checked which hosts it knew about in its /etc/hosts file:

# This file was generated by OpsWorks
# any manual changes will be removed on the next update.

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

# OpsWorks Layer State
127.0.0.1 localhost.localdomain localhost
127.0.1.1 q2.localdomain q2

10.0.3.56 q1.localdomain q1
10.0.3.101 q2.localdomain q2


root@q2:/# ping q1
PING q1.local (10.0.3.56): 56 data bytes
^C--- q1.local ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

Then it occurred to me that what the host machine knows doesn't matter at all; what matters is what the Docker container knows. So I shelled into the container and did the same thing:

root@q2:/# cat /etc/hosts
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2  q2.local

Now we are onto something! So I added an entry for the master node:

root@q2:/# echo "10.0.3.56    q1.local q1" >> /etc/hosts
root@q2:/# which ping
/bin/ping
root@q2:/# ping q1
PING q1.local (10.0.3.56): 56 data bytes
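The difference is easy to see with a rough model of the /etc/hosts lookup. This sketch parses the container's hosts file from above (abbreviated) before and after the echo; parse_hosts is a hypothetical helper that keeps the first entry for each name and ignores DNS entirely.

```python
from __future__ import annotations


def parse_hosts(text: str) -> dict[str, str]:
    """Parse /etc/hosts-style text into {hostname: ip}, first entry wins."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first mapping for a name
    return mapping


# Abbreviated copy of the container's /etc/hosts shown above.
CONTAINER_HOSTS = """\
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
172.17.0.2  q2.local
"""

if __name__ == "__main__":
    before = parse_hosts(CONTAINER_HOSTS)
    after = parse_hosts(CONTAINER_HOSTS + "10.0.3.56    q1.local q1\n")
    print("before:", before.get("q1"))  # None: rabbit@q1 is unresolvable
    print("after: ", after.get("q1"))   # 10.0.3.56
```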

And took another swing inside the container:

root@q2:/# rabbitmqctl stop_app
Stopping node rabbit@q2 ...
root@q2:/# rabbitmqctl join_cluster rabbit@q1                                     
Clustering node rabbit@q2 with rabbit@q1 ...
root@q2:/#

And now, each node recognizes that it's clustered! Woot!

For clustering with Docker, I think I am going to modify the docker run command to mount the host's /etc/hosts file into the container with -v /etc/hosts:/etc/hosts:ro, and then this should just work.
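An alternative to bind-mounting the host's /etc/hosts is Docker's --add-host flag, which adds a host-to-IP entry to the container's /etc/hosts when it starts. The sketch below builds such a docker run command for each node, assuming the q1/q2 names and private IPs from this answer and reusing the options from the question's command; NODES and build_run_command are hypothetical, and the script only prints the commands.

```python
from __future__ import annotations

import shlex

# Hypothetical cluster map: container hostname -> host's private IP.
NODES = {"q1": "10.0.3.56", "q2": "10.0.3.101"}
PORTS = [4369, 5672, 15671, 15672, 25672]


def build_run_command(node: str, cookie: str) -> list[str]:
    """Return a docker run argv that lets `node` resolve every other node."""
    args = ["docker", "run", "-d", "--restart", "always",
            "--hostname", node, "--name", "rabbitmq"]
    for port in PORTS:
        args += ["-p", f"{port}:{port}"]
    # One --add-host entry per peer, so rabbit@<peer> resolves in the container.
    for peer, ip in NODES.items():
        if peer != node:
            args += ["--add-host", f"{peer}:{ip}"]
    args += ["-e", "RABBITMQ_HIPE_COMPILE=1",
             "-e", f"RABBITMQ_ERLANG_COOKIE={cookie}",
             "rabbitmq:3-management"]
    return args


if __name__ == "__main__":
    for node in NODES:
        print(shlex.join(build_run_command(node, "ilikecookies")))
```

Either way, the join itself stays the same: on q2, rabbitmqctl stop_app, then rabbitmqctl join_cluster rabbit@q1, then rabbitmqctl start_app to bring the node's application back up.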

Another step I forgot to mention: the local Ubuntu box had an ancient version of Erlang running (along with RabbitMQ) that I had to remove.