
I am starting 5 CoreOS EC2 members, all in a private subnet, and then assigning one Elastic IP to one of the members. It seems that only the member with the Elastic IP can join the etcd2 cluster, and it is perpetually waiting for the other 4.

Here is my cloud-config:

#cloud-config

coreos:
  update:
    reboot-strategy: "etcd-lock"
  etcd2:
    discovery: "https://discovery.etcd.io/_____hash_____"
    advertise-client-urls: "http://$private_ipv4:2379"
    initial-advertise-peer-urls: "http://$private_ipv4:2380"
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
    listen-peer-urls: "http://$private_ipv4:2380,http://$private_ipv4:7001"
  fleet:
    public-ip: "$private_ipv4"
    metadata: "region=us-west"
  units:
    - name: "etcd2.service"
      command: "start"
    - name: "fleet.service"
      command: "start"

Here are the errors from the member with the public IP:

error #0: client: etcd member https://discovery.etcd.io returns server error [Gateway Timeout]
waiting for other nodes: error connecting to https://discovery.etcd.io, retrying in 4m16s
found self ae44c4332ec3c211 in the cluster
found 1 peer(s), waiting for 4 more

The other 4 members do not get as far:

listening for peers on http://10.0.0.50:2380
listening for peers on http://10.0.0.50:7001
listening for client requests on http://0.0.0.0:2379
listening for client requests on http://0.0.0.0:4001
etcd2.service: Main process exited, code=exited, status=1/FAILURE
Failed to start etcd2.
etcd2.service: Unit entered failed state.
etcd2.service: Failed with result 'exit-code'.

Security group inbound rules (type / protocol / port / source):

Custom TCP 7001  VPC subnet
SSH    TCP 22    0.0.0.0/0
Custom TCP 4001  VPC subnet
Custom TCP 2379  VPC subnet
Custom TCP 2380  VPC subnet

I've tested this on both the CoreOS stable and alpha channels.

theRemix

2 Answers


I spun up the cluster with the same settings, except that I enabled "Auto-assign Public IP" when creating the instances, and everything just worked™.

I'm not sure yet why each member needs a public IP, since they are only advertising their $private_ipv4 within the network.

------ edit ------

I found that the issue "fixed" by auto-assigning a public IP was really internet access: with a public IP, each member could reach the discovery service over HTTPS (port 443).

Now that I know this, I just put all my cluster members in a private subnet that routes outbound traffic on ports 80 and 443 through a NAT, and it works now.
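As a quick sanity check that the NAT route gives the members what they need, a minimal sketch run on any member (the discovery token is the same placeholder as above; substitute your own):

# confirm the discovery service is reachable over HTTPS from the private subnet
curl -m 10 -v https://discovery.etcd.io/_____hash_____
# restart the unit and look at its recent log output
sudo systemctl restart etcd2.service
journalctl -u etcd2.service --no-pager | tail -n 20

If the curl times out, the members still have no route to the internet and etcd2 will keep failing at the discovery step.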

theRemix

I hit the same problem recently. It seems that the public address is not so much a requirement of etcd2 as it is a requirement of Internet Gateways in a VPC. The AWS documentation says:

Ensure that instances in your subnet have public IP addresses or Elastic IP addresses.
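In other words, an instance whose default route points at an Internet Gateway needs a public or Elastic IP to reach the internet, while a route through a NAT does not. A rough sketch of inspecting and changing the private subnet's route with the AWS CLI, using placeholder IDs (subnet-12345678, rtb-12345678, nat-12345678):

# show the route table associated with the private subnet
aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=subnet-12345678
# point the default route at a NAT gateway instead of the Internet Gateway
# (use --instance-id instead if you run a NAT instance)
aws ec2 create-route --route-table-id rtb-12345678 --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-12345678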

Lukas