I have a minimal cloud-config that works without problems on DigitalOcean. I added some SSH hardening, which requires restarting sshd.socket to take effect:

  - name: sshd.socket
    command: restart

Adding only this unit (without any actual sshd configuration changes) makes provisioning with the same cloud-config fail on Hetzner:

  ssh: connect to host xx.xx.xx.xx port 22: Connection refused

It still connects fine on DigitalOcean, though.

When I remove this unit, connecting to the Hetzner machine works fine; adding it back makes it fail again, consistently.

Variable substitution

The only difference between the two platforms that I know of is that on DigitalOcean the variables $public_ipv4 and $private_ipv4 are replaced with actual IP addresses, which is not the case on bare metal installs like Hetzner.
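For reference, the variable-based form of the etcd2 settings looks roughly like this. The keys mirror the stripped-down config further down; using `$private_ipv4` for the peer URLs follows the usual CoreOS examples and is an assumption here, since a host with only a public interface has no private address to substitute.

```yaml
#cloud-config

coreos:
  etcd2:
    # $public_ipv4 / $private_ipv4 are only substituted on the platforms
    # listed in the CoreOS note (e.g. DigitalOcean), not on bare metal.
    advertise-client-urls: http://$public_ipv4:2379,http://$public_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-peer-urls: http://$private_ipv4:2380
```

The journal output further down, with URLs such as `http://:2379`, suggests that on Hetzner the unsupported variables end up as empty strings rather than being left in place.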

From the CoreOS Documentation:

Note: The $private_ipv4 and $public_ipv4 substitution variables referenced in other documents are only supported on Amazon EC2, Google Compute Engine, OpenStack, Rackspace, DigitalOcean, and Vagrant.

So I replace the variables with the static IP address. I use the public IP address because the public interface is the only one available besides loopback.

However, when I provision without replacing these variables with the public IP address, SSH ALSO connects fine.

Inspecting the journal reveals some errors related to name resolution:

systemd[1]: Starting etcd2...
etcd2[874]: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://:2379,http://:4001
etcd2[874]: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd2
etcd2[874]: recognized and used environment variable ETCD_DISCOVERY=https://discovery.etcd.io/616b3957c5c78e7738207011f9c51841
etcd2[874]: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://:2380
etcd2[874]: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=,
etcd2[874]: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://:2380
etcd2[874]: recognized and used environment variable ETCD_NAME=39b2a003672546f8a0b648dbc66e8f6f
etcd2[874]: etcd Version: 2.2.0
etcd2[874]: Git SHA: e4561dd
etcd2[874]: Go Version: go1.4.2
etcd2[874]: Go OS/Arch: linux/amd64
etcd2[874]: setting maximum number of CPUs to 1, total number of available CPUs is 12
etcd2[874]: listening for peers on http://:2380
etcd2[874]: listening for client requests on
etcd2[874]: listening for client requests on
etcd2[874]: resolving :2380 to :2380
etcd2[874]: resolving :2380 to :2380
etcd2[874]: error #0: dial tcp: lookup discovery.etcd.io: Temporary failure in name resolution
etcd2[874]: cluster status check: error connecting to https://discovery.etcd.io, retrying in 2s
etcd2[874]: error #0: dial tcp: lookup discovery.etcd.io: Temporary failure in name resolution
etcd2[874]: cluster status check: error connecting to https://discovery.etcd.io, retrying in 4s
etcd2[874]: found self 61dbc8c9c2aca1e8 in the cluster
etcd2[874]: found 1 needed peer(s)

But they don't seem fatal: systemctl status etcd2.service shows that the service is active:

core@localhost ~ $ systemctl status etcd2.service
● etcd2.service - etcd2
   Loaded: loaded (/usr/lib64/systemd/system/etcd2.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/etcd2.service.d
   Active: active (running) since Tue 2016-03-22 14:10:33 UTC; 7min ago
 Main PID: 874 (etcd2)
   Memory: 20.3M
      CPU: 1.771s
   CGroup: /system.slice/etcd2.service
           └─874 /usr/bin/etcd2

etcd2[874]: added local member 61dbc8c9c2aca1e8 [http://:2380] to cluster 216c373aaf11ccfa
systemd[1]: Started etcd2.
etcd2[874]: 61dbc8c9c2aca1e8 is starting a new election at term 1
etcd2[874]: 61dbc8c9c2aca1e8 became candidate at term 2
etcd2[874]: 61dbc8c9c2aca1e8 received vote from 61dbc8c9c2aca1e8 at term 2
etcd2[874]: 61dbc8c9c2aca1e8 became leader at term 2
etcd2[874]: raft.node: 61dbc8c9c2aca1e8 elected leader 61dbc8c9c2aca1e8 at term 2
etcd2[874]: published {Name:39b2a003672546f8a0b648dbc66e8f6f ClientURLs:[http://:2379 http://:4001]} to cluster 216c373aaf11ccfa
etcd2[874]: setting up the initial cluster version to 2.2
etcd2[874]: set the initial cluster version to 2.2

Containers that connect to other services, such as Logstash, fail with: the scheme http does not accept registry part: :9200 (or bad hostname?)


This is a stripped-down cloud-config that still demonstrates the issue (I verified this):


#cloud-config

ssh_authorized_keys:
  - "ssh-rsa A valid SSH key here"

coreos:
  etcd2:
    # NOTE: replace $discovery_url with a url generated at https://discovery.etcd.io/new?size=X
    discovery: $discovery_url
    advertise-client-urls: http://my.public.ip.address:2379,http://my.public.ip.address:4001
    initial-advertise-peer-urls: http://my.public.ip.address:2380
    listen-peer-urls: http://my.public.ip.address:2380          # Remove this flag or use localhost and the connection issue goes away
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: sshd.socket
      command: restart   # Remove this unit and all issues go away (but no SSH hardening in that case)

I also noticed that when I remove the listen-peer-urls flag, the connection issue goes away as well, although Logstash still fails to start for the same reason as before.

This document says the default values for these flags are localhost URLs, but the names of the substitution variables used on platforms like DigitalOcean seem to suggest that they should be addresses reachable by peer machines.

When I use localhost for these flags, I can connect, but the other issues remain.

Question 1

What should be the proper cloud-config for bare metal machines that have a public and loopback interface only (no private network)?

Question 2

What is the relationship between sshd and etcd here that causes this failure?

Rolf W.
What should be the proper cloud-config for bare metal machines that have a public and loopback interface only (no private network)?

Insert the public IP for the machine in place of those variables.
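A minimal sketch of that for a host with only a public interface, based on the stripped-down config in the question. 203.0.113.10 is a documentation address standing in for the real public IP, and `$discovery_url` is still a placeholder to replace by hand. The `listen-client-urls` line is an assumption added here: the journal shows `ETCD_LISTEN_CLIENT_URLS=,` (an empty value), and listening on both the public IP and loopback lets local clients reach etcd on 127.0.0.1.

```yaml
#cloud-config

coreos:
  etcd2:
    # Placeholder: replace with a URL generated at https://discovery.etcd.io/new?size=X
    discovery: $discovery_url
    # 203.0.113.10 is a placeholder for the machine's static public IP.
    advertise-client-urls: http://203.0.113.10:2379,http://203.0.113.10:4001
    initial-advertise-peer-urls: http://203.0.113.10:2380
    listen-peer-urls: http://203.0.113.10:2380
    # Assumption: listen on the public IP and on loopback for local clients.
    listen-client-urls: http://203.0.113.10:2379,http://203.0.113.10:4001,http://127.0.0.1:2379,http://127.0.0.1:4001
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
```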

What is the relationship between sshd and etcd here that causes this failure?

Can you share the sshd log? Why is it not starting?

  • The problem is that I can't access the machine without SSH, so I don't know what is being logged. It just occurred to me that Hetzner might offer console access, and that seems to be the case: I can book a time slot and technicians will connect [a remote KVM](http://wiki.hetzner.de/index.php/LARA/en#Remote_Console_.28LARA.29). I'll do that and report back what I find. – Rolf W. Mar 23 '16 at 20:03
  • After inspecting the journal, I found that the error `sshd.socket: Failed to listen on sockets: Address already in use` had been logged. This led me to experiment with various combinations of settings for `sshd.socket`, based on [issue 426](https://github.com/coreos/bugs/issues/426). I tried setting `runtime: true`, `Conflicts=sshd.service`, `ReusePort=true`, and `ExecStartPre=/usr/bin/sleep 20`. Adding just `ExecStartPre=/usr/bin/sleep 20` solved the issue. NOTE that SSH takes 20 seconds longer to become available, so connections will still be refused at first. – Rolf W. Mar 24 '16 at 13:35
  • This is clearly a workaround, and I don't know why it is needed on Hetzner but not on DigitalOcean. Also, my issue doesn't seem to be the same as the one described in the bug report. Do you have any suggestions for further troubleshooting to find an actual solution? – Rolf W. Mar 24 '16 at 13:38
  • PS: I haven't checked the obvious, i.e. what is occupying port 22, but I assume it's SSH. I'll verify that the next time I book a KVM, though. Nonetheless, if you have any thoughts that could help with troubleshooting or with understanding this, I would appreciate that very much. – Rolf W. Mar 24 '16 at 18:04
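The workaround from the comments above, expressed in cloud-config, might look roughly like this. It assumes coreos-cloudinit's `drop-ins` support for units; the drop-in name `10-delay.conf` is hypothetical, and the comments do not say whether the delay was added as a drop-in or by replacing the unit's content. In a systemd socket unit, `ExecStartPre=` belongs in the `[Socket]` section and runs before the listening socket is created and bound.

```yaml
#cloud-config

coreos:
  units:
    - name: sshd.socket
      command: restart
      drop-ins:
        # Hypothetical drop-in name; delays binding port 22 by 20 seconds.
        # This mirrors the reported workaround, not a root-cause fix.
        - name: 10-delay.conf
          content: |
            [Socket]
            ExecStartPre=/usr/bin/sleep 20
```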