I am building a cluster of machines all running the same setup:

  • Ubuntu Server 20.04.2
  • during installation I select a unique short hostname
  • once the OS is installed, I add microk8s 1.20/stable via snap and set up permissions following this tutorial

I decided to turn off HA by running microk8s disable ha-cluster after installation.

I run microk8s add-node on the master, and the first two machines connect successfully, creating a cluster with three nodes, one of them being the master.

The problem occurs with the 4th machine. Although it connects just fine, kubelet registers the node under the machine's internal IP instead of the "pretty" hostname defined in /etc/hostname. Everything works, but the result is an inconsistent and ugly node list.

Running microk8s.kubectl edit node on the master, I pick out the problematic machine at IP 192.168.0.134 (hostname zebra) and one of the machines that connected with its hostname as intended (rhombus):

- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2021-04-04T18:08:15Z"
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/os: linux
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: 192.168.0.134
      kubernetes.io/os: linux
      microk8s.io/cluster: "true"
    name: 192.168.0.134
    resourceVersion: "27486"
    selfLink: /api/v1/nodes/192.168.0.134
    uid: 09c01d87-1ae4-452f-8908-6dcb85a5999a
  spec: {}
  status:
    addresses:
    - address: 192.168.0.134
      type: InternalIP
    - address: 192.168.0.134
      type: Hostname

  ...

- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2021-04-04T13:59:21Z"
    labels:
      beta.kubernetes.io/arch: amd64
      beta.kubernetes.io/os: linux
      kubernetes.io/arch: amd64
      kubernetes.io/hostname: rhombus
      kubernetes.io/os: linux
      microk8s.io/cluster: "true"
    name: rhombus
    resourceVersion: "27244"
    selfLink: /api/v1/nodes/rhombus
    uid: f125573a-0efb-444c-849b-f0521fe3b813
  spec: {}
  status:
    addresses:
    - address: 192.168.0.105
      type: InternalIP
    - address: rhombus
      type: Hostname
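To spot this kind of mismatch without reading the full node objects, a small filter over the node names may be enough. This is only a sketch: `ip_named_nodes` is a hypothetical helper name, and it assumes the input is one node name per line, optionally prefixed with `node/` as printed by `kubectl get nodes -o name`.

```shell
# Read node names on stdin (one per line, optionally prefixed with "node/")
# and print only those that look like bare IPv4 addresses.
# Exits non-zero when no IP-named node is found (grep's exit status).
ip_named_nodes() {
    sed 's|^node/||' | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}
```

On the master this could be used as `microk8s kubectl get nodes -o name | ip_named_nodes`, which in the cluster above would be expected to print only 192.168.0.134.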

I find that the --hostname-override argument is causing this headache:

$ sudo grep -rlw "192.168.0.134" /var/snap/microk8s/2094/args
/var/snap/microk8s/2094/args/kube-proxy
/var/snap/microk8s/2094/args/kubelet
/var/snap/microk8s/2094/args/kubelet.backup
$ cat /var/snap/microk8s/2094/args/kubelet

...

--cluster-domain=cluster.local
--cluster-dns=10.152.183.10
--hostname-override 192.168.0.134

Compared with the same file on machines without this problem, the last line is extra. The same goes for /var/snap/microk8s/current/...; I don't know what the difference between the two directories is.
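For comparing machines quickly, the override value can be pulled out of an args file directly. The following is a minimal sketch; `hostname_override` is a hypothetical helper name, and it assumes the flag sits on its own line in either the `--hostname-override VALUE` or `--hostname-override=VALUE` form.

```shell
# Print the value of --hostname-override from a kubelet args file.
# Prints nothing when the flag is absent.
hostname_override() {
    args_file=$1
    # Match the flag at the start of a line (leading whitespace allowed),
    # followed by "=" or spaces, and print only the value.
    sed -n 's/^[[:space:]]*--hostname-override[= ]\{1,\}\([^[:space:]]*\).*$/\1/p' "$args_file"
}
```

Running `hostname_override /var/snap/microk8s/2094/args/kubelet` on zebra should print 192.168.0.134, while on a machine without the extra line it should print nothing.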

If I remove that line or change the IP to zebra, the setting is ignored and overwritten (somehow). Editing the file this way was suggested in an answer to a related question here. Other answers suggest a reset; I ran microk8s reset, which made no difference. To verify each step along the way, I ran the same commands on one of the machines that connect with their "pretty" hostname. In the end, that machine always retained its "pretty" hostname.

What should I change before I connect the node in order to display the correct name? Why would the same installation steps on different machines result in a different node name?

EDIT: I reinstalled the OS on the machine and the issue remains.


UPDATE:

I had forgotten to add the last node's hostname to the master's /etc/hosts file:

$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 masterhost
192.168.0.x nodehost1
192.168.0.y nodehost2 
192.168.0.z nodehost3                  <--- missing 

Now everything uses the hostname rather than the IP. I still don't understand why the entry in the /etc/hosts changes the behaviour.
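Since the missing entry was easy to overlook, a check run on the master before microk8s add-node might help. The sketch below is an assumption-laden illustration: `missing_hosts` is a hypothetical helper, and it only inspects a hosts-format file for a matching name. It does not explain why the entry affects the node name, and it does not consult DNS or any other resolver source.

```shell
# Print every given hostname that has no entry in the given hosts-format file.
# Usage: missing_hosts HOSTS_FILE NAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any field after the address.
        if ! awk -v name="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `missing_hosts /etc/hosts nodehost1 nodehost2 nodehost3` on the master would have printed nodehost3 before the fix and nothing afterwards.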

Andrew Schulman
porkbrain
  • If I understand correctly, the problem was solved by adding the `nodehost3` entry to the `/etc/hosts` file. Can you describe how you solved the problem and what the cause was in a separate answer? It may be helpful for other community members. – matt_j Apr 06 '21 at 12:25
  • That's correct. @sysadmin1138 helpfully moved the answer to my question because I haven't been able to actually find out what is the process which decides on the name. It turns out that yes, having the entry in the `/etc/hosts` _before_ adding a node results in it being named after its hostname. However, I still don't understand why that's the case. – porkbrain Apr 06 '21 at 19:18
