0

I am using the following kubeadmin config with external etcd setup for HA kubernetes setup following https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd-nodes in bare metal server with centos7.

etcd version - v3.2.26

kind: ClusterConfiguration
kubernetesVersion: v1.13.1
apiServer:
  certSANs:
  - "k8-master01.loc.prov.domain.tld"
controlPlaneEndpoint: "k8-master01.loc.prov.domain.tld:8080"
etcd:
    external:
        endpoints:
        - https://k8-master01.loc.prov.domain.tld:2379
        - https://k8-master02.loc.prov.domain.tld:2379
        - https://k8-master03.loc.prov.domain.tld:2379
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key

However init keeps failing at following step:


I0204 15:04:24.985393  142883 uploadconfig.go:133] [upload-config] Preserving the CRISocket information for the control-plane node
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8-master01.loc.prov.domain.tld" as an annotation
I0204 15:04:25.485719  142883 round_trippers.go:419] curl -k -v -XGET  -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Accept: application/json, */*" 'https://k8-master01.loc.prov.domain.tld:8080/api/v1/nodes/k8-master01.loc.prov.domain.tld'
I0204 15:04:25.488810  142883 round_trippers.go:438] GET https://k8-master01.loc.prov.domain.tld:8080/api/v1/nodes/k8-master01.loc.prov.domain.tld 404 Not Found in 3 milliseconds

It keeps retrying and then eventually times out.


error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition

( Hostname has been fuzzed in above log )

Any suggestions as to how we can proceed?

Anshu Prateek
  • 201
  • 3
  • 9
  • Can you check is this `k8-master01.loc.prov.domain.tld` resolving on your nodes? – Nick Rak Feb 06 '19 at 10:40
  • Yup, this resolution works fine. We resolved the issue. The SSL termination for haproxy was messed up and it was causing failures in certain cases only. – Anshu Prateek Feb 07 '19 at 12:45
  • 1
    I've posted a community wiki answer. It would be great if you could add more valuable information about solution details to that answer. – VAS Feb 15 '19 at 17:36

1 Answers1

1

As Anshu Prateek mentioned in the comments:

We resolved the issue. The SSL termination for haproxy was messed up and it was causing failures in certain cases only.

VAS
  • 370
  • 1
  • 9