I have a 3-node Kubernetes cluster (k8s 1.22) with flannel set up and running fine. I ran the live migration steps from flannel to Calico as described here, and the cluster migrated to Calico successfully. However, when I repeated the migration several times, a couple of runs ended up in the following state:

  • The flannel migration job was still running after 20 hours.
  • The flannel-to-Calico migration did not complete successfully.
  • Two of the three nodes (node2 and node3) were running calico-node.
  • One node (node1) was in SchedulingDisabled state, and calico-node was not running on it.
  • According to the flannel migration job's logs, it was trying to reach kube-apiserver, which was down at the time, so the job failed.

Later, once kube-apiserver was stable again, I tried re-running the flannel migration job, but it didn't proceed and nothing happened.

How do I make the flannel-to-Calico migration idempotent? I want to be able to rerun it and make sure the migration completes after a failure.

Any help and suggestions are appreciated.

Siddharood

1 Answer

My thought is that maybe the node got halfway migrated and is giving some wonky error as a result. When in doubt, revert and redo.

To determine which nodes were migrated to Calico:

kubectl get nodes -l projectcalico.org/node-network-during-migration=calico
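
If it helps, a label-column view shows the migration state of every node at once, including nodes that never received the label. This is only a small sketch: the function name is made up for illustration, and the label key is the one from the command above.

```shell
# Hypothetical helper: list all nodes with the migration label as an extra column.
# -L (--label-columns) adds one column per label key; nodes that do not carry
# the label show an empty cell instead of being filtered out.
show_migration_state() {
  kubectl get nodes -L projectcalico.org/node-network-during-migration
}
```

Calling show_migration_state should make it easy to spot a node that is stuck between flannel and calico.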

Then cordon and drain the node that was migrated halfway:

kubectl drain {node name}

Log into the node and remove the CNI configuration:

rm /etc/cni/net.d/10-calico.conflist

Reboot the node, then re-enable flannel on it by resetting the migration label:

kubectl label node {node name} projectcalico.org/node-network-during-migration=flannel --overwrite

Uncordon the node:

kubectl uncordon {node name}

Remove the nodeSelector from the flannel daemonset:

kubectl patch ds/kube-flannel-ds-amd64 -n kube-system -p '{"spec": {"template": {"spec": {"nodeSelector": null}}}}'

Then lastly remove the migration label from the node in question:

kubectl label node {node name} projectcalico.org/node-network-during-migration-
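
The per-node part of the steps above could be wrapped roughly like this. Treat it as a sketch under assumptions, not a tested tool: the two function names are hypothetical, and the CNI file removal and the reboot are left as manual steps on the node itself.

```shell
# Sketch only: wraps the per-node kubectl steps in two hypothetical functions.
LABEL_KEY="projectcalico.org/node-network-during-migration"

# Phase 1: run from your workstation before logging into the node.
revert_node_prepare() {
  node="$1"
  kubectl drain "$node" &&
    echo "Now on $node: rm /etc/cni/net.d/10-calico.conflist, then reboot."
}

# Phase 2: run once the node is back up after the reboot.
revert_node_finish() {
  node="$1"
  kubectl label node "$node" "$LABEL_KEY=flannel" --overwrite &&
    kubectl uncordon "$node"
}
```

For example, run revert_node_prepare node1, do the manual cleanup and reboot, then run revert_node_finish node1. If kubectl drain refuses to proceed because of DaemonSet-managed pods, --ignore-daemonsets is the usual flag to add. The daemonset patch and the final label removal are left out on purpose, since they are not per-node in the same way.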

Now redo the migration, this time knowing that kube-apiserver is up and stable, and let me know what happens.
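
To avoid starting the migration while the API server is still flapping, something like the following could gate the rerun. It is a sketch: wait_for_apiserver is a made-up helper name, the retry count and delay are arbitrary defaults, and it assumes your kubectl context can read the /readyz endpoint.

```shell
# Hypothetical helper: poll the API server's /readyz endpoint until it answers.
# Arguments: number of attempts (default 30), delay between attempts in seconds (default 10).
# Returns 0 as soon as one probe succeeds, 1 if every attempt fails.
wait_for_apiserver() {
  attempts="${1:-30}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl get --raw='/readyz' >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt is still coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

You would call wait_for_apiserver first and only re-create the migration job if it returns success. If the old Job object from the failed run still exists, re-applying the same manifest will most likely change nothing, so deleting that Job first is probably what you need to get a fresh run.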

Brendan