I have a 3-node Kubernetes cluster running k8s 1.22 with flannel set up and working fine. I ran the live migration steps from flannel to Calico as described here, and the cluster migrated to Calico successfully. However, when I repeated the migration multiple times, a couple of times I ran into the following scenario:
- The flannel migration job was still running after 20 hours.
- The flannel to Calico migration was not successful.
- Two of the 3 nodes (node2 and node3) were running Calico node.
- One node (node1) was in SchedulingDisabled state, and Calico node was not running on it.
- The flannel migration job logs showed that it was trying to reach kube-apiserver, which was down at that time, so the job failed.
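For reference, here is a minimal sketch of how the stuck node in the list above can be spotted programmatically. It assumes the input is the JSON printed by `kubectl get nodes -o json`, where a node shown as SchedulingDisabled has `spec.unschedulable` set to true. The function name `cordoned_nodes` and the sample data are hypothetical and only mirror the node names from my cluster.

```python
import json


def cordoned_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes marked unschedulable (SchedulingDisabled)."""
    items = json.loads(nodes_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        # spec.unschedulable is absent on schedulable nodes, so default to False
        if item.get("spec", {}).get("unschedulable", False)
    ]


# Hypothetical sample shaped like `kubectl get nodes -o json` output,
# reduced to the fields this sketch reads.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "node1"}, "spec": {"unschedulable": True}},
        {"metadata": {"name": "node2"}, "spec": {}},
        {"metadata": {"name": "node3"}, "spec": {}},
    ]
})

if __name__ == "__main__":
    print(cordoned_nodes(SAMPLE))
```

This only reports which nodes are cordoned. It does not show why the migration job stopped making progress on them.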
Later, once kube-apiserver was stable again, I tried re-running the flannel migration job, but it didn't proceed and nothing happened.
How do I make the flannel to Calico migration idempotent? I want to be able to rerun it and make sure the migration completes after a failure.
Any help and suggestions are appreciated.