
We have set up a Google Kubernetes Engine (GKE) cluster whose VMs autoscale between 2 and 5. There are 5-6 pods/containers running on it. We have also set up nginx for routing, and everything runs fine.

However, we are having an issue with the cluster: it gets rebuilt automatically and all the VMs are recreated, which disrupts the pods running on them. We have set the release channel to None in the cluster's software update settings. We assume this is caused by a software upgrade of the GKE cluster.

Please advise how we can check this.

Andrew Schulman
Mahendra

1 Answer


We are assuming it is happening due to software upgrade of GKE cluster. Please advise how can we check it.

Your assumption is probably right. If the autoscaler alone were responsible, your VMs would be recreated only when the cluster is scaled in and then scaled out again. So the VMs are most likely being recreated because the auto-upgrade feature is enabled on the node pool. Note that node auto-upgrade is configured per node pool, so setting the release channel to None does not by itself turn it off.

To check the state of auto-upgrade for an existing node pool, run:

gcloud container node-pools describe node-pool-name \
  --cluster cluster-name \
  --zone compute-zone

where:

  • node-pool-name is the name of the node pool.
  • cluster-name is the name of the cluster that contains the node pool.
  • compute-zone is the zone for the cluster.

In the output, look for the autoUpgrade field. You can filter for it by appending | grep autoUpgrade to the command above.
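As an alternative to grep, the describe command can print just that one field through gcloud's --format flag. The sketch below wraps it in a small helper; the function name node_pool_auto_upgrade and the placeholder names in the example call are made up for illustration, and it assumes the setting is exposed as management.autoUpgrade in the describe output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcloud): print the auto-upgrade
# setting of one node pool.
# Usage: node_pool_auto_upgrade NODE_POOL CLUSTER ZONE
node_pool_auto_upgrade() {
  local pool="$1" cluster="$2" zone="$3"
  # --format="value(...)" prints only the selected field,
  # assumed here to live at management.autoUpgrade.
  gcloud container node-pools describe "$pool" \
    --cluster "$cluster" \
    --zone "$zone" \
    --format="value(management.autoUpgrade)"
}

# Example call; replace the placeholder names with your own:
# node_pool_auto_upgrade default-pool my-cluster us-central1-a
```

If auto-upgrade is enabled, this should print a true value; an empty result would suggest the field is unset.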

You can find out exactly when an upgrade occurred from the logs, as explained in this answer, or check the node pool upgrade status as described here. To keep auto-upgrades from happening at unexpected times and affecting your workload availability, consider configuring maintenance windows and exclusions. When planning a maintenance window, keep in mind that there are other situations in which GKE nodes need to be recreated.
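A rough sketch of both steps with gcloud follows. The function names are hypothetical helpers, the UPGRADE_NODES filter value is an assumption about how node upgrades appear in the operations list, and the window times are only an example (the date marks the first occurrence of a weekly recurrence), so adjust them to your own schedule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the most recent node upgrade operations.
# Usage: list_node_upgrades ZONE
list_node_upgrades() {
  local zone="$1"
  # Assumes node upgrades are reported with operationType UPGRADE_NODES.
  gcloud container operations list \
    --zone "$zone" \
    --filter="operationType=UPGRADE_NODES" \
    --sort-by="~startTime" \
    --limit 10
}

# Hypothetical helper: limit automatic maintenance to an 8-hour window
# on Saturdays and Sundays (example times, in UTC).
# Usage: set_weekend_maintenance_window CLUSTER ZONE
set_weekend_maintenance_window() {
  local cluster="$1" zone="$2"
  gcloud container clusters update "$cluster" \
    --zone "$zone" \
    --maintenance-window-start "2000-01-01T00:00:00Z" \
    --maintenance-window-end "2000-01-01T08:00:00Z" \
    --maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"
}

# Example calls; replace the placeholder names with your own:
# list_node_upgrades us-central1-a
# set_weekend_maintenance_window my-cluster us-central1-a
```

Comparing the start times of the listed operations with the times your pods were disrupted should show whether upgrades are the cause.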

You may also consider changing surge upgrade parameters:

Surge Upgrades allow you to change the number of nodes GKE upgrades at one time and the amount of disruption an upgrade makes on your workloads.

The max-surge-upgrade and max-unavailable-upgrade flags are defined for each node pool. For more information on choosing the right parameters, see Determining your optimal surge configuration.
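For illustration, these two flags can be set on an existing node pool roughly as follows. The helper name and the values 1 and 0 are only an example: with them, GKE adds one extra node before draining an old one and takes no existing node out of service early, which trades upgrade speed for availability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure surge upgrades for one node pool.
# Usage: set_surge_upgrade NODE_POOL CLUSTER ZONE
set_surge_upgrade() {
  local pool="$1" cluster="$2" zone="$3"
  # Example values: one surge node, zero unavailable nodes.
  gcloud container node-pools update "$pool" \
    --cluster "$cluster" \
    --zone "$zone" \
    --max-surge-upgrade 1 \
    --max-unavailable-upgrade 0
}

# Example call; replace the placeholder names with your own:
# set_surge_upgrade default-pool my-cluster us-central1-a
```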

mario