A long-running job (45 h) was moved to another pod, causing it to restart.
From the logs I can see that the job received a SIGTERM and was then restarted in another pod, probably on another node too.
The information retrieved in Google Cloud is not helping: neither the YAML pages nor the events describe this event, except for the pod creation.
The job YAML: creationTimestamp: 2019-06-15T10:39:25Z
The pod YAML: creationTimestamp: 2019-06-17T13:26:25Z
I mostly use a default configuration (1.12.6-gke.11) with several nodes, and the servers are not preemptible.
Is this a default behavior of k8s? If it is, how can I disable it?
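(For context, one common cause of this on GKE is the cluster autoscaler evicting pods when it scales a node pool down. A hedged sketch of a mitigation, assuming the autoscaler is the culprit: Kubernetes supports a `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation on the pod template, which tells the autoscaler not to remove the node hosting that pod. The job name, image, and resource values below are hypothetical placeholders.)

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: long-job                 # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Tells the cluster autoscaler not to evict this pod during scale-down
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: my-image:latest   # hypothetical image
        resources:
          requests:              # adequate requests also reduce eviction risk
            cpu: "1"
            memory: 2Gi
```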
Are you using cluster autoscaling? Does the pod request adequate resources - i.e. was it evicted (showing status "Evicted"), or simply moved because of an issue with its node? Do you have node automatic upgrades enabled? Do you have a PodDisruptionBudget for the pod? – John – 2019-06-19T01:51:29.557
We are using autoscaling. There was no status Evicted; if it was a node problem, we didn't see it in GCC. We have automatic upgrades enabled and the pod has no PodDisruptionBudget. This is a recurring problem. – should_be_working – 2019-06-20T14:11:36.047