We have one deployment that consists of a single pod (plus a service and an ingress). It uses a Docker container that executes a custom run script as its command.
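For context, here is a minimal sketch of the setup; the name, image, port, and script path are placeholders, not our actual values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:latest  # placeholder image
          command: ["/app/run.sh"]                       # the custom run script
          ports:
            - containerPort: 8080
```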
When we roll out a new version, the image is pulled, a new pod is created, and the script starts. At that point the new pod is "Running" and the old pod is terminated, because the desired number of pods is still 1.
Here is the meat of our problem: the run script can sometimes take a few minutes to finish. It performs DB migrations and other work that cannot be done at build time (i.e., in the Dockerfile). The result is that the new pod is "Running" for a few minutes without being able to serve requests, which causes downtime for our service.
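To illustrate, the container's command boils down to something like the following (the script names are made up, but the shape is accurate):

```yaml
# Container spec fragment (sketch); migrate.sh and warm-cache.sh
# stand in for the real startup work.
command:
  - /bin/sh
  - -c
  - |
    ./migrate.sh       # DB migrations, can take minutes
    ./warm-cache.sh    # other work that can't happen at build time
    exec ./server      # only after this does the pod serve traffic
```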
My question is: is there a way to delay the termination of the old pod to prevent this? Or to delay flagging the new pod as "Running" (or rather, as ready) until the script has finished?
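For example, would something like a readiness probe be the right tool? A sketch of what I imagine, assuming we change the run script to create a marker file once the migrations are done (the `/tmp/ready` path is hypothetical):

```yaml
# Container spec fragment (sketch); /tmp/ready is a hypothetical
# marker file the run script would create when startup work is done.
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 60   # tolerate up to ~5 minutes of startup work
```

As far as I can tell, this would keep the new pod out of the service's endpoints until the probe passes, but I'm not sure whether it also stops the old pod from being killed first.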
I know the ideal solution is to run more than one pod, but that is (currently) not possible, as the service in question is not entirely stateless. And even if it were, with, say, 3 pods, they would all enter the "Running" state before finishing their startup tasks, yet again causing some (albeit shorter) downtime. Would pinning the rollout strategy help keep old pods around until new ones are actually ready? See the sketch after this paragraph.
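A sketch of the strategy I mean (the values are guesses):

```yaml
# Deployment rollout strategy sketch: never drop below the desired
# replica count; bring the replacement up before removing the old pod.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # keep the old pod serving until...
    maxSurge: 1         # ...the surged new pod reports ready
```

From what I understand, this only helps if Kubernetes can tell when a new pod is actually ready, which loops back to the readiness question above.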
How should I deal with this kind of problem?