
A Kubernetes application that uses local node storage to hold mutable state (as in the Kubernetes 101 example) loses that storage when the app is updated. This is a side effect of the typical Deployment update approach of turning up new pods and turning down the old ones. This is unfortunate, as it means recopying data (possibly hundreds of gigabytes) onto each node even though the data are often already there, in an unreachable volume. This greatly slows down updates.

What can an application programmer do to optimize this? Some pod attributes can be updated in-place, but this only covers a small subset of updates. Persistent volumes are intrinsically remote, not local, so they can't be mmapped and won't have the same performance as local storage; and they inappropriately have lifetime independent of the deployment that should own them. Issue #9043 discusses the issue, but it doesn't seem to be reaching any consensus; and, anyway, sometimes the pod can be replaced on the same node but not updated in-place. Issue #7562 started to discuss it, but it turned into a discussion of persistent volumes. Issue #598 is related, but it's really for times when you'd rather the pod remain unassigned to any node instead of starting it with an empty directory.

aecolley

2 Answers


As of the current Kubernetes design, local storage should always be treated as ephemeral, just like a container or pod. Not only because of scenarios like this, but because your pod could crash and be rescheduled at any time. From the volume documentation:

When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.

...

Some uses for an emptyDir are:

  • scratch space, such as for a disk-based merge sort
  • checkpointing a long computation for recovery from crashes
  • holding files that a content-manager container fetches while a webserver container serves the data

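As a minimal sketch of the last use case above (all names and images here are illustrative, not from the question), a pod can declare an emptyDir volume and mount it into its containers:

```yaml
# Sketch: a pod using an emptyDir volume as local scratch space.
# The volume's contents are deleted when the pod leaves the node.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo        # illustrative name
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "echo hello > /scratch/out && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}            # node-local, lifetime tied to the pod
```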
GCE SSD persistent disks are pretty fast, but if you really need the performance of local storage, then temporarily copy the data from persistent storage into a local volume to work with it.
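That copy-to-local pattern can be sketched with an init container that copies from a persistent volume into an emptyDir before the main container starts. This is only an illustration under assumed names: the PVC `data-claim`, the image, and the paths are all hypothetical.

```yaml
# Sketch: copy data from persistent storage to node-local storage at startup.
apiVersion: v1
kind: Pod
metadata:
  name: copy-to-local-demo      # illustrative name
spec:
  initContainers:
  - name: warm-local-copy
    image: busybox
    # Copy the persistent data into the local emptyDir before the app runs.
    command: ["sh", "-c", "cp -a /persistent/. /local/"]
    volumeMounts:
    - name: persistent-data
      mountPath: /persistent
    - name: local-data
      mountPath: /local
  containers:
  - name: app
    image: my-app:latest        # illustrative
    volumeMounts:
    - name: local-data          # the app works against the fast local copy
      mountPath: /data
  volumes:
  - name: persistent-data
    persistentVolumeClaim:
      claimName: data-claim     # assumes an existing PVC, e.g. backed by a GCE PD
  - name: local-data
    emptyDir: {}
```

The app then reads and writes (and can mmap) the local copy; anything that must survive the pod has to be written back to the persistent volume.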

ConnorJC

You could consider using a hostPath or local PV, if you can tolerate the constraints they impose.

This should allow you to get the performance advantages of mmapping (since the volume is local). However, you need to consider a few things:

  • hostPath PVs support only ReadWriteOnce (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes), meaning you may need to use an update strategy of Recreate rather than RollingUpdate.
  • if you have a cluster with more than one node, you need to consider how to deal with your PV's contents being out of date - something an initContainer may be able to help work around.
  • depending on your needs, you also need to consider the impact of using local storage in general (hostPath PV or otherwise) on your ability to scale - being able to extract maximum performance from one instance is one thing, but you may find scaling out horizontally ends up being a more flexible, generic solution than heavy optimisation.
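To make the first two points concrete, here is a sketch of a hostPath PersistentVolume plus a Deployment that uses the Recreate strategy so the old pod releases the volume before the new one starts. Every name, path, and size is illustrative, and it assumes a PVC (`local-data-claim`) bound to the PV:

```yaml
# Sketch: node-local storage via a hostPath PV, updated with strategy Recreate.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-data            # illustrative
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce             # only one node can mount it read-write
  hostPath:
    path: /mnt/data           # directory on the node itself
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # illustrative
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: Recreate            # tear down the old pod before starting the new one
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:latest  # illustrative
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: local-data-claim   # assumes a PVC bound to the PV above
```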

and they inappropriately have lifetime independent of the deployment that should own them.

One other thing you may want to look into is your choice of object - to my mind at least, a Deployment should be stateless.

StatefulSets are typically a better fit for cases where you really do need some state hanging around (and StatefulSets have a different interpretation of rolling updates than Deployments do), and they provide some help with managing PVs.
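For instance, a StatefulSet can stamp out one PVC per pod via volumeClaimTemplates, so each replica keeps its own volume across updates. Again a sketch only, with illustrative names and sizes:

```yaml
# Sketch: a StatefulSet whose volumeClaimTemplates give each pod its own PVC
# that survives rolling updates of the pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-store              # illustrative
spec:
  serviceName: my-store
  replicas: 1
  selector:
    matchLabels:
      app: my-store
  template:
    metadata:
      labels:
        app: my-store
    spec:
      containers:
      - name: store
        image: my-store:latest   # illustrative
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:
  - metadata:
      name: data               # becomes PVC "data-my-store-0" for pod 0
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
```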

iwaseatenbyagrue