
I would like to build a Docker container and then run it in GKE after mounting some directories from GCE persistent disks (PDs). For instance, I'd like the application's (read-write) configuration files in /etc/<application>/ to outlive its pods (which may restart at any time).

The regular build puts default configuration files into /etc/<application>/, and it is imperative that these be "copied" once from the image's ephemeral disk into the PD, so that the application can start in its expected environment.

Is there a best practice for making this happen? For instance, would I have to mount PDs also in my Dockerfile, or can I somehow request that PDs be "synced" with files from another directory/volume/disk when they are first mounted by a VM instance during deployment?

Drux
  • Did you try deploying your cluster using the --local-ssd-count flag [1]? This command will create your cluster with local ssd disks depending on the count specified. Keep in mind that the default local SSD size is 375 GB. [1]: https://cloud.google.com/container-engine/docs/clusters/operations#create_a_cluster_with_local_ssd_nodes – George Jul 26 '16 at 19:31
  • @George What would be the advantage of doing that with respect to the question at hand? – Drux Jul 26 '16 at 19:40

2 Answers


The obvious answer is to populate each persistent disk immediately after creating it.

If the application configs change from build to build, and they must match the running build, then there's an unresolved problem about what to do if multiple app versions share the same PD and conflict over what should be stored there.

If you don't need to worry about cross-version PD sharing, then you can initialize the contents of the PD using a job running in the application's pod. Kubernetes has a feature called init containers designed to make this easier; but it's still alpha at the time of writing.
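The init-container pattern above can be sketched as a pod spec. This is a hedged illustration, not a tested manifest: the names my-app, my-app-image, and my-app-config-pd are placeholders, and the .seeded marker file is one way to address the concern (raised in the comments) that init containers run on every pod start, by copying the defaults only if the PD has not been seeded yet:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  initContainers:
  - name: seed-config
    image: my-app-image        # same image that ships the default config
    command: ["sh", "-c",
      "[ -e /mnt/config/.seeded ] || (cp -a /etc/my-app/. /mnt/config/ && touch /mnt/config/.seeded)"]
    volumeMounts:
    - name: config
      mountPath: /mnt/config   # PD mounted at a staging path, not over /etc
  containers:
  - name: my-app
    image: my-app-image
    volumeMounts:
    - name: config
      mountPath: /etc/my-app   # app sees the seeded PD at its expected path
  volumes:
  - name: config
    gcePersistentDisk:
      pdName: my-app-config-pd
      fsType: ext4
```

Note that at the time of the answer, init containers were alpha and declared via a pod annotation rather than the spec.initContainers field shown here; the field form is used for readability.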

aecolley
  • +1 for init containers. They seem to execute as often as containers start, though, which is too often for initializing the PD (once). – Drux Jul 30 '16 at 18:36

I have not heard of a best practice, so this is what I have adopted for now:

  1. docker build the image with a Dockerfile that also tars e.g. /etc/<application>/ into <application>.tar after it has completed its other build steps
  2. briefly docker run the image and scp the tar files off the running container
  3. briefly create a temporary VM instance and attach the PD to it; scp the tar files to the VM instance; gcloud compute ssh into it, mount the PD, and untar the needed files below the mount point
Drux