Posting this answer as the community wiki as the underlying topic of the question could be a bit wide.
Feel free to expand it.
Why can a GKE cluster lose data?
Without specific information on how exactly the application/workload was deployed on a GKE cluster, it could be hard to pinpoint the actual issue.
It's worth mentioning the following things:
- Workloads that are expected to store data (like databases) should be using Persistent Volumes. In case of a node failure, the data stored on a PV will not be lost, as it is stored on a different entity.
PersistentVolume
resources are used to manage durable storage in a cluster. In GKE
, a PersistentVolume
is typically backed by a persistent disk.
Cloud.google.com: Kubernetes Engine: Docs: Concepts: Persistent Volumes
There is a guide for deploying WordPress on GKE with Persistent Disks and Cloud SQL. It can be used as an example of deploying a workload with a PVC (Persistent Disk):
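As a minimal illustrative sketch (the names `example-pvc`, `example-pod` and the paths are assumptions, not taken from the question), a workload can request durable storage with a `PersistentVolumeClaim` and mount it:

```yaml
# A PersistentVolumeClaim; in GKE the default StorageClass
# dynamically provisions a Compute Engine persistent disk for it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
# A Pod mounting the claim. Data written under /var/lib/data
# survives node failures and re-creations because it lives on
# the persistent disk, not on the node's boot disk.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc
```

A database chart or StatefulSet would mount its data directory the same way, via a `persistentVolumeClaim` volume.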
Modifications on the boot disk of a node VM do not persist across node re-creations. To preserve modifications across node re-creation, use a DaemonSet.
Cloud.google.com: Kubernetes Engine: Docs: How to: Node auto upgrade: Overview
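A minimal sketch of that approach (the name, image and command below are assumptions): a DaemonSet schedules a Pod on every node, including freshly re-created ones, so it can re-apply node-level modifications automatically:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-setup         # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-setup
  template:
    metadata:
      labels:
        app: node-setup
    spec:
      containers:
        - name: setup
          image: busybox
          # Re-applies a node-level tweak and then stays running;
          # whenever a node is re-created, the DaemonSet starts a
          # new Pod there and the tweak is applied again.
          command: ["sh", "-c", "echo applying node tweak && sleep infinity"]
```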
Referring to the question asked:
I am new to GCP so pardon the ignorance.
I encourage you to visit the official documentation of GCP and GKE. You can find there a lot of information, guides and examples to follow:
Each node has a 100GB standard persistent disk allocated.
These disks are specifically used as boot disks for a GKE node and they shouldn't be used as a place to store data. You can use Persistent Volumes as mentioned earlier, or opt for a local SSD, about which you can read more by following the link below:
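For example (the cluster and pool names below are placeholders), a node pool whose nodes each get a local SSD can be created with `gcloud`:

```shell
# Create a node pool with one local SSD per node.
# Note: data on a local SSD is ephemeral and is lost when the
# node is deleted or repaired; use Persistent Volumes for data
# that must survive node re-creation.
gcloud container node-pools create ssd-pool \
    --cluster=my-cluster \
    --local-ssd-count=1
```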
However, I find every so often (has happened at least 3 time since august) that I boot up and the data is lost
A GKE cluster and its nodes cannot be turned off. What you can do is reduce (scale down) the number of nodes in a node pool. Did you mean that you connect to it?
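Scaling a node pool down (for example to zero, the closest equivalent of "turning off" the workers) can be done with `gcloud`; the cluster and pool names here are placeholders:

```shell
# Scale the default node pool of a cluster down to 0 nodes.
# The control plane keeps running; scale back up when needed.
gcloud container clusters resize my-cluster \
    --node-pool=default-pool \
    --num-nodes=0
```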
any firewall rules that had been put in place are reset to default.
You shouldn't reconfigure firewall rules on a GKE node itself. Instead you should use the GCP firewall located in Cloud Console (Web UI) -> VPC Network -> Firewall. A node re-creation due to a node upgrade or failure will reset any firewall rules set directly on the node.
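For example (the rule name, network, port and source range below are assumptions), a VPC-level rule created this way survives node re-creation, unlike rules set on the node itself:

```shell
# Create a firewall rule at the VPC level instead of editing
# iptables on a node; VPC rules are not affected when GKE
# re-creates a node.
gcloud compute firewall-rules create allow-my-app \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8080 \
    --source-ranges=0.0.0.0/0
```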
How can I:
- Stop the data in the DB from being erased
- prevent the firewall rules from being reset
Is this due to infrastructure upgrading?
You could consider (depending on your exact use case) using a GCE instance instead of a GKE cluster. GKE is a managed Kubernetes cluster designed to run containerized workloads, and some parts of it (for example, the control plane) are managed by Google.
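For completeness (the instance name, zone and machine type are placeholders), a standalone Compute Engine VM can be created like this; unlike a GKE node, you manage it yourself, can stop and start it, and changes on its disk persist across reboots:

```shell
# Create a standalone Compute Engine VM.
gcloud compute instances create my-vm \
    --zone=us-central1-a \
    --machine-type=e2-medium
```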
As for infrastructure upgrading, you could take a look at what happens when a cluster is upgraded by following the links below:
Additional reference: