
I have an EKS cluster with one Linux worker node, which may instantiate in any Availability Zone within a region. I need to use a persistent storage volume so my data won't be lost if the node dies. It is worth mentioning that I'm talking about RabbitMQ data.

I've tried using an EBS volume, but it has a hard limitation: it is bound to a single Availability Zone. If the node dies and is then re-instantiated in a different AZ, it fails to mount the EBS volume.

So far I have the following ideas:

  1. Have a single EBS volume attached to a worker node. When the worker node restarts in a different Availability Zone, create an EBS snapshot, and use it to create a new EBS volume in the correct Availability Zone. The new node instance will mount the new EBS volume.

  2. Have a worker node for each Availability Zone, with a dedicated EBS volume. RabbitMQ can automatically duplicate the data across the EBS volumes. This eliminates the need for using EBS snapshots, as suggested in solution 1.

  3. Have a single EFS volume which can be attached to multiple nodes across all Availability Zones.
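For option 3, a minimal sketch of what this could look like with the AWS EFS CSI driver (which must be installed in the cluster). The class name, claim name, and filesystem ID below are placeholders, not values from my setup:

```yaml
# StorageClass backed by the AWS EFS CSI driver.
# fs-0123456789abcdef0 is a placeholder -- substitute your own EFS filesystem ID.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap        # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
# A claim against that class; EFS supports ReadWriteMany,
# so a pod rescheduled into any AZ can mount the same data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi   # EFS is elastic; this size request is effectively ignored
```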

In addition, I came across this post, which explains more sophisticated approaches to my issue:

The other option I would recommend for Kubernetes 1.10/1.11 is to control where your volumes are created and where your pods are scheduled:
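As I understand the quoted post, the idea is to delay volume binding until the pod is scheduled, so the EBS volume gets created in whatever AZ the pod lands in. A rough sketch (the class name is illustrative):

```yaml
# Topology-aware binding: the PersistentVolume is not provisioned until a pod
# using the claim is scheduled, so the EBS volume is created in that pod's AZ.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-wait
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
```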

Can you help me in comparing these approaches? For example, in terms of scalability, cost-efficiency, maintainability... Or perhaps you can think of a better one?

Mr.Stiven
  • you could use the helm chart for RabbitMQ, stable/rabbitmq-ha – c4f4t0r Jul 30 '20 at 10:26
  • @Mr.Stiven, did you get this figured out? What solution did you go with? – Gowie47 Nov 23 '20 at 15:38
  • @Gowie47, In the end we decided to stick with Amazon SQS instead of RabbitMQ. Amazon SQS, being an external dependency (outside of the cluster), provides persistent storage out of the box. But of course, working with Amazon SQS has other disadvantages / challenges. – Mr.Stiven Dec 21 '20 at 13:05

1 Answer


The solution to this problem is to use EFS instead of EBS. This ensures that when a node dies, new pods will be able to connect to the same storage.

EFS is replicated across multiple Availability Zones, but it costs roughly three times more than EBS.
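To illustrate, this is roughly how RabbitMQ would mount such storage. It assumes a ReadWriteMany claim named rabbitmq-data already exists (e.g. provisioned through an EFS-backed StorageClass); all names here are illustrative, not from your cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:3-management
          volumeMounts:
            - name: data
              mountPath: /var/lib/rabbitmq   # RabbitMQ's default data directory
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: rabbitmq-data   # the AZ-independent EFS-backed claim
```

Because the claim is not tied to an AZ, a replacement pod on a node in any Availability Zone can remount the same data.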

You may also want to consider a more cost-effective solution with less admin overhead, such as a managed message queue service like Kafka or Kinesis.

user1007727