
I have an EKS cluster with one Linux worker node, which may instantiate in any Availability Zone within a region. I need to use a persistent storage volume so my data won't be lost if the node dies. It is worth mentioning that I'm talking about RabbitMQ data.

I've tried using an EBS volume, but it has a hard limitation: it is bound to a single Availability Zone. If the node dies and is then re-instantiated in a different AZ, it fails to mount the EBS volume.

So far I have the following ideas:

  1. Have a single EBS volume attached to a worker node. When the worker node restarts in a different Availability Zone, create an EBS snapshot, and use it to create a new EBS volume in the correct Availability Zone. The new node instance will mount the new EBS volume.

  2. Have a worker node for each Availability Zone, with a dedicated EBS volume. RabbitMQ can automatically duplicate the data across the EBS volumes. This eliminates the need for using EBS snapshots, as suggested in solution 1.

  3. Have a single EFS volume which can be attached to multiple nodes across all Availability Zones.
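For option 3, a minimal sketch of what this could look like with the AWS EFS CSI driver (which must be installed in the cluster). The class name, claim name, and filesystem ID below are placeholders, not values from my setup:

```yaml
# StorageClass backed by the AWS EFS CSI driver.
# fs-0123456789abcdef0 is a placeholder -- substitute your own EFS filesystem ID.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap        # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
# A claim against that class; EFS supports ReadWriteMany,
# so a pod rescheduled into any AZ can mount the same data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi   # EFS is elastic; this size request is effectively ignored
```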

In addition, I came across this post, which explains more sophisticated approaches to my issue:

The other option I would recommend for Kubernetes 1.10/1.11 is to control where your volumes are created and where your pods are scheduled:
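As I understand the quoted post, the idea is to delay volume binding until the pod is scheduled, so the EBS volume gets created in whatever AZ the pod lands in. A rough sketch (the class name is illustrative):

```yaml
# Topology-aware binding: the PersistentVolume is not provisioned until a pod
# using the claim is scheduled, so the EBS volume is created in that pod's AZ.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-wait
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
```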

Can you help me in comparing these approaches? For example, in terms of scalability, cost-efficiency, maintainability... Or perhaps you can think of a better one?

Mr.Stiven
  • you could use the helm chart for RabbitMQ, stable/rabbitmq-ha – c4f4t0r Jul 30 '20 at 10:26
  • @Mr.Stiven, did you get this figured out? What solution did you go with? – Gowie47 Nov 23 '20 at 15:38
  • @Gowie47, In the end we decided to stick with Amazon SQS instead of RabbitMQ. Amazon SQS, being an external dependency (outside of the cluster), provides persistent storage out of the box. But of course, working with Amazon SQS has other disadvantages / challenges. – Mr.Stiven Dec 21 '20 at 13:05

1 Answer


The solution to this problem is to use EFS instead of EBS. This ensures that when a node dies, new pods will be able to connect to the same storage.

EFS is replicated across multiple Availability Zones, but it costs roughly three times more than EBS.
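To illustrate, this is roughly how RabbitMQ would mount such storage. It assumes a ReadWriteMany claim named rabbitmq-data already exists (e.g. provisioned through an EFS-backed StorageClass); all names here are illustrative, not from your cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:3-management
          volumeMounts:
            - name: data
              mountPath: /var/lib/rabbitmq   # RabbitMQ's default data directory
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: rabbitmq-data   # the AZ-independent EFS-backed claim
```

Because the claim is not tied to an AZ, a replacement pod on a node in any Availability Zone can remount the same data.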

You may also want to consider a more cost-effective solution with less admin overhead, such as a managed message queue service like Kafka or Kinesis.

user1007727