GlusterFS Volume Creation Suggestion

Question

I have to deploy multiple OpenShift clusters ranging from 3 to 10 nodes. For 3 nodes I am creating replicated volumes.

But for 4 nodes and above, a replicated volume doesn't look like a good fit: each node has a 300GB disk, and replicating it across 10 nodes is not optimal. I am looking for a formula to use, like:

For 4 nodes create volume as disperse:2:1
For 5 nodes create volume as disperse:?:?
For 6 nodes create volume as disperse:?:?
For 7 nodes create volume as disperse:?:?
For 8 nodes create volume as disperse:?:?
For 9 nodes create volume as disperse:?:?
For 10 nodes create volume as disperse:?:?

Environment: I will use these volumes for MySQL 5.7.28. Each server has a 300GB disk, out of which I will create a 250GB volume for MySQL.

OpenShift version: 3.11

# gluster --version
glusterfs 6.1

PS: I have no storage background, so excuse me if I am missing some obvious point. I tried searching on Google but couldn't extract the required info.

Aravinda Vishwanathapura · Accepted Answer · 2020-01-20T15:57:33.710

Are you planning to use all nodes as storage nodes, or only a subset of them? Based on your question, MySQL uses 250GiB; what other applications need storage?

Replicate Volume: Effective storage space available will be

volume_size = sum of storage available from three nodes / 3

In your case, the volume size will be (3 * 300) / 3 = 300GiB using three storage nodes.

Disperse Volume: Effective storage space available will be

volume_size = storage in single node * (number of bricks - redundancy count)

In your case, with three bricks and a redundancy count of 1, the volume size will be 300 * (3-1) = 600GiB. More detail is available here: https://docs.gluster.org/en/v3/Administrator%20Guide/Setting%20Up%20Volumes/#creating-dispersed-volumes

Disperse volumes are good for archival purposes since they save space compared to replica volumes. But they may be slower than replica volumes because of the computation involved in every I/O operation.
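To make the two capacity formulas above concrete, here is a minimal Python sketch. The function names and the assumption that every brick has the same size are mine, for illustration only; this is plain arithmetic, not anything the gluster CLI provides.

```python
def replica_volume_size(brick_size_gib, replica_count=3):
    """Usable size of a replicated volume: total raw storage / replica count.

    Assumes every brick has the same size (an assumption of this sketch).
    """
    total_raw = brick_size_gib * replica_count
    return total_raw // replica_count


def disperse_volume_size(brick_size_gib, bricks, redundancy):
    """Usable size of a dispersed volume: brick size * (bricks - redundancy)."""
    if not 0 < redundancy < bricks:
        raise ValueError("redundancy must be between 1 and bricks - 1")
    return brick_size_gib * (bricks - redundancy)


# Three nodes with 300GiB each, as in the question.
print(replica_volume_size(300))         # replica 3
print(disperse_volume_size(300, 3, 1))  # 2 data bricks + 1 redundancy brick
```

With the numbers from the question this gives 300GiB for replica 3 and 600GiB for a 2+1 disperse volume, matching the figures above.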

The Kadalu (https://kadalu.io) project provides a different approach to provisioning volumes in Kubernetes. It creates a single Gluster volume from the storage and provides subvolumes out of that volume when a PV is requested (in your case, storage for MySQL).

Kadalu currently supports Replica 1 and Replica 3 volumes. Replica 1 is useful when the storage device is claimed from other storage providers, for example, AWS/Azure. Replica 3 provides high availability of storage for applications even if one out of three nodes goes down. A recent blog post (https://kadalu.io/blog/kadalu-kubernetes-storage) explains the configurations available with Kadalu and how to use it with existing storage.

Kadalu uses GlusterFS and integrates with Kubernetes natively, without using the Gluster management daemon (glusterd).

Update: Added calculations for Disperse Volume

number of disperse bricks = data bricks + redundancy count

If 3 storage devices are available,

2 data bricks + 1 redundancy brick

In case of 6 storage devices,

4 data bricks + 2 redundancy bricks

If the number of redundancy bricks increases, the usable volume size is reduced. The volume stays available to applications as long as the number of bricks that go down does not exceed the redundancy count. For example, in a 4+2 configuration, the volume will remain available even if 2 bricks out of 6 go down.
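As a rough way to explore the options for 4 to 10 nodes, the sketch below lists every data + redundancy split for a given brick count, along with the usable size. It assumes one 300GiB brick per node and, as I understand the Gluster admin guide, that the total number of bricks must be greater than 2 * redundancy; please verify that constraint against the documentation for your version. The helper name `disperse_layouts` is hypothetical. The script only enumerates combinations and does not recommend one: the right redundancy depends on how many node failures you need to tolerate.

```python
def disperse_layouts(total_bricks):
    """Return all (data_bricks, redundancy) splits for a dispersed volume.

    Assumed constraint (check the Gluster docs for your version):
    redundancy >= 1 and total_bricks > 2 * redundancy.
    """
    layouts = []
    redundancy = 1
    while total_bricks > 2 * redundancy:
        layouts.append((total_bricks - redundancy, redundancy))
        redundancy += 1
    return layouts


BRICK_SIZE_GIB = 300  # one brick per node, sized as in the question

for nodes in range(4, 11):
    for data, redundancy in disperse_layouts(nodes):
        usable = BRICK_SIZE_GIB * data
        print(f"{nodes} nodes: {data}+{redundancy} -> {usable}GiB usable, "
              f"survives {redundancy} brick failure(s)")
```

For 6 nodes, for example, this lists 5+1 and 4+2; the 4+2 split is the one described above.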

Thanks for the response; it cleared up the concept a bit, and I will read more about Kadalu. But my question is still unanswered: what would be the recommended value for disperse:?:? for 4 nodes and above, as I want to use all nodes as storage? — ImranRazaKhan, Jan 20 '20 at 15:40
Updated the answer and added the dispersed volume calculation. Let me know if that is useful. — Aravinda Vishwanathapura, Jan 20 '20 at 15:58
