
I'm interested in a MySQL cluster formed from 1 primary and 2 secondaries.

Usually in a public cloud we:

  • use external storage

  • use services such as RDS, so replication and failover are handled behind the service

  • can recreate a failed pod on a different node, because neither the storage nor the DB runs on any of your k8s nodes

A solution that worked in a private cloud, but not in Kubernetes:

  • using local storage

  • using the mysqlfailover utility so it can nominate a new primary

  • changing the DNS record of "mysql-0" (the primary) and instructing the application to refresh DNS so it can see the new primary after a failover event

Exploring a Kubernetes solution:

  • which storage to use: local storage or NFS? (if NFS, how would you build a cluster across different servers?)

  • which tooling to use: https://github.com/oracle/mysql-operator , Percona, similar solutions, or even the same mysqlfailover - which one would you prefer, and how does it deal with the failover case? An open-source option is preferred.

If I try to port my current working mysqlfailover solution to Kubernetes, I may need to set up node affinity so pods get their local storage attached correctly (see the sketch below).
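For illustration, roughly the shape of pinning I have in mind, using a local PersistentVolume (the node name node-a, the path and the size are hypothetical; a StorageClass with volumeBindingMode: WaitForFirstConsumer is the usual companion):

    # Sketch only: a local PV is usable solely from the node named in its
    # nodeAffinity, so the pod that claims it is scheduled onto that node.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: mysql-data-node-a
    spec:
      capacity:
        storage: 10Gi
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-storage
      local:
        path: /mnt/disks/mysql          # directory on the node's local disk
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: ["node-a"]    # hypothetical node name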

Also, this mysqlfailover mechanism would need improvement (my starting point is https://medium.com/@zzdjk6/step-by-step-setup-gtid-based-mysql-replica-and-automatic-failover-with-mysqlfailover-using-docker-489489d2922 ), because it can, for example, nominate mysql-1 as the new primary while the original one (mysql-0) is down. As I understand it, this is not the best fit, because in the usual StatefulSet architecture we always want mysql-0 to be the primary, while mysqlfailover works in completely the opposite way (see the sketch below).
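For context, the stable naming comes from a StatefulSet backed by a headless Service. A minimal sketch (names and image are illustrative) of why mysql-0 is a fixed identity:

    # Sketch: the headless Service gives each pod a stable DNS name,
    # e.g. mysql-0.mysql and mysql-1.mysql, which is why "mysql-0 is the
    # primary" is a convention mysqlfailover breaks by promoting mysql-1.
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
    spec:
      clusterIP: None            # headless: one DNS record per pod
      selector:
        app: mysql
      ports:
        - port: 3306
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql         # ties pod DNS names to the Service above
      replicas: 3
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - name: mysql
              image: mysql:5.7
              env:
                - name: MYSQL_ROOT_PASSWORD
                  value: changeme    # demo only
              ports:
                - containerPort: 3306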

So which option would you choose if you were starting fresh rather than fixing the existing setup? Which steps would you take? Which MySQL and Kubernetes components would you use?

Many thanks

laimison
  • Which? Galera Cluster, InnoDB Cluster, or NDB Cluster? – Rick James Jan 06 '20 at 23:20
  • Interested in any solution that fits the Kubernetes mechanism and tolerates a master failover event with as little effort as possible. I have started to explore Galera and Percona. If you have a high-level view of Galera, that would be great to know. I will share my final solution or details when it's done, in a comment or as a separate answer. – laimison Jan 06 '20 at 23:28
  • To simplify this: in a failover event, k8s would start a new pod on another server with the hostname mysql-0 (this hostname always means master). So the question is how it would sort out the storage; maybe it could use the storage from a slave. An alternative would be a master-master-master approach - then in k8s terms it's completely easy, in short the same approach as outside of k8s. But do we have a robust and free master-master option nowadays? What are the downsides of this solution? – laimison Jan 07 '20 at 13:32
  • 1
    M-M is "free"; M-M-M, also free, is a nightmare to recover from if there is a failure. M->S1->S2 is sometimes used for an upgrade path: Upgrade S1, add S2, jetison M, promote S1 to new Master. – Rick James Jan 07 '20 at 16:42
  • Thanks. The M-M-M approach should probably be ruled out to avoid the bigger issues it could create; there is no special need for M-M-M initially. I noticed that in some places an operator is used in Kubernetes, as here https://www.percona.com/doc/kubernetes-operator-for-pxc/kubernetes.html , but this is a pretty new thing to speak about at this stage. – laimison Jan 07 '20 at 18:04
  • 1
    PXC (based on Galera) can have 3 Masters (or Master+Slaves), but with the kinks worked out; it is a good product. I don't know anything about k8s. PXC configured with 3 writable nodes gracefully recovers from failure of one node with essentially zero downtime and essentially zero manual intervention (other than to deal with the dead hardware). What would k8s bring to the table? – Rick James Jan 07 '20 at 18:50
  • 1) fewer servers needed 2) everything is served/git-tracked and visible on a single platform 3) efficient use of computing resources 4) customisations written around k8s standards 5) Docker separates dynamic data from dependencies in a clear standard 6) as a DevOps guy, I learn Kubernetes 7) my web services are pretty small, so I have more freedom to improve this mechanism over time 8) clustering was hard long ago; I think it's getting more standardised and achievable than ever. Some people don't like putting a DB in a container; maybe in a few years it will be just a normal procedure. – laimison Jan 07 '20 at 21:02
  • I didn't mention Kubernetes features such as self-healing, community-prepared backup containers, networking/DNS management for HA (e.g. no DNS server is needed), deployment strategies and other built-in features, so you can benefit from these. – laimison Jan 07 '20 at 21:10
  • Re "1": Any fault-tolerant solution needs 3 separate servers, preferably in 3 datacenters. Granted, they could be small dockers in each server, but I don't see how fewer servers would be needed. – Rick James Jan 07 '20 at 21:12
  • That is the idea: 3 servers in 3 different data centres. I think this is the minimum (an odd number) for any high-quality cluster. In the previous comment, I meant multiple different services on each server managed in an efficient way - that is just one of the mentioned advantages. – laimison Jan 07 '20 at 23:10
  • By the way, I think I got your point: fewer servers because I don't need a separate server for the DB, a server for the backend, etc., and no virtualization layer losing some of the RAM. With Linux ulimits, separate users for containers and Kubernetes resources set up correctly, it shouldn't break the server. I started to play with the Kubernetes operator for PXC. – laimison Jan 08 '20 at 01:17
  • 1
    Odd number of nodes is not required; >= 3 nodes is required. For 4 nodes, they must not be weighted equally. For 4 nodes in 3 colos, use weights of 1,1,2,2 would allow graceful recovery if any one node or colo crashed, where the 1 and 1 are in one colo. – Rick James Jan 08 '20 at 01:58
  • Many thanks for your input. I didn't know about those weights in the MySQL world. I have built a MongoDB cluster in the past; they solved this problem with votes, I believe. Now it's time to merge Kubernetes and containerisation knowledge with the missing MySQL bits. I do have a working MySQL cluster using mysqlfailover inside Docker Compose containers, but as I mentioned, mysqlfailover works with the opposite philosophy to Kubernetes, and I think mysqlfailover is an old tool. I have a feeling that MySQL operators should be used in k8s (the PXC operator or mysql-operator). – laimison Jan 08 '20 at 11:35
  • I am familiar with PXC/Galera failover, but not k8s or `mysqlfailover`, so I cannot discuss their differences. – Rick James Jan 08 '20 at 16:42
  • It seems ProxySQL is used as a "load balancer" in the PXC cluster solution on Kubernetes. This container is automatically and instantly rescheduled on another server if a particular one goes down. The architecture is here - https://www.percona.com/doc/kubernetes-operator-for-pxc/architecture.html . It's multi-master. @RickJames maybe you know whether PXC can be designed to run each instance on local storage instead of external (a non-k8s question)? – laimison Jan 09 '20 at 01:10
  • 1
    All forms of MySQL/MariaDB/clustering/etc can use either local disks or SAN; I don't understand that part of your question. – Rick James Jan 09 '20 at 01:14
  • Hmmm... Too many overloaded words: "pod", "operator", "object". What is k8s bringing to the table? – Rick James Jan 09 '20 at 01:21
  • Cheers. You have answered that. I don't really want to depend on SAN, NFS, S3 (AWS) or other storage that can be mounted by multiple instances. As I understand it, there are no exceptions in this solution: local storage should be fine. To explain "pod" in the shortest form, it is just a "container", or sometimes a group of containers, so you configure just a "pod". I still don't know exactly what an "operator" does; it's some custom "object" which calls the Kubernetes API, so it knows about all events and likely decides what to do in each scenario. So it's a lower-level object. – laimison Jan 09 '20 at 01:33
  • Luckily, they provide a PDF which points to GitHub with the Kubernetes configuration files, except for storage (so I'm trying to sort out that bit) - https://learn.percona.com/hubfs/Manuals/Percona%20Kubernetes%20Operators/Percona%20Kubernetes%20Operator%20for%20Percona%20xtraDB%20Cluster/percona-kubernetes-operator-for-pxc-1.3.pdf – laimison Jan 09 '20 at 01:34
  • 1x ProxySQL and 1x Operator pods are always running. If either crashes, it is instantly started on another server, so the application can call only the ProxySQL hostname - that's it. The Operator should be tolerant of a quick disruption (while it's restarted on another server); as containers are lightweight, it should literally take milliseconds. Now the cryptic part is the Operator: it can do anything, as this is a custom component (it can contain any code which interacts with the Kubernetes API - like an app). – laimison Jan 09 '20 at 01:44
  • hi @RickJames thanks for your comments about databases at the beginning of this year. They helped to strengthen my position when choosing a solution. I got back to this and have answered my own question. I have installed it on Kubernetes and tested failover scenarios, so I am going to use it continuously. If you have any curiosity or questions about merging Kubernetes knowledge with non-Kubernetes knowledge, please let me know. – laimison Aug 28 '20 at 14:20
  • Thanks. I see that there are 1.2K Questions on kubernetes (just click on the tag button). May I suggest you impart some of your insight into a few of them. – Rick James Aug 28 '20 at 17:00
  • Yep, that's my plan. When I do something, I always look for opportunities to help someone on SO, especially if I didn't find a straight answer myself. Kubernetes and the other DevOps technologies are what I have been working with full-time. – laimison Sep 10 '20 at 22:31

1 Answer


The solution I ended up with is Percona XtraDB Cluster on Kubernetes. It has a Kubernetes operator that automatically manages failover scenarios.

Your app shouldn't know anything about clustering, because that is sorted out transparently behind kubernetes-service-hostname:3306. The app calls this address, and behind it there are 3x ProxySQL/HAProxy containers (one per server). Each query is then routed to one of the three MySQL containers.
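For illustration, roughly what sits behind that hostname; the Service name and selector below are made up, since the operator creates its own Services - check its docs for the real names:

    # Sketch: a plain Service in front of the ProxySQL pods. The app only
    # ever connects to mysql-proxysql:3306; Kubernetes keeps the endpoint
    # list in sync as pods come and go.
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-proxysql
    spec:
      selector:
        app: proxysql            # hypothetical label on the ProxySQL pods
      ports:
        - name: mysql
          port: 3306
          targetPort: 3306

From inside the cluster the connection string is then simply mysql-proxysql:3306, or mysql-proxysql.<namespace>.svc.cluster.local:3306 from another namespace.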

When a server goes down, the failed ProxySQL/HAProxy and MySQL containers are removed from Kubernetes, so kubernetes-service-hostname contains two members instead of three.

When the server is back online, the containers are recreated to restore the full cluster.

There is also a Percona operator container, which automatically manages the pods and performs other actions so the cluster stays fully operational.
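To give a feel for how the operator is driven, here is a minimal custom resource sketch. The field layout follows the 1.x operator documentation linked below, but versions differ, so treat it as illustrative rather than copy-paste:

    # Sketch: the operator watches PerconaXtraDBCluster objects and itself
    # creates, heals and rebalances the PXC and ProxySQL pods described here.
    apiVersion: pxc.percona.com/v1
    kind: PerconaXtraDBCluster
    metadata:
      name: cluster1
    spec:
      pxc:
        size: 3                  # three multi-master MySQL members
        image: percona/percona-xtradb-cluster-operator:1.3.0-pxc
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 6Gi
      proxysql:
        enabled: true
        size: 3                  # one proxy per server, as described above
        image: percona/percona-xtradb-cluster-operator:1.3.0-proxysql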

In terms of storage, it can be just a hostPath local directory, which keeps the storage side pleasantly simple. You could also use a PersistentVolumeClaim with any StorageClass behind it, or external storage such as NFS (see the sketch below).
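A sketch of the hostPath variant (names, paths and sizes are illustrative; with a real StorageClass you would keep only the claim and let provisioning create the volume):

    # Sketch: a hostPath PersistentVolume and a matching claim. This ties
    # the data to one node's filesystem, which is acceptable here because
    # every PXC member holds a full copy of the data anyway.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pxc-data-0
    spec:
      capacity:
        storage: 6Gi
      accessModes: ["ReadWriteOnce"]
      storageClassName: manual
      hostPath:
        path: /data/pxc          # directory on the node
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pxc-data-0-claim
    spec:
      storageClassName: manual
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 6Gi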

It's actually a multi-master setup.

More details:

https://www.percona.com/doc/kubernetes-operator-for-pxc/kubernetes.html

laimison