
I have been working with Docker for quite some time now and already have a production environment that started small but has now grown to more than 50 Linux nodes, each of them running a single Docker container.

I have been orchestrating everything with custom Python scripts and so far it has worked great, but since we will probably continue expanding, I think that at some point I will need a reliable, flexible orchestration tool, so I might as well start planning everything now.

I started reading about Docker orchestration and so far I've narrowed it down to two options: Kubernetes or Docker Swarm.

Looking at Kubernetes, it seems like the right choice (a robust, flexible orchestration system), but I'm not sure it's suitable for my environment. From what I've been reading, it can't adopt already-running Docker containers; it can only be used to build a new cluster from the ground up. In our case, we're using dedicated servers with special hardware requirements, so there's no way we could stand up a parallel, alternate cluster and start fresh with Kubernetes.

On the other hand, Docker Swarm looks more suitable for orchestrating existing Docker containers, but it seems to have reliability and scaling issues.

Can someone with extensive Kubernetes/Docker Swarm experience give me some advice on how to approach the migration to an orchestrated Docker cluster without building one from scratch (which is impossible in my case)?

Thank you!

Tony
  • Why is there only one container per server? What are the special hardware requirements? – Michael Hampton May 27 '19 at 16:59
  • Only one container can run on a server, because it uses the dedicated GPU with Nvidia CUDA, and a second container will fail to start. – Tony May 28 '19 at 05:06
  • Hmm. Whatever direction you go, you could certainly migrate in stages. There's probably no need to do it all in one go. – Michael Hampton May 28 '19 at 05:58
  • Unfortunately, the infrastructure is at more than 90% usage and, even worse, for now there's no possibility to rent identical hardware (we basically rented everything that the datacenter had)... – Tony May 28 '19 at 07:21
  • Wow. Um, I suggest putting more than one GPU in a server, space permitting -- and of course to run server form factors that allow for this. You may need more power after that though. Anyway you can start, e.g. Kubernetes with three nodes and just expand it as you go. To start, you could make use of unused capacity on your existing servers (and you probably have a lot of it). – Michael Hampton May 28 '19 at 09:30

1 Answer


I do not know much about Docker Swarm, but I can share my thoughts on the Kubernetes side.

You will need some additional headroom in any case if you want to change the environment: your machines are at 90% usage, so I don't see how you could migrate to or start using Kubernetes there in the current scenario, other than switching your nodes off one by one and gradually migrating them to Kubernetes. You could start by converting two of the current nodes into one Kubernetes master and one worker node (using, for example, Kubespray) and then bring your containers into the cluster as Deployments.
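
To make that concrete, here is a rough sketch of what one of your containers could look like as a Deployment. The name, image and registry are placeholders, and it assumes the NVIDIA device plugin is installed so that `nvidia.com/gpu` is a schedulable resource:

```yaml
# Hypothetical Deployment for one of the existing GPU containers.
# Assumes the NVIDIA device plugin is running so that nvidia.com/gpu
# is a schedulable resource on each node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-worker                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cuda-worker
  template:
    metadata:
      labels:
        app: cuda-worker
    spec:
      containers:
        - name: cuda-worker
          image: registry.example.com/cuda-worker:1.0   # your existing image
          resources:
            limits:
              nvidia.com/gpu: 1      # whole GPU per pod, so one pod per single-GPU node
```

Because each pod requests the whole GPU, the scheduler will place at most one of these pods per single-GPU node, which matches your current one-container-per-server layout.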

You already mentioned that you can't start orchestrating the running containers and that you would need to spin everything up from scratch to start using Kubernetes, but one thing is important here: if you do it right with K8s, you will never have this problem again. The declarative approach of Kubernetes and the fact that everything is a YAML file make the future easier. You just need to keep your Services, Deployments, ConfigMaps, etc. backed up, and this problem disappears for good. Orchestration, self-healing, upgrades and scaling also stop being so complicated.
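
For example, a minimal, made-up ConfigMap (the name, keys and values are purely illustrative) is just a small text file you can keep in version control next to the Deployments:

```yaml
# Hypothetical ConfigMap: application settings kept as a plain,
# versionable file instead of being managed by ad-hoc scripts.
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-config                # placeholder name
data:
  LOG_LEVEL: "info"                  # illustrative keys and values only
  BATCH_SIZE: "32"
```

Keeping such files in a git repository effectively is the backup: running `kubectl apply -f` against a fresh cluster recreates the same state.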

So, to make a long story short: I do not claim that what follows is the only correct way; I will just share what I would do in your situation.

You already mentioned you are renting the hardware, so why not move to a cloud? You do not need to move everything right away. What I mean is: recreate your infrastructure in the cloud, ideally on one of the managed Kubernetes services (GKE, EKS, AKS or many others), as they are the easiest to manage and you will be backed by engineers from Google/AWS/Azure if you run into issues with your cluster. Once the cloud environment is done and works as expected, you can decide what comes next.

Is it worth staying with all of your infrastructure there, or is it better to move back on-premises? At that point you will already have experience with Kubernetes, ready YAML files, backups, etc., and moving from a managed Kubernetes to an on-premise Kubernetes will be much simpler: it is almost a 1:1 migration plus some tinkering with networking.

There are free trials available, so you can test whether this works for you: GCP offers a $300 free trial and also provides sustained use discounts, so you could test your application at a smaller scale. AWS has a free tier which should be enough to test some basic features. I am not sure about other cloud providers, but they surely have corresponding offers.

So either you migrate partially to the cloud and experiment with dividing traffic between on-prem and the cloud, or you copy the whole infrastructure to the cloud and then reproduce it on-prem if you like the result.

The other way is to turn off two nodes, create a Kubernetes master and a worker node on them, and then slowly join your remaining nodes to the cluster one by one. That also depends on what kind of application you are running, whether you can afford downtime, and so on, but all of that can be handled with, for example, a canary deployment.
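
As a sketch of the canary pattern (assuming the workload serves requests behind a Service; all names, ports and the image tag are placeholders): run a small second Deployment with the same `app` label but a different `track` label, and let a single Service select only on `app`, so traffic is split roughly in proportion to the replica counts:

```yaml
# Hypothetical canary setup: the Service selects only on "app",
# so it sends traffic to both the stable and the canary pods,
# roughly in proportion to their replica counts.
apiVersion: v1
kind: Service
metadata:
  name: cuda-worker
spec:
  selector:
    app: cuda-worker                 # matches stable and canary pods
  ports:
    - port: 80                       # placeholder ports
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-worker-canary           # runs alongside the stable Deployment
spec:
  replicas: 1                        # keep the canary small
  selector:
    matchLabels:
      app: cuda-worker
      track: canary
  template:
    metadata:
      labels:
        app: cuda-worker
        track: canary
    spec:
      containers:
        - name: cuda-worker
          image: registry.example.com/cuda-worker:2.0   # new version under test
```

Once the canary looks healthy, you roll the change into the stable Deployment and remove the canary.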

aurelius
  • Thank you for your answer. The infrastructure can't currently be replicated in a cloud, as a service, because of multiple requirements that are not met (already did a lot of digging into this), so for now we're stuck with what we have... – Tony Jun 04 '19 at 06:14
  • Then, if it is possible, you could go with one node at a time. – aurelius Jun 06 '19 at 15:59