We are designing a new cluster architecture for our web service and are planning to use Ceph object storage and Kubernetes for our services. To make the best use of our servers, we have several options:

  1. Use identical servers, run both Ceph and our services on all of them, and manage everything with Kubernetes.

  2. As above, use identical servers, but label some of them for Ceph and don't run services on those.

  3. Use two types of servers: one optimized for I/O and one optimized for CPU. Run Ceph on the I/O servers and services on the CPU servers, and manage all of them with Kubernetes.

  4. As above, use two separate types of servers, but don't use Kubernetes on the I/O servers and let Ceph handle everything there (isn't it simpler not to use Kubernetes for our Ceph cluster?).

I know identical servers have the benefit of easier scaling. On the other hand, having two types of servers lets us optimize each of them. What is the best solution?

Thomas

2 Answers

2

Some things to consider:

If you are using rotating disks, then you may want to have separate disks for Ceph and for random Kubernetes tasks. That way, random I/O from Kubernetes tasks doesn't break the sequentiality of Ceph accesses (especially for writes and large reads). Obviously, you can accomplish that with (2), (3) or (4). But you could also accomplish that with your option (1) if you have multiple disks in your servers (JBOD) and allocate each disk to either Ceph or Kubernetes but not both (or if you use a separate boot flash drive for Kubernetes, etc.).

If your CPU-optimized servers happen to come with a large boot disk, you may end up feeling that the storage is stranded because the serving jobs don't use it all, and later wish you could run Ceph on those nodes too, to unstrand that storage. But if it is a small disk or SSD, then you may not care.

There will be some uncertainty in how many servers you need (e.g. growth, failures, imprecise load estimates). You have to over-buy because of this uncertainty. The overbuying is worse with 2 SKUs than with 1 SKU, because each SKU needs its own headroom. It is also harder to repurpose servers later as your needs change. This sort of favors (1) or (2).
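To make the overbuying point concrete, here is a rough sketch. The assumptions are mine, not measurements: the uncertainty in storage demand and in serving demand is expressed as a standard deviation in server-equivalents, the two are independent, and you buy headroom equal to a fixed multiple `k` of the standard deviation. The function names and the numbers are hypothetical.

```python
import math

def headroom_one_sku(sigma_storage, sigma_compute, k=2.0):
    """Spare servers when any server can take either role (risk is pooled)."""
    pooled_sigma = math.sqrt(sigma_storage**2 + sigma_compute**2)
    return math.ceil(k * pooled_sigma)

def headroom_two_skus(sigma_storage, sigma_compute, k=2.0):
    """Spare servers when each SKU needs its own separate buffer."""
    return math.ceil(k * sigma_storage) + math.ceil(k * sigma_compute)

# Hypothetical uncertainties: 3 servers for storage, 4 for serving.
pooled = headroom_one_sku(3, 4)
split = headroom_two_skus(3, 4)
print(pooled, split)  # 10 14
```

Under these assumptions, the single-SKU fleet needs fewer spare machines because a shortfall on one side can be covered by slack on the other.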

From a security standpoint, you might be more comfortable if serving jobs are not on the same machine as your storage. This is more important if you have a variety of different serving jobs that are trusted to different degrees.

I'm not sure what kind of "optimizing" you want to do to your server SKUs. Choosing SKUs that exactly fit one pod is not a good practice. You should have smaller Pods and trust the scheduler to bin pack.
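As a toy illustration of why smaller Pods pack better, here is a first-fit-decreasing sketch. This is not how the Kubernetes scheduler works internally (it has its own filtering and scoring); it is only a simple bin-packing heuristic, and the core counts are made up.

```python
def first_fit_decreasing(pod_cpus, node_cpu):
    """Pack pods (CPU requests) onto nodes; return a list of per-node pod lists."""
    nodes = []
    for pod in sorted(pod_cpus, reverse=True):
        if pod > node_cpu:
            raise ValueError(f"pod of {pod} cores cannot fit a {node_cpu}-core node")
        for node in nodes:
            if sum(node) + pod <= node_cpu:
                node.append(pod)  # fits on an existing node
                break
        else:
            nodes.append([pod])  # no existing node had room, open a new one
    return nodes

# The same 48 cores of demand on hypothetical 16-core nodes:
big = first_fit_decreasing([12, 12, 12, 12], 16)  # four large pods
small = first_fit_decreasing([4] * 12, 16)        # twelve small pods
print(len(big), len(small))  # 4 3
```

With large pods, each node is left with 4 idle cores that nothing else fits into; with small pods the same demand fills three nodes completely.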

Eric Tune
1

Should you run Ceph in Kubernetes?

If you are looking to use Ceph to provide PVs for your containers you should run it outside of Kubernetes.

If you are looking to run Ceph using DaemonSet and StatefulSet, you should consider this. There are some suggestions as to deciding whether that is a good fit for your organization.

What types of SKUs should you buy?

If your priority is to optimize your Ceph deployment for maximal throughput you will want one or more SSDs for the Ceph journal and multiple SSDs/HDDs for the block storage. You will not want to share these devices with other workloads. If you use Kubernetes to manage Ceph in this configuration, and you statically partition all your other workloads to other servers, there will be little benefit to using Kubernetes.

If you are optimizing for maximum cost/density, the right choice is dependent on the mix of workloads. If Ceph is the only storage workload, you may still be able to save money by running it on storage density optimized SKUs in a separate footprint.