
I would like to work out the hardware sizing for a Ceph cluster. There are so few references on sizing that I am trying to gather these details here in the community.

E.g., what should I plan for, depending on the drive type:

  • spindle drives (7.2k, 10k, 15k)
  • SATA & SAS 6G SSDs
  • SAS 12G SSDs
  • NVMe PCIe v3
  • NVMe PCIe v4

The questions are:

  • how many CPUs should I have?
  • how many cores should be available?
  • how many OSDs per drive type should be planned?
  • how much RAM per OSD should be planned?

Target: get the best performance out of the node with the given drives, i.e. both IOPS and bandwidth.

A related question about the drives concerns the controllers, which can be a limiting factor.

How many drives per controller should be connected to get the best performance per node? Is there a hardware controller recommendation for Ceph?

Is there perhaps a calculator for working out the sizing?

cilap
  • Does this answer your question? [Can you help me with my capacity planning?](https://serverfault.com/questions/384686/can-you-help-me-with-my-capacity-planning) – djdomi Dec 27 '21 at 17:40

1 Answer


I can't find a link to a source right now, but this is what I used in my cluster (10 OSD servers, 500 TB):

  • CPU: 1 core per OSD (hard drive); the higher the frequency, the better.
  • RAM: 1 GB per 1 TB of OSD storage.
  • 1 OSD per hard drive.
  • Monitors don't need much memory or CPU.
  • It is better (but not mandatory) to run monitors separately from the OSD servers, especially if a server contains a lot of OSDs.
  • If you plan to run a lot of OSDs (more than 2) per server, it is better not to use those servers to host virtual machines as well; OSDs require quite a lot of memory and CPU power. (A small sizing sketch based on these rules follows below.)
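To make those rules of thumb concrete, here is a minimal sizing sketch (my own illustration, not an official Ceph tool); the `Drive` class, the `size_node` helper, and the 1-core-per-OSD / 1-GB-per-TB ratios are simply the assumptions from the list above.

```python
# Minimal per-node sizing sketch based on the rules of thumb above:
#   1 OSD per drive, 1 CPU core per OSD, 1 GB of RAM per 1 TB of OSD storage.
# Drive and size_node are hypothetical helpers for illustration, not Ceph tooling.
from dataclasses import dataclass

@dataclass
class Drive:
    count: int           # number of drives of this type in the node
    capacity_tb: float   # capacity per drive, in TB

def size_node(drives):
    """Return (osds, min_cores, min_ram_gb) for one OSD node."""
    osds = sum(d.count for d in drives)                     # 1 OSD per drive
    cores = osds                                             # 1 core per OSD
    ram_gb = sum(d.count * d.capacity_tb for d in drives)   # 1 GB per TB
    return osds, cores, ram_gb

if __name__ == "__main__":
    # Hypothetical node: 8 x 8 TB spindles plus 2 x 4 TB SSDs.
    node = [Drive(count=8, capacity_tb=8), Drive(count=2, capacity_tb=4)]
    osds, cores, ram_gb = size_node(node)
    print(f"OSDs: {osds}, cores: >= {cores}, RAM: >= {ram_gb:.0f} GB")
    # -> OSDs: 10, cores: >= 10, RAM: >= 72 GB
```

Treat the result as a floor: round RAM up to the next common DIMM size, and remember it does not account for monitor daemons or OS overhead.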
MaksaSila
  • Thank you for the answer. The scaling is somewhat different in every article you read, e.g. the OSDs per disk, the GB per TB of storage, and so on. Do you have the hardware sizing of your OSD servers, along with the disks you used and the bandwidth and IOPS you achieved? – cilap Mar 04 '20 at 07:15
  • It was at a previous workplace, so I can't provide some of the data (for example, exact IOPS). I had 2 clusters. One small: 3 servers with 3 x 10 TB disks for OSDs + a 1 TB Samsung NVMe for the journal, 2x 10GbE interfaces. It was also a ProxMox cluster, so not all the resources were dedicated to CEPH. The second cluster was: 3 dedicated monitors, 10 OSD servers. The OSDs were: SSD disks, 2 TB 2.5-inch and 10 TB 3.5-inch hard drives + Intel NVMe's for journals, 500 TB in total. – MaksaSila Mar 04 '20 at 11:40
  • So the OSD servers were sized according to my answer, and it was enough; less would not have been good. I used the CEPH benchmark and got 10 Gbit/s throughput on both clusters. The latency was low (less than 100 milliseconds) and stable. So the server sizing matched the recommendations. If a server has 12 x 10 TB hard drives, it runs 12 OSDs, so the CPU (or CPUs) had at least 12 cores (preferably with HT off) and at least 128 GB RAM. Smaller hosts had smaller configurations. – MaksaSila Mar 04 '20 at 11:40
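As a quick arithmetic check (my own, not part of the original thread), the 12 x 10 TB configuration from the last comment lines up with the ratios in the answer:

```python
# Sanity check of the 12 x 10 TB node mentioned in the comment above,
# assuming 1 OSD per drive, 1 core per OSD, and 1 GB of RAM per TB.
drives, capacity_tb = 12, 10
osds = drives                  # 12 OSDs
cores = osds                   # at least 12 cores
ram_gb = drives * capacity_tb  # 120 GB, rounded up to 128 GB in practice
print(osds, cores, ram_gb)     # 12 12 120
```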