I want to build a Ceph storage cluster for HPC use (CentOS 7 based). Right now I have an enterprise SAS RAID enclosure with 3 shelves of 12 × 4 TB disks (36 total). It is configured as a default RAID6 rig, and its performance is very bad. I also can't scale the system: there is no way to switch to 6 TB disks, for example. So here is what I want to do:
- Switch from RAID6 to JBOD.
- Map each shelf of 12 disks to a different controller port.
- Connect 3 servers to the enclosure via SAS HBA cards.
- Create one Ceph pool for CephFS: pg_num=512, erasure coded, failure-domain=host, BlueStore (see the sketch after this list).
- Mount the CephFS on the compute nodes over IPoIB (mount example below).
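
For the 4th step, this is roughly what I plan to run. It's only a sketch: the k/m values and the profile/pool names are placeholders, and choosing k+m is exactly what my first question below is about.

```
# Example EC profile -- k=4/m=2 is a placeholder, not a final choice
ceph osd erasure-code-profile set hpc_ec \
    k=4 m=2 \
    crush-failure-domain=host

# Erasure-coded data pool plus a replicated metadata pool for CephFS;
# an EC data pool needs allow_ec_overwrites (BlueStore only)
ceph osd pool create cephfs_data 512 512 erasure hpc_ec
ceph osd pool set cephfs_data allow_ec_overwrites true
ceph osd pool create cephfs_metadata 64 64 replicated

ceph fs new hpcfs cephfs_metadata cephfs_data
```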
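And for the 5th step, the mount I have in mind on a compute node, using the kernel client over IPoIB (the 10.10.10.x monitor addresses, client name, and secret path are made up):

```
# Monitors are reachable on their IPoIB addresses
mount -t ceph 10.10.10.1,10.10.10.2,10.10.10.3:/ /mnt/cephfs \
    -o name=hpc,secretfile=/etc/ceph/hpc.secret
```
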
My main questions are around the 4th step.
- How do I choose the erasure coding k+m numbers? 3+3, 4+2, 8+3, 8+4, 10+4? I can't fully understand how each choice handles different failures. As I understand it, my system needs to survive 1 host down plus 1-2 OSD failures. Is that possible with a 3-host configuration? If not, what happens if an OSD fails during recovery after a host failure? And what happens if an OSD fails while 1 host is down for maintenance (recovery not yet started)? (See the CRUSH rule sketch after this list for the layout I'm considering.)
- Is it possible to add WAL/DB SSDs to BlueStore OSDs later, the way a journal could be moved under FileStore? (See the second sketch after this list.)
- Will HPC MPI traffic suffer from IPoIB storage traffic on the same IB interface and switch?
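
On the k+m question: since k+m will exceed my 3 hosts, I found a pattern that places two shards per host via a custom CRUSH rule. Something like the following (untested; the rule body is adapted from examples I've seen, and the name/id are mine). If I read it correctly, with 4+2 laid out this way a single host failure already costs 2 shards, i.e. all of m, which is why I'm also looking at 3+3:

```
rule ec_3hosts {
    id 2
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    # pick all 3 hosts, then 2 OSDs inside each host -> 6 shards
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
```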
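On the WAL/DB question: from what I've read, newer ceph-bluestore-tool releases can attach a DB device to an existing OSD, roughly like this (hedged: I'm not sure which Ceph release this landed in, and the OSD id and device path are examples):

```
# Stop the OSD, attach a new DB device, start it again
systemctl stop ceph-osd@0
ceph-bluestore-tool bluefs-bdev-new-db \
    --path /var/lib/ceph/osd/ceph-0 \
    --dev-target /dev/nvme0n1p1
systemctl start ceph-osd@0
```
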
And an overall question: will this work at all, or have I missed something fundamental?