
I have a 5-node Proxmox cluster using Ceph as the primary VM storage backend. The Ceph pool is currently configured with a size of 5 (one data replica per node) and a min_size of 1. Because of the high size setting, much of the available space in the pool is taken up by unnecessary replicas (a 5-node Proxmox cluster can sustain no more than 2 simultaneous node failures anyway), so my goal is to reduce the size parameter to 3 by changing the setting on the pool itself, thereby increasing the available pool space.

I've gone through the Proxmox and Ceph documentation but couldn't find any information on reducing the size parameter of a live pool. I did find the command to set the size parameter, but I'm not sure what potential issues I might encounter, or whether reducing the size on a live pool is even possible. Unfortunately I can't run any tests either, since the pool is in production.
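For reference, this is the command I found; I haven't run it yet and am only assuming it applies to my pool (ceph-5, see the output below):

root@node1:~# ceph osd pool set ceph-5 size 3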

I've already considered simply creating a new pool with the appropriate parameters, but I would rather avoid the time spent migrating data from one pool to the other if I can.

Thanks in advance.

EDIT:

root@node1:~# ceph osd pool ls detail
pool 4 'ceph-5' replicated size 5 min_size 1 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 42673 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
Matthew U.

1 Answer


In an erasure-coded pool you can't reduce the pool's size; that is only possible for replicated pools. Since you mention min_size 1, you probably have a replicated pool. Can you confirm by adding the output of `ceph osd pool ls detail` to your question? Mask any sensitive data if necessary.

Changing the pool's size shouldn't be too invasive, but the impact of course depends heavily on your actual load and performance, since the change will cause some data movement. By the way, don't use min_size 1; use 2 instead, even if size is 3. If some disks/servers fail and only 1 replica is left while clients are still able to write to the cluster, you risk data loss. With a min_size of 2, I/O stops as soon as there's only 1 replica left, until you recover, but the risk of data loss decreases.
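To make this concrete, the change would look roughly like this (a sketch only, assuming the pool name ceph-5 from your output):

# reduce the replica count on the live pool
ceph osd pool set ceph-5 size 3
# raise min_size so I/O stops before the last replica is at risk
ceph osd pool set ceph-5 min_size 2
# watch remapping/recovery progress and verify the new settings
ceph -s
ceph osd pool ls detail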

eblock
  • Thanks very much! I added the command output to my question to confirm it's a `replicated` pool. Considering the risk of data loss with a `min_size` of 1, we will likely proceed with a `size` of 4 and `min_size` of 2. In our case we are quick to respond to disk/node failures, so we would likely not experience any data loss from a `min_size` of 1; but of course, it's better to be safe than sorry :) – Matthew U. Jan 29 '21 at 18:24
  • The default size for a replicated pool is usually 3 with a min_size of 2, which works for most use cases, but it should match your actual requirements and, more importantly, your failure domains. If you can share your `ceph osd tree` I could help figure it out. You seem to use the default replicated crush rule, which can lead to the following scenario: you have 5 nodes and all of the replicas end up on the same node. If that node fails, your clients can't access the data until it's recovered. This can be avoided with the correct crush rules (see the sketch below the comments). Add `ceph osd crush rule dump replicated_ruleset` to the question. – eblock Jan 30 '21 at 09:47
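For illustration, a replicated crush rule that forces each replica onto a different host could be created and applied roughly like this (a sketch only; `replicated_host` is an arbitrary name, and `default` is assumed to be the CRUSH root shown by `ceph osd tree`):

# create a replicated rule with host as the failure domain
ceph osd crush rule create-replicated replicated_host default host
# switch the pool to the new rule (this triggers data movement)
ceph osd pool set ceph-5 crush_rule replicated_host
# verify which rule the pool now uses
ceph osd pool ls detail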