
I'm currently evaluating storage systems for XenServer. Because data replication is important in case of a failure, I have a question regarding replication in Ceph.

As far as I know, every disk in a node is an OSD by itself (the disks are not in any RAID configuration). Is the Ceph replication algorithm aware of the fact that two OSDs are on the same node, so that it does not place replicas of the same data on both of them?

Minimal example: 2 nodes with 2 disks each. Because of the non-RAID setup, each disk is an OSD -> 4 OSDs. Node A: OSD1, OSD2; Node B: OSD3, OSD4. I set the replication size to 2 and store an object in Ceph. Will the object be stored and replicated in such a way that, in case of a node failure, the data is still completely accessible?

Thank you for your answers.

laubed
  • In addition to the answers, I'd recommend not running pools with size 2 in production. It's only a matter of time until you run into issues; the mailing list is full of such failures. Either run replicated pools with size 3 (or more, of course) or erasure-coded pools, according to your resiliency requirements. – eblock Mar 22 '22 at 11:13
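
As a quick sketch of that recommendation (the pool name mypool is just a placeholder, not something from the question), the replication level of an existing pool can be changed with ceph osd pool set:

# keep three copies of every object in the pool
ceph osd pool set mypool size 3

# refuse I/O when fewer than two copies are available
ceph osd pool set mypool min_size 2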

2 Answers

  1. Yes, CRUSH knows which OSDs sit on the same node.
  2. You can define the placement policy to replicate across nodes, racks, datacenters, etc. (see the sketch below).
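
As an illustration of point 2 (a sketch only; the rule name and the rack buckets are placeholders, not something from your cluster), the failure domain of a replicated pool is set by the chooseleaf step of its CRUSH rule, so spreading replicas across racks instead of hosts is a one-word change:

rule replicate_across_racks {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        # "rack" is the failure domain; "host" would spread copies across hosts instead
        step chooseleaf firstn 0 type rack
        step emit
}

A pool is then pointed at such a rule with ceph osd pool set <pool> crush_ruleset <id> (crush_rule on newer releases).
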
petertc

By default, the CRUSH replication rule (replicated_ruleset) states that replication is done at the host level. You can check this by exporting and decompiling the CRUSH map:

ceph osd getcrushmap -o /tmp/compiled_crushmap
crushtool -d /tmp/compiled_crushmap -o /tmp/decompiled_crushmap

The decompiled map will contain something like this:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

The bucket types you can replicate across are listed at the beginning of the map:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

In order to get to a HEALTH_OK state and have your objects replicated according to your rules, in your specific case you have to change the failure domain in that rule from host to osd. The map can then be recompiled and injected by running:

crushtool -c /tmp/decompiled_crushmap -o /tmp/compiled_crushmap
ceph osd setcrushmap -i /tmp/compiled_crushmap
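
For the two-node setup from the question, the edit itself is just the bucket type in the chooseleaf step; the rest of the rule stays as exported (a sketch of the edited rule):

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        # replicas may now land on different OSDs of the same host
        step chooseleaf firstn 0 type osd
        step emit
}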

You can find more information about how to work with the CRUSH map in the Ceph documentation: http://docs.ceph.com/docs/master/rados/operations/crush-map/

The placement of a specific object can be found using:

ceph osd map {pool-name} {object-name}
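
For example, assuming a pool called rbd and an object called my-test-object (both placeholders), the output lists the placement group along with the up and acting OSD sets, i.e. which OSDs hold the copies:

# pool name and object name are placeholders
ceph osd map rbd my-test-object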

If you want to check the mapping of all objects, you can do that by looking at the placement group dump (the column numbers below may need adjusting to your own output):

ceph pg dump | awk  '{print $1 "\t" $2 "\t" $15 "\t" $16}'

Regarding the OSDs: you can consider an OSD to be any kind of logical or physical storage unit (a folder, partition, logical volume, disk, or LUN).
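
For instance, on current Ceph releases a whole disk or logical volume can be turned into an OSD with ceph-volume (the device path below is a placeholder):

# consumes /dev/sdb and sets it up as a new LVM-backed OSD
ceph-volume lvm create --data /dev/sdb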

baucelmic