We are running Ceph 14.2.0 on 4 hosts with 24 BlueStore OSDs in total, each 1.8TB (a 2TB spinning disk). We have a single pool with replication size 2, and I am absolutely sure that we are using more space than the pool statistics in ceph df show:

[root@blackmirror ~]# ceph osd dump | grep 'replicated size'
pool 2 'one' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 900 pgp_num 900 autoscale_mode warn last_change 37311 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

[root@blackmirror ~]# ceph df
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    hdd       44 TiB     21 TiB     22 TiB       23 TiB         51.61
    TOTAL     44 TiB     21 TiB     22 TiB       23 TiB         51.61

    POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    one       2     2.7 TiB       2.94M     5.5 TiB     28.81       6.7 TiB

Not sure about MAX AVAIL, but I think it's wrong too.
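To make the comparison concrete, here is a quick sanity check in Python using only the figures from the ceph df output above. It is a rough sketch: the division of AVAIL by the replica count is my own naive upper bound, not the way Ceph computes MAX AVAIL (as far as I understand, Ceph also takes the fullest OSDs and the full ratio into account, so a lower value is expected).

```python
# Figures copied from the ceph df output above, all in TiB.
pool_stored = 2.7          # POOL "one", STORED
pool_used = 5.5            # POOL "one", USED
raw_avail = 21.0           # class hdd, AVAIL
max_avail_reported = 6.7   # POOL "one", MAX AVAIL
replica_size = 2           # pool 'one' has replicated size 2

# With 2 replicas, USED should be roughly twice STORED.
expected_used = pool_stored * replica_size

# Naive upper bound (assumption): free raw space divided by the replica count.
naive_max_avail = raw_avail / replica_size

print(f"STORED x size: {expected_used:.1f} TiB, reported USED: {pool_used} TiB")
print(f"AVAIL / size:  {naive_max_avail:.1f} TiB, reported MAX AVAIL: {max_avail_reported} TiB")
```

So the pool's USED is at least consistent with its own STORED figure; the problem is that neither of them matches the raw usage.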

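Here's the output of ceph osd df (header row not pasted; the columns are ID, CLASS, WEIGHT, REWEIGHT, SIZE, RAW USE, DATA, OMAP, META, AVAIL, %USE, VAR, PGS, STATUS):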

 0   hdd 1.81310  1.00000 1.8 TiB  1.2 TiB  1.2 TiB 152 KiB 2.8 GiB 669 GiB 64.10 1.24  94     up
 1   hdd 1.81310  1.00000 1.8 TiB  937 GiB  935 GiB  80 KiB 2.2 GiB 926 GiB 50.31 0.97  72     up
 2   hdd 1.81310  1.00000 1.8 TiB  788 GiB  786 GiB  36 KiB 1.9 GiB 1.0 TiB 42.33 0.82  65     up
 3   hdd 1.81310  1.00000 1.8 TiB  868 GiB  866 GiB 128 KiB 2.1 GiB 995 GiB 46.59 0.90  69     up
 4   hdd 1.81310  1.00000 1.8 TiB  958 GiB  956 GiB  84 KiB 2.3 GiB 904 GiB 51.45 1.00  72     up
 5   hdd 1.81879  1.00000 1.8 TiB 1015 GiB 1013 GiB  64 KiB 2.4 GiB 847 GiB 54.50 1.06  77     up
 6   hdd 1.81310  1.00000 1.8 TiB 1015 GiB 1012 GiB  32 KiB 2.6 GiB 848 GiB 54.48 1.06  81     up
 7   hdd 1.81310  1.00000 1.8 TiB  935 GiB  932 GiB  40 KiB 2.3 GiB 928 GiB 50.18 0.97  70     up
 8   hdd 1.81310  1.00000 1.8 TiB  1.0 TiB  1.0 TiB  48 KiB 2.5 GiB 800 GiB 57.05 1.11  83     up
 9   hdd 1.81310  1.00000 1.8 TiB 1002 GiB 1000 GiB  96 KiB 2.3 GiB 861 GiB 53.79 1.04  77     up
10   hdd 1.81310  1.00000 1.8 TiB  779 GiB  777 GiB 168 KiB 1.9 GiB 1.1 TiB 41.80 0.81  63     up
11   hdd 1.81310  1.00000 1.8 TiB  1.1 TiB  1.1 TiB 128 KiB 2.6 GiB 768 GiB 58.77 1.14  83     up
12   hdd 1.81310  1.00000 1.8 TiB  798 GiB  796 GiB 120 KiB 1.9 GiB 1.0 TiB 42.85 0.83  67     up
13   hdd 1.81310  1.00000 1.8 TiB  1.1 TiB  1.1 TiB  64 KiB 2.6 GiB 761 GiB 59.12 1.15  89     up
14   hdd 1.81310  1.00000 1.8 TiB  1.2 TiB  1.2 TiB 128 KiB 2.7 GiB 680 GiB 63.51 1.23  88     up
15   hdd 1.81310  1.00000 1.8 TiB  766 GiB  764 GiB  64 KiB 1.9 GiB 1.1 TiB 41.15 0.80  58     up
16   hdd 1.81310  1.00000 1.8 TiB  990 GiB  988 GiB  80 KiB 2.4 GiB 873 GiB 53.15 1.03  81     up
17   hdd 1.81310  1.00000 1.8 TiB  980 GiB  977 GiB  80 KiB 2.3 GiB 883 GiB 52.61 1.02  77     up
18   hdd 1.81310  1.00000 1.8 TiB  891 GiB  890 GiB  68 KiB 1.7 GiB 971 GiB 47.87 0.93  73     up
19   hdd 1.81310  1.00000 1.8 TiB  1.1 TiB  1.1 TiB  60 KiB 2.0 GiB 784 GiB 57.87 1.12  87     up
20   hdd 1.81310  1.00000 1.8 TiB  956 GiB  955 GiB  48 KiB 1.8 GiB 906 GiB 51.37 1.00  73     up
21   hdd 1.81310  1.00000 1.8 TiB  762 GiB  760 GiB  32 KiB 1.6 GiB 1.1 TiB 40.91 0.79  58     up
22   hdd 1.81310  1.00000 1.8 TiB  979 GiB  977 GiB  80 KiB 1.9 GiB 883 GiB 52.60 1.02  72     up
23   hdd 1.81310  1.00000 1.8 TiB  935 GiB  934 GiB 164 KiB 1.8 GiB 927 GiB 50.24 0.97  71     up
                    TOTAL  44 TiB   23 TiB   22 TiB 2.0 MiB  53 GiB  21 TiB 51.61
MIN/MAX VAR: 0.79/1.24  STDDEV: 6.54
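To check that the TOTAL line is not itself off, the per-OSD values can be added up by hand. The sketch below does that in Python for the second size column of the listing above (the raw usage per OSD, OSD 0 to 23); the helper name to_gib is just something I made up for this check, and the inputs are the rounded values as printed, so the result is approximate.

```python
# Per-OSD raw usage copied from the ceph osd df output above, OSD 0 to 23.
raw_use = [
    "1.2 TiB", "937 GiB", "788 GiB", "868 GiB", "958 GiB", "1015 GiB",
    "1015 GiB", "935 GiB", "1.0 TiB", "1002 GiB", "779 GiB", "1.1 TiB",
    "798 GiB", "1.1 TiB", "1.2 TiB", "766 GiB", "990 GiB", "980 GiB",
    "891 GiB", "1.1 TiB", "956 GiB", "762 GiB", "979 GiB", "935 GiB",
]

UNIT_IN_GIB = {"GiB": 1.0, "TiB": 1024.0}


def to_gib(text):
    """Convert a string such as '937 GiB' or '1.2 TiB' to a float in GiB."""
    value, unit = text.split()
    return float(value) * UNIT_IN_GIB[unit]


total_tib = sum(to_gib(v) for v in raw_use) / 1024
print(f"Sum of per-OSD raw usage: {total_tib:.2f} TiB")
```

The sum comes out between 22 and 23 TiB, which agrees with the TOTAL line once rounding is taken into account, so the per-OSD numbers and the cluster-wide raw usage are consistent with each other.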

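And here is the output of rados df (header row not pasted; the columns are POOL_NAME, USED, OBJECTS, CLONES, COPIES, MISSING_ON_PRIMARY, UNFOUND, DEGRADED, RD_OPS, RD, WR_OPS, WR, USED COMPR, UNDER COMPR):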

[root@blackmirror ~]# rados df
one       5.5 TiB 2943372      0 5886744                  0       0        0 11291297816 114 TiB 24110141554 778 TiB        0 B         0 B

total_objects    2943372
total_used       23 TiB
total_avail      21 TiB
total_space      44 TiB

In reality we are storing around 11TB of data, so total_used above looks right because our replication size is 2.
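Put as arithmetic, the gap looks like this. This is only a back-of-the-envelope sketch in Python using the numbers reported above; it ignores BlueStore metadata and allocation overhead, which I am assuming is small compared to the difference.

```python
# Figures copied from the outputs above, all in TiB.
raw_used = 23.0      # total_used from rados df / RAW USED from ceph df
pool_stored = 2.7    # STORED for pool "one" in ceph df
replica_size = 2     # replicated size of pool "one"

# Data implied by the raw usage, if everything on disk belongs to the pool.
implied_stored = raw_used / replica_size

# Amount of data that the pool statistics do not account for.
unaccounted = implied_stored - pool_stored

print(f"Implied by raw usage: {implied_stored:.1f} TiB")
print(f"Reported as STORED:   {pool_stored:.1f} TiB")
print(f"Unaccounted for:      {unaccounted:.1f} TiB")
```

In other words, the raw usage implies about 11.5 TiB of data, which matches what we actually store, while the pool statistics only account for 2.7 TiB of it.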

This started happening after we replaced OSDs 18-23. They were originally 1TB disks, and we upgraded them to 2TB to balance the cluster. After we replaced the first disk, USED and MAX AVAIL in ceph df dropped to around 1TB. I thought it was just a matter of time, but even after all recovery operations had finished, we are left with the picture above. I tried forcing a deep scrub on all disks, which nearly killed every application in the cluster for 12 hours, but it changed nothing in the end. I have no idea what to do now. Please help.
