
I've set up a pair of identical servers with RAID arrays (8 cores, 16 GB RAM, 12x2 TB RAID 6) and three 10 GigE interfaces to host some highly available services.

The systems are currently running Debian 7.9 Wheezy oldstable (because corosync/pacemaker are not available on 8.x stable or testing).

  • Local disk performance is about 900 MB/s write, 1600 MB/s read (measured roughly as sketched below).
  • Network throughput between the machines is over 700 MB/s.
  • Through iSCSI, each machine can write to the other's storage at more than 700 MB/s.
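
For reference, baseline figures like these can be gathered with plain dd and iperf runs along the following lines. These are illustrative commands, not the exact invocations used; the file path is a placeholder and the address is the replication interface from the config below:

# local sequential write and read, bypassing the page cache
dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=16384 oflag=direct
dd if=/mnt/raid/testfile of=/dev/null bs=1M iflag=direct

# raw TCP throughput between the nodes (run "iperf -s" on the other node first)
iperf -c 192.168.42.2 -t 10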

However, no matter how I configure DRBD, throughput is limited to 100 MB/s. It really looks like some hardcoded limit. I can reliably lower performance by tweaking the settings, but it never goes over 1 Gbit/s (122 MB/s is reached for a couple of seconds at a time). I'm really pulling my hair out over this one.

  • plain vanilla kernel 3.18.24 amd64
  • drbd 8.9.2~rc1-1~bpo70+1

The configuration is split across two files. global-common.conf:

global {
        usage-count no;
}

common {
        handlers {
        }

        startup {
        }

        disk {
                on-io-error             detach;
         #       no-disk-flushes ;
        }
        net {
                max-epoch-size          8192;
                max-buffers             8192;
                sndbuf-size             2097152;
        }
        syncer {
                rate                    4194304k;
                al-extents              6433;
        }
}

and cluster.res:

resource rd0 {
        protocol C;
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }

        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}

Output from cat /proc/drbd on the slave:

version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE 
 0: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:4462592 dw:4462592 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:16489499884
        [>....................] sync'ed:  0.1% (16103024/16107384)M
        finish: 49:20:03 speed: 92,828 (92,968) want: 102,400 K/sec

Output from vmstat 2 on the master (both machines are almost completely idle):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 14952768 108712 446108    0    0   213   254   16    9  0  0 100  0
 0  0      0 14952484 108712 446136    0    0     0     4 10063 1361  0  0 99  0
 0  0      0 14952608 108712 446136    0    0     0     4 10057 1356  0  0 99  0
 0  0      0 14952608 108720 446128    0    0     0    10 10063 1352  0  1 99  0
 0  0      0 14951616 108720 446136    0    0     0     6 10175 1417  0  1 99  0
 0  0      0 14951748 108720 446136    0    0     0     4 10172 1426  0  1 99  0

Output from iperf between the two servers:

------------------------------------------------------------
Client connecting to cl2, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 192.168.42.1 port 47900 connected with 192.168.42.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.87 GBytes  5.90 Gbits/sec

Apparently the initial synchronisation is supposed to be somewhat slow, but not this slow... Furthermore, it doesn't really react to any attempt to change the sync rate, such as drbdadm disk-options --resync-rate=800M all.
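
For anyone hitting the same wall, one quick sanity check is to confirm which options DRBD is actually running with, rather than what the configuration files say, for example:

# show the net/disk options currently in effect for the running resources (DRBD 8.4)
drbdsetup show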

wazoox

3 Answers


DRBD 8.3.9 and newer have a dynamic resync controller that needs tuning. In older versions of DRBD, setting syncer { rate; } was enough; now it is used more as a lightly suggested starting point for the dynamic resync speed.

The dynamic sync controller is tuned with the "c-settings" in the disk section of DRBD's configuration (see man drbd.conf for details on each of these settings).

With 10GbE between these nodes, and assuming low latency since protocol C is used, the following config should get things moving more quickly:

resource rd0 {
        protocol C;
        disk {
                c-fill-target 10M;
                c-max-rate   700M;
                c-plan-ahead    7;
                c-min-rate     4M;
        }
        on cl1 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.1:7788;
                meta-disk internal;
        }

        on cl2 {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.42.2:7788;
                meta-disk internal;
        }
}

If you're still not happy, try turning max-buffers up to 12k. If that's still not enough, you can try raising c-fill-target in 2M increments.
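
While experimenting, these values can usually be changed on the fly with the same drbdadm pattern the question already uses for --resync-rate, then made permanent in cluster.res once a good combination is found. The values below are only examples, not recommendations:

# temporary runtime overrides for the rd0 resource (run on both nodes)
drbdadm net-options  --max-buffers=12k rd0    # use 12288 if the k suffix is rejected
drbdadm disk-options --c-fill-target=12M rd0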

Matt Kereczman
  • Actually with this configuration performance drops to 3 MB/s. I'm trying to toy with these settings but prospects are grim. – wazoox Dec 04 '15 at 12:39
  • So far, disabling c-plan-ahead by setting it to zero and increasing max-epoch-size and max-buffers seems to do the trick. – wazoox Dec 04 '15 at 14:09
  • What happens if you increase max-buffers to 20k, and c-fill-target to 20M? I believe slowly increasing these two values will eventually give you the results you're looking for. – Matt Kereczman Dec 04 '15 at 16:55
  • That's much better! It doesn't saturate the link (which is dedicated, and thus OK to fill up) but I'm already at 400 MB/s. I'm playing a bit with these settings... – wazoox Dec 04 '15 at 17:25
  • Upping max-buffers from 250 to 2500 made a night-and-day difference for me (in my non-critical performance setup) – davidgo Oct 10 '17 at 06:35
  • max-buffers made a gigantic difference to my setup, going from 600 Mbps to 2.6 Gbps (at which point the bottleneck was the underlying storage system). To get this I set max-buffers to 16000, although I will probably be using 20000 (which seems to be the upper limit) going forward. – Ceisc Jun 02 '18 at 14:10

Someone elsewhere suggested that I use these settings:

        disk {
                on-io-error             detach;
                c-plan-ahead 0;
        }
        net {
                max-epoch-size          20000;
                max-buffers             131072;
        }

And the performance is excellent.

Edit: As per @Matt Kereczman's and others' suggestions, I've finally changed to this:

disk {
        on-io-error             detach;
        no-disk-flushes ;
        no-disk-barrier;
        c-plan-ahead 0;
        c-fill-target 24M;
        c-min-rate 80M;
        c-max-rate 720M;
} 
net {
        # max-epoch-size          20000;
        max-buffers             36k;
        sndbuf-size            1024k ;
        rcvbuf-size            2048k;
}

Resync speed is high:

cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:133246146 nr:0 dw:2087494 dr:131187797 al:530 bm:0 lo:0 pe:5 ua:106 ap:0 ep:1 wo:d oos:4602377004
        [>....................] sync'ed:  2.8% (4494508/4622592)M
        finish: 1:52:27 speed: 682,064 (646,096) K/sec

Write speed is excellent during resync with these settings (80% of local write speed, full wire speed):

# dd if=/dev/zero of=./testdd bs=1M count=20k
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.3731 s, 731 MB/s

Read speed is OK:

# dd if=testdd bs=1M count=20k of=/dev/null
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.4538 s, 729 MB/s

Later edit:

After a full resync, the performance is very good (wire-speed writing, local-speed reading). Resync is quick (5 to 6 hours) and doesn't hurt performance too much (wire-speed reading, wire-speed writing). I'll definitely stay with c-plan-ahead at zero; with non-zero values, resync is way too long.

wazoox
  • Increasing max-buffers to 131k is not the most graceful approach to solving your issue. You're essentially giving DRBD 512 MiB of system buffers to use for its resync, which is a lot of buffer space. I've seen problems with max-buffers larger than 80k. I would highly recommend tuning the resync controller settings, while increasing max-buffers in small increments until you're happy. – Matt Kereczman Dec 04 '15 at 16:52
  • @MattKereczman I'll change the settings, but I'd like to have an optimal (sync'ed) cluster as fast as possible before playing with production settings... The default settings mean that syncing takes at least several days and up to several weeks, which is simply not acceptable. The required production throughput is 500 MB/s. – wazoox Dec 04 '15 at 17:17

c-plan-ahead has to be set to a positive value to enable the dynamic sync rate controller. A reasonable value is about 5 × RTT expressed in 0.1 s units; in my case that works out to 15:

disk {
        c-plan-ahead  15;   # 5 * RTT in 0.1s units; 15 in my case
        c-fill-target 24;
        c-max-rate    720M;
}
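
To estimate a value along these lines, measure the round-trip time on the replication link with ping and convert it to 0.1 s units; a rough sketch, using the addresses from the question:

# measure the RTT between the replication interfaces
ping -c 20 -q 192.168.42.2
# read the "avg" value from the "rtt min/avg/max/mdev" line (in ms), then:
#   c-plan-ahead ≈ 5 * RTT_ms / 100      (c-plan-ahead is in units of 0.1 s)
# e.g. an average RTT of 300 ms gives 5 * 300 / 100 = 15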

Keven