
I've got two webservers which each have a disk attached. This disk is synced between them using drbd (2:8.3.13-1.1ubuntu1) in 'dual-primary' mode, and over the top of this I run ocfs2 (1.6.4-1ubuntu1) as a cluster filesystem. The nodes communicate on a private network 192.168.3.0/24. For the most part, this is stable, and works well.

Last night, there appeared to have been a network outage. This resulted in a split-brain scenario where node01 was left in StandAlone and Primary, while node02 was left in WFConnection and Primary. Recovery this morning was a manual process: diffing the two filesystems, deciding that node01 should be authoritative, putting node02 into secondary and then issuing drbdadm connect commands on each node. After remounting the filesystem, we were back up and running.
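In case it's useful, the drbd side of the recovery was roughly the following - a sketch rather than an exact transcript, using the drbd 8.3 command form, with /mnt/data standing in for the real mount point:

    # on node02 (whose changes we chose to throw away)
    umount /mnt/data
    drbdadm disconnect r0
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0

    # on node01 (kept as authoritative); needed because it had dropped to StandAlone
    drbdadm connect r0

    # once the resync finished: promote node02 again and remount the ocfs2 volume there
    drbdadm primary r0
    mount /dev/drbd0 /mnt/data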

My question is: is this type of outage always going to require manual resolution, or are there ways in which the process can be automated? My understanding was that drbd should try to be intelligent in the event of a split brain about working out which node should become primary and which secondary. It seems that in this case a simple network outage left both nodes primary, for which my config just says 'disconnect'. Looking at the logs, what I find interesting is that both nodes seemed to agree that node02 should be the SyncSource, yet looking at the rsync log it's actually node01 that has the most recent changes. Also interesting is the line on node01 stating 'I shall become SyncTarget, but I am primary!'. To me it looks like drbd tried to resolve this, but failed for some reason.

Is there a better way of doing this?

The config for r0 is this:

resource r0 {
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/xvda2;

    syncer { rate 1000M; }
    net {
        #We're running ocfs2, so two primaries desirable.
        allow-two-primaries;

        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;

    }
    handlers{
        before-resync-target "/sbin/drbdsetup $DRBD_MINOR secondary";

        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    startup { become-primary-on both; }

    on node02 { address 192.168.3.8:7789; }
    on node01 { address 192.168.3.1:7789; }
}
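On the automation question above: the direction I've seen pointed at (though I'm not running it here) is DRBD's resource-level fencing tied into a cluster manager such as Pacemaker. Roughly, that would mean adding something like the following to r0 - the crm-fence-peer.sh / crm-unfence-peer.sh helpers ship with the drbd utils, but this assumes a working Pacemaker setup with real fencing, so treat it as an untested sketch rather than something from my config:

    resource r0 {
        ...
        disk {
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }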

I've also put the kern.log files on pastebin:

Node01: http://pastebin.com/gi1HPtut

Node02: http://pastebin.com/4XSCDQdC

growse
  • Hum. Is there a specific reason you are not using quorum so the split-brain scenario could be effectively prevented instead of dealing with it after the fact? – the-wabbit Mar 08 '13 at 22:17
  • Not sure I follow - the documentation I've been referencing regarding doing this sort of dual-primary setup makes no mention of using a quorum for drbd. I can see how a third-node quorum would help some split-brain scenarios, but initially can't see where/how this would be configured with drbd. – growse Mar 08 '13 at 22:52

1 Answer


IMHO you have already chosen the best split-brain policy for DRBD. So in your case there must have been changes to the same part of the filesystem (i.e. the same DRBD blocks) on BOTH sides.

So in that case - yes - you have to resolve that manually.

The question that arises for me is: why did these concurrent accesses happen?

You should investigate in that direction. If the network is down, there should be no access on one side, so discard-zero-changes should help - but it did not.
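A quick way to see what DRBD thinks happened on each node is its status commands (drbd 8.3 syntax, resource name r0 as in your config):

    cat /proc/drbd        # cs: connection state, ro: roles, ds: disk states, oos: out-of-sync data
    drbdadm cstate r0     # e.g. StandAlone vs. WFConnection
    drbdadm dstate r0     # e.g. UpToDate/DUnknown
    drbdadm role r0       # e.g. Primary/Unknown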

Apart from that, you should prevent split brains by having two or more INDEPENDENT network connections. I always use three of them on my clusters - a sketch of one way to do that is below.
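Since DRBD itself only talks to one peer address, the extra links usually sit underneath it as a bond (or as separate rings for the cluster stack). A rough sketch for Ubuntu's /etc/network/interfaces, assuming the ifenslave package and two spare NICs eth1/eth2 - the interface names and addresses are placeholders, not taken from your setup:

    auto bond0
    iface bond0 inet static
        address 192.168.3.1
        netmask 255.255.255.0
        bond-slaves eth1 eth2
        bond-mode active-backup
        bond-miimon 100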

Nils