
I'm testing a Xen DomU setup with DRBD storage for easy failover. Most of the time, immediately after booting the DomU, I get an I/O error:

[    3.153370] EXT3-fs (xvda2): using internal journal
[    3.277115] ip_tables: (C) 2000-2006 Netfilter Core Team
[    3.336014] nf_conntrack version 0.5.0 (3899 buckets, 15596 max)
[    3.515604] init: failsafe main process (397) killed by TERM signal
[    3.801589] blkfront: barrier: write xvda2 op failed
[    3.801597] blkfront: xvda2: barrier or flush: disabled
[    3.801611] end_request: I/O error, dev xvda2, sector 52171168
[    3.801630] end_request: I/O error, dev xvda2, sector 52171168
[    3.801642] Buffer I/O error on device xvda2, logical block 6521396
[    3.801652] lost page write due to I/O error on xvda2
[    3.801755] Aborting journal on device xvda2.
[    3.804415] EXT3-fs (xvda2): error: ext3_journal_start_sb: Detected aborted journal
[    3.804434] EXT3-fs (xvda2): error: remounting filesystem read-only
[    3.814754] journal commit I/O error
[    6.973831] init: udev-fallback-graphics main process (538) terminated with status 1
[    6.992267] init: plymouth-splash main process (546) terminated with status 1

The drbdsetup manpage says that LVM (which I use) doesn't support barriers (better known as tagged command queuing or native command queuing), so I configured the DRBD device not to use barriers. This can be seen in /proc/drbd (by "wo:f", meaning flush, the next method DRBD chooses after barrier):

 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

And on the other host:

 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

I also enabled the option disable_sendpage, as per the drbd docs:

cat /sys/module/drbd/parameters/disable_sendpage
Y
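
For reference, this is roughly how the parameter can be set, both at runtime and persistently across module reloads; the modprobe.d file name below is just an example of mine, not something prescribed by the DRBD docs:

# Runtime, via the sysfs parameter shown above:
echo 1 > /sys/module/drbd/parameters/disable_sendpage
# Persistent across reloads of the drbd module (file name is arbitrary):
echo "options drbd disable_sendpage=1" > /etc/modprobe.d/drbd.conf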

I also tried adding barriers=0 to fstab as a mount option. Still, it sometimes says:

[   58.603896] blkfront: barrier: write xvda2 op failed
[   58.603903] blkfront: xvda2: barrier or flush: disabled

I don't even know whether ext3 has a nobarrier option. And since only one of my storage systems is battery backed, disabling barriers would not be smart anyway.

Why does it still complain about barriers when I have disabled them?

Both hosts are:

Debian: 6.0.4
uname -a: Linux 2.6.32-5-xen-amd64
drbd: 8.3.7
Xen: 4.0.1

Guest:

Ubuntu 12.04 LTS
uname -a: Linux 3.2.0-24-generic pvops

drbd resource:

resource drbdvm
{
  meta-disk internal;
  device /dev/drbd3;

  startup
  {
    # The timeout value when the last known state of the other side was available. 0 means infinite.
    wfc-timeout 0;

    # Timeout value when the last known state was disconnected. 0 means infinite.
    degr-wfc-timeout 180;
  }

  syncer
  {
    # This is recommended only for low-bandwidth lines, to only send those
    # blocks which really have changed.
    #csums-alg md5;

    # Set to about half your net speed
    rate 60M;

    # This option seems to have moved to the 'net' section in DRBD 8.4 (a later release than Debian currently ships).
    verify-alg md5;
  }

  net
  {
    # The manpage says this is recommended only in pre-production (because of its performance), to determine
    # if your LAN card has a TCP checksum offloading bug.
    #data-integrity-alg md5;
  }

  disk
  {
    # Detach causes the device to work over-the-network-only after the
    # underlying disk fails. Detach is not default for historical reasons, but is
    # recommended by the docs.
    # However, the Debian defaults in drbd.conf suggest the machine will reboot in that event...
    on-io-error detach;

    # LVM doesn't support barriers, so disabling it. It will revert to flush. Check wo: in /proc/drbd. If you don't disable it, you get IO errors.
    no-disk-barrier;
  }

  on host1
  {
    # universe is a VG
    disk /dev/universe/drbdvm-disk;
    address 10.0.0.1:7792;
  }

  on host2
  {
    # universe is a VG
    disk /dev/universe/drbdvm-disk;
    address 10.0.0.2:7792;
  }
}
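
After editing the resource, a quick way to confirm that the running device actually picked up the no-disk-barrier setting (drbdadm adjust applies config changes to a live resource; the grep pattern assumes the minor-3 layout shown above):

drbdadm adjust drbdvm
grep -A1 '^ 3:' /proc/drbd    # expect wo:f (flush) rather than wo:b (barrier)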

DomU cfg:

bootloader = '/usr/lib/xen-default/bin/pygrub'

vcpus       = '2'
memory      = '512'

#
#  Disk device(s).
#
root        = '/dev/xvda2 ro'
disk        = [
                  'phy:/dev/drbd3,xvda2,w',
                  'phy:/dev/universe/drbdvm-swap,xvda1,w',
              ]

#
#  Hostname
#
name        = 'drbdvm'

#
#  Networking
#
# fake IP for posting
vif         = [ 'ip=1.2.3.4,mac=00:16:3E:22:A8:A7' ]

#
#  Behaviour
#
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
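
The DomU is then started with the usual xm tooling on this Xen 4.0 / Squeeze Dom0 (the cfg path below is just an assumption about where the file lives):

xm create /etc/xen/drbdvm.cfg
xm console drbdvm    # watch the boot messages for the barrier / I/O errors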

In my test setup, the primary host's storage is a 9650SE SATA-II PCIe RAID controller with battery backup; the secondary host uses software RAID1.

Isn't DRBD+Xen widely used? With these problems, it's not going to work.

Edit: barriers are actually a known problem (here and here). I don't really see the solution yet.

Halfgaar

2 Answers


I don't know if that changes anything, but you can also specify the DRBD volumes in your DomU configuration as follows:

disk = [ 'drbd:drbdvm,xvda2,w' ... ]

That way, when the DomU is created, Xen will automatically make the current node primary for the specified resource (unless the resource is already in use on the second machine). When the DomU is destroyed, the resource is released again.
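
Applied to the cfg from the question, that would be a sketch along these lines (keeping the LVM swap volume as a plain phy: device):

disk        = [
                  'drbd:drbdvm,xvda2,w',
                  'phy:/dev/universe/drbdvm-swap,xvda1,w',
              ]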

I have many DRBD pairs running like this and have never seen the error you posted.

Oliver
  • That's only an aid, it doesn't actually change anything. Plus, it doesn't work with pygrub. The DRBD docs say that with some version it should work again, but for me it doesn't... – Halfgaar May 30 '12 at 15:47
  • BTW, what do you use as a backing device? Do you have LVM+DRBD? What does wo: in /proc/drbd say? Does xenstore-ls say feature-barriers is 1? – Halfgaar May 30 '12 at 15:50
  • I use LVM as backing devices, `wo:b` and `feature-barrier = 1`. Never had a problem so far. All my RAID controllers are battery backed. – Oliver May 30 '12 at 15:57
  • What kernels do you use on Dom0 and DomU? – Halfgaar May 30 '12 at 16:06
  • I always use the same kernel in the Dom0 and DomU. Mostly 2.6.32-5-xen-amd64 from Debian Squeeze, but I also have some older machines running Lenny. Also note that I don't use pygrub but specify the kernel and initrd in the DomU .cfg. – Oliver May 30 '12 at 16:09
  • Judging from the links I posted, you are going to have trouble with this at some point. – Halfgaar May 31 '12 at 12:43
  • I'm going to check this out, thanks. I have basically the same setup (with changing versions) running for at least 5 years now on a few dozen DRBD clusters. Never had a problem so far. – Oliver May 31 '12 at 13:29

Add extra = " barrier=off" to your DomU configuration. Note the space before barrier.

Also add the corresponding barrier-off option (according to your filesystem's mount options) to the /etc/fstab of your DomUs.
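
Put together, a sketch of both changes (barrier=off is the kernel parameter described above; barrier=0 is the mount-option spelling for ext3/ext4, so adjust to your filesystem):

# In the DomU .cfg -- 'extra' appends to the kernel command line; note the leading space:
extra       = ' barrier=off'

# In the DomU's /etc/fstab -- the mount-option counterpart for the root filesystem:
/dev/xvda2  /  ext3  defaults,barrier=0  0  1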

Update:

The barrier/off option is a second measure to make sure barriers are off.

As for barrier ops: as you can see during startup, these operations fail anyway, so turning them off does not make things worse. Apart from that, barriers only really make sense on hard disks with a write cache that is not written back to disk on power failure.

A server should have a UPS as well as a battery-backed RAID controller, so turning barriers on would only cost performance (even if it worked).

Nils
  • I didn't know that was a DomU config option, thanks. I have two questions: 1) why is it still necessary to turn off barriers in the fstab of the DomU? Won't it default to something else? And 2) when not using barriers (whether specified in the DomU Xen cfg or in fstab), is your data at risk, or does it use another cache-ordering mechanism? – Halfgaar Jul 17 '12 at 07:56
  • @Halfgaar `extra` passes additional kernel-options to the DomU. I will answer the other two via update of my answer. – Nils Jul 17 '12 at 15:28
  • Well, one of the events drbd protects against is server failure. In that case, the battery-backed cache may not be written to disk, especially when the controller fails. So I'd like to have some sort of write-cache reordering. Also, I'm confused about the whole situation. What exactly is the bug? One of the xen block drivers (front or back) advertises that it can do barriers but it can't? – Halfgaar Jul 18 '12 at 11:02
  • @Halfgaar I think the front one can't do barriers and tries to force the back one into barrier mode. Although the back one says "I can't", it switches on part of the barrier-mode stuff... I've got a fix for SLES10 SP4 which corrects the behaviour of the DomU (but only if you pass "barrier=off"). – Nils Jul 18 '12 at 15:32
  • @Halfgaar with regard to DRBD: if you run DRBD with protocol C (as you apparently do), you are on the safe side. Local writes are only completed after they have been completed on the remote side. So even if your local primary dies, the data will have been written on the secondary (if it survives). – Nils Jul 19 '12 at 20:05
  • Except that in my experience, machines start dying chaotically/intermittently. Hence the STONITH in a heartbeat setup. – Halfgaar Jul 23 '12 at 10:28