
I have several boxes running Debian 8, Dovecot and btrfs. I'm using btrfs snapshots for short-term backups; for this purpose I keep 14 snapshots of the mail subvolume.
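(For illustration, the rotation is essentially the following; the paths and naming scheme are placeholders, not my actual script:)

```bash
#!/bin/bash
# Keep the newest 14 read-only snapshots of the mail subvolume.
# Paths and naming are illustrative.
set -eu

MAIL_SUBVOL=/srv/mail      # assumed mount point of the mail subvolume
SNAP_DIR=/srv/snapshots    # assumed snapshot directory

btrfs subvolume snapshot -r "$MAIL_SUBVOL" \
    "$SNAP_DIR/mail-$(date +%Y%m%d-%H%M%S)"

# Drop everything older than the newest 14; this is the step that
# hands work to btrfs-cleaner and triggers the stalls described below.
ls -1d "$SNAP_DIR"/mail-* | sort | head -n -14 | while read -r snap; do
    btrfs subvolume delete "$snap"
done
```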

Performance is OK until it comes to snapshot removal: as soon as btrfs-cleaner kicks in, everything grinds almost to a halt. It gets as bad as DRBD losing connectivity to the secondary node due to timeouts. This happens on several boxes, so it's unlikely to be a hardware issue.

The spike is where snapshot removal takes place:

[collectd load graph]

I cannot believe that this is normal behaviour. So my question is: does anyone have experience with this issue, any idea how to solve or debug it, or, as a last resort, how to avoid it by doing things differently?

Systems are Dell R710, Debian 8, kernel 3.16; mount options: `rw,noatime,nossd,space_cache`

Edit: More system information

Dual R710, 24 GB RAM, H700 with write cache, 8×1 TB 7.2k SATA disks as RAID 6, DRBD protocol B, dedicated 1 Gb/s link for DRBD

Edit: Removing the snapshot content with `rm -rf`, throttled for I/O; otherwise it would have run away like btrfs-cleaner did:

[collectd load graph during the rm -rf]

I would conclude that this is far worse I/O-wise. The only advantage is that I can control the I/O load of the userspace `rm`.
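(For reference, the throttling was along these lines; `ionice` is one way to do it, and the path below is illustrative rather than my actual layout:)

```bash
# Run the delete in the idle I/O scheduling class, so it only gets
# disk time when nothing else is asking for it. The idle class is
# only honored by the CFQ I/O scheduler.
ionice -c 3 rm -rf /srv/snapshots/mail-20160301-000000/*
```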

And another edit, the IOPS massacree:

[collectd IOPS graph]

– tim

2 Answers


In the CoW world (BTRFS and ZFS, basically), removing a snapshot/subvolume requires many "heavy" metadata operations, which imply many head seeks. This is because the filesystem has to walk its own structures to determine which blocks are used exclusively by the offending snapshot. This, in turn, can bring a system to its knees.

To confirm that this is the problem, do the following (a sketch of the session follows the list):

  • open two terminals with `screen`
  • in the first terminal, run `iostat -x -k 1`
  • in the second terminal, remove the snapshot
  • during the removal, watch the first terminal: you will probably find your disks at 100% utilization, reading many, many small blocks.
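A minimal sketch of that session (the snapshot path is illustrative; the column names match the sysstat version shipped with Debian 8):

```bash
# Terminal 1: extended per-device stats in kB, refreshed every second.
iostat -x -k 1
# While the snapshot is being removed, look for:
#   %util    near 100 -> the disks are saturated
#   r/s      high     -> many read requests...
#   avgrq-sz small    -> ...each only a few sectors long
# i.e. the small-metadata-read seek storm described above.

# Terminal 2: trigger the cleanup (illustrative path).
btrfs subvolume delete /srv/snapshots/mail-20160301-000000
```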

If the issue is confirmed, you can try first deleting the snapshot's content (with a simple `rm`), then removing the snapshot itself.
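A minimal sketch of that approach (the path is illustrative; note that read-only snapshots must be made writable before `rm` can touch them):

```bash
SNAP=/srv/snapshots/mail-20160301-000000   # illustrative path

# Read-only snapshots must be flipped to writable first.
btrfs property set -ts "$SNAP" ro false

# Delete the contents (optionally throttled with ionice), then
# remove the now nearly empty subvolume itself.
rm -rf "$SNAP"/*
btrfs subvolume delete "$SNAP"
```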

As a side note: while CoW filesystems are extremely flexible, they are not architected for pure performance. And while ZFS remains quite fast, the same cannot be said for BTRFS.

Anyway, removing large subvolumes was problematic for ZFS as well, until a background-running delete process was implemented...

– shodanshok
  • I see. Unfortunately, it seems that the snapshot removal, while quite fast, basically halts any other disk activity. I can only suggest posing the same question on the BTRFS mailing list. – shodanshok Mar 24 '16 at 18:13
  • Thank you - fascinating and sad that this seems to have a big and mysterious effect after a `docker rmi` command, when btrfs is the storage driver. After I removed about 20 GB of images, it took over half an hour for the space to be reclaimed, by btrfs-cleaner I guess. – nealmcb Apr 12 '16 at 20:15
  • @nealmcb As far as I know, this is expected behavior: space reclamation in btrfs is a somewhat complex operation. Have you tried mounting your btrfs filesystem with the `space_cache` option? – shodanshok Apr 13 '16 at 08:22
  • I haven't played with the mount options. I looked `space_cache` up in the man page at https://btrfs.wiki.kernel.org/index.php/Mount_options but I'm still not clear on why that might help, or what the downside might be. Now that I understand the delay, I'm not too worried, since I do this so rarely. I mainly thought I'd provide some Docker keywords here to help others find this page and understand the issues. – nealmcb Apr 13 '16 at 22:30

It looks like a little-known bug in the btrfs quota feature. Just disable btrfs quotas with the following command:

`btrfs quota disable /`
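If you want to confirm that quotas are enabled in the first place, a quick check (the error wording varies between btrfs-progs versions, so treat the exact message as illustrative):

```bash
# Lists qgroups if quotas are enabled; otherwise fails with a
# "quotas not enabled" style error (wording varies by version).
btrfs qgroup show /
```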

Update: I found a detailed analysis of the problem. It's not a bug, but rather a feature.

– eugene-bright