
With btrfs hitting production in Oracle EL this month (together with a working fsck and scrubbing from Linux 3.2), I was thinking of redesigning my current backup solution to utilise it. Note that I'm thinking about doing it for small amounts of data, less than 10 TB, that's fairly static (less than 1% changed daily). In short, an SMB/SOHO backup solution.

What the backup should do:

  1. do a LVM snapshot of ext[234]/XFS/JFS on the production server
  2. rsync/transfer changed data to btrfs on backup server
  3. snapshot the btrfs filesystem
  4. drop old snapshots when free space is running low
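
The four steps above can be sketched as a shell script. This is only a sketch: the host `prod`, the LVM names `vg0`/`data`, the mount point `/mnt/lvsnap`, the `/backup` paths and the 90% threshold are all placeholder assumptions, not a finished tool.

```shell
#!/bin/sh
# Sketch only: prod, vg0/data, /mnt/lvsnap and /backup are placeholder names.

run_backup() {
    snap="backup-$(date +%Y%m%d-%H%M%S)"   # date-based names sort chronologically

    # 1. LVM snapshot of the production filesystem, mounted read-only
    ssh prod "lvcreate --snapshot --size 5G --name lvsnap /dev/vg0/data \
              && mount -o ro /dev/vg0/lvsnap /mnt/lvsnap"

    # 2. transfer changed data; --inplace updates files in place so the
    #    btrfs side only re-stores the blocks that actually changed
    rsync -a --inplace --delete prod:/mnt/lvsnap/ /backup/current/

    # 3. read-only btrfs snapshot of the freshly synced subvolume
    btrfs subvolume snapshot -r /backup/current "/backup/snapshots/$snap"

    ssh prod "umount /mnt/lvsnap && lvremove -f /dev/vg0/lvsnap"

    # 4. while the filesystem is over 90% full, drop the oldest snapshot
    while [ "$(df --output=pcent /backup | tail -n 1 | tr -dc '0-9')" -gt 90 ]; do
        btrfs subvolume delete \
            "/backup/snapshots/$(ls -1 /backup/snapshots | sort | head -n 1)"
    done
}
```

You'd run `run_backup` from cron; because the snapshot names embed a timestamp, a plain `sort` is enough to find the oldest one in step 4.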

Pros:

  • All files easily available, no decompression or loop mounting needed
  • Past snapshots also easily available...
  • ... so I can share them as read-only Samba shares (with shadow copy support)
  • Snapshots take a minimal amount of space thanks to copy-on-write (a snapshot with no changes takes literally a few KiB on disk)
  • High backup consistency: checksums on files, scrubbing of all data and built-in redundancy

Questions:

  • Is there some backup solution (in form of Bacula, BackupPC, etc.) that is, or can be easily made, aware of copy-on-write file system?
  • Or will I need to roll my own rsync-based solution?
  • What do people with ZFS boxes dedicated for backup do to backup their Linux machines?
Hubert Kario
  • Can't see `cons`! One of them would be that Btrfs snapshots are only equivalent to incremental backups (no physical copy per backup of your file on the disc). Which could be of importance when facing disk surface issues. Note that you can force one duplication with native RAID1 support included in Btrfs. – vaab Oct 31 '12 at 08:22
  • @vaab: that's a `pro` -- more than two copies are not really needed if you've got checksums and actively scrub the FS; three will probably come with RAID6 support. As I've said, it's a setup for a dedicated backup system, not "backup" copies inside the FS on a single computer. That would be "RAID is not backup" and "snapshots are not backup". `cp -a` and `rsync` are for that... – Hubert Kario Oct 31 '12 at 09:38
  • I'm also considering backing up to btrfs, but I was just thinking of `rsync -a --delete /home/user /mnt/butterfs/backups/ && snapper create` – apart from creating a snapshot after backing up, what do you mean by COW-aware? – unhammer Jan 13 '13 at 14:25
  • @unhammer: using `rsync` without `--inplace` you'll get multiple copies of the same data in the remote file system. (rsync normally copies data to a temporary hidden file and then moves it over the old file; with a copy-on-write file system you get two copies of unchanged data this way.) – Hubert Kario Jan 13 '13 at 15:39

5 Answers


I've done some extensive searching in the last week for something similar. I have found no solutions that do all 4 steps. There are numerous blogs from home users who try the 'rsync to btrfs' type of backup, and all of the major Btrfs wikis cover how to perform Btrfs snapshots.

There are also quite a few people who are attempting different ways of rotating Btrfs snapshots. However, you are the first person I've seen who wants to rotate snapshots based on disk space. I am playing with btrfs-snap myself which creates a set of hourly, weekly and monthly snapshots, and it's nice and simple.

The Dirvish project seems to meet many of your requirements. Some developers are attempting to integrate Dirvish with Btrfs. However, the Dirvish project seems a bit stalled.

At this point in time, you are ahead of the curve.

Stefan Lasiewski
  • Well, I just want a backup solution as pain free as BackupPC: when disk space is low, it just deletes old data (old snapshots). While I was afraid that I am ahead of the curve, it's not like ZFS hasn't been with us for the past few years... – Hubert Kario Feb 03 '12 at 23:40

According to Avi Miller (his talk during LinuxConf.AU), btrfs send/receive is being worked on. It'll be faster than rsync since it doesn't need to traverse directories to find changes in files. I don't know if there's an expected release date yet, though.

There is, however, a utility built into btrfs-progs that lists every file that has changed between snapshots: `btrfs subvolume find-new`.
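
Usage is roughly as follows (a sketch; the subvolume paths are placeholders). Passing an absurdly high generation number makes `find-new` print only its `transid marker was N` line, and that generation can then be used to list files changed since the snapshot was taken:

```shell
#!/bin/sh
# Pull the generation number out of find-new's "transid marker was N" line
latest_transid() { awk '/transid marker/ {print $NF}'; }

# List files changed in the live subvolume $2 since snapshot $1 was taken
# (both paths are placeholders for real subvolumes)
list_changed() {
    gen=$(btrfs subvolume find-new "$1" 9999999 | latest_transid)
    btrfs subvolume find-new "$2" "$gen"
}

# e.g.: list_changed /backup/snapshots/daily-20120201 /backup/current
```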

borring

I am working on an OS backup system similar to BackupPC, and I have thought about this. What has been stopping me from actually implementing it is that you cannot hardlink between subvolumes. You can also only create snapshots of subvolumes -> one subvolume per backup client. Thus the file-level deduplication feature cannot coexist with this approach, and file-level deduplication usually saves a lot of space. Do you want to back up only one server?

If btrfs had block-level deduplication this problem could probably be avoided, but that is usually insufferably slow as well...

Then such an approach would of course entail a tight integration with one filesystem (btrfs), so this should be an optional feature.

I'm asking because I'm thinking about adding such a COW feature, but I don't know if I should, because of the drawbacks listed above.

Edit: UrBackup supports backups as described in the question now, with Linux kernels >= 3.6 (with cross-volume reflink support). See how to set it up.

UrOni
  • cross-subvolume reflink copy (a semi-hardlink done by `cp --reflink`) is either already implemented or will be implemented in near future. Online de-duplication in FS is either slow (lessfs) or needs huge amounts of RAM (ZFS) so depending on it would *really* be a bad feature in backup software. Either way, btrfs-oriented backup software will have a big audience, it's supposed to be the next ext3 after all. – Hubert Kario Feb 13 '12 at 22:53
  • One more thing: you can work around this problem by keeping all servers in one subvolume -- you can reflink copy between them (to dedupe) while preserving snapshot capability. You just have to snapshot after you dedupe; you can still snapshot after backing up only a single server! The backups won't take more space if you do the backups one at a time. Alternatively you can back up all servers, dedupe and only then snapshot. This way you can back up a few servers at the same time. – Hubert Kario Feb 13 '12 at 23:03
  • You're right. Didn't think of that. For convenience you can then symlink to the right snapshots in another volume. I did also see a patch for cross-volume hardlink (or --reflink) but it did not look like it made it/or will make it to mainline. I'll really look into that! Now you probably do your backups over ssh. My project is specialized for local networks... (auto discovery and so on) – UrOni Feb 14 '12 at 12:57
  • Yes, the patch is alive and working, unfortunately not in mainline, I don't know why. I'm trying to bug Chris Mason about it. As for your project, feel free to drop me a line, I'll gladly beta-test it (time permitting). It sure sounds interesting. – Hubert Kario Feb 14 '12 at 23:51
  • Finally that patch landed in the mainline Linux kernel 3.6. With the cross-device reflink it actually wasn't that much work. I have written here about it: http://urbackup.org/blog/?p=83 The code is in the "next" branch in the git repository. I'm currently testing it. – UrOni Nov 02 '12 at 21:39
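
The single-subvolume workaround discussed in these comments might look roughly like this (the hostnames, paths and the `shared.iso` example file are all made up; `cp --reflink` requires source and destination on the same btrfs mount):

```shell
#!/bin/sh
# Sketch: serverA/serverB and the /backup paths are placeholders.
# /backup/all is ONE btrfs subvolume containing a directory per client.

backup_all() {
    for host in serverA serverB; do
        rsync -a --inplace --delete "$host:/data/" "/backup/all/$host/"
    done

    # dedupe a file known to be identical on both clients: a reflink copy
    # shares the on-disk extents instead of storing the data twice
    cp --reflink=always /backup/all/serverA/shared.iso \
                        /backup/all/serverB/shared.iso

    # snapshot only AFTER deduplicating, so the snapshot captures the
    # shared extents rather than two independent copies
    btrfs subvolume snapshot -r /backup/all \
        "/backup/snapshots/$(date +%Y%m%d-%H%M%S)"
}
```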

The btrfs wiki page "Use Cases" lists some tools: SnapBtr, Snapper, btrfs-time-machine, UrBackup.

There's a proposal for a built-in tool called autosnap:

Using the autosnap feature, you could configure btrfs to take regular or event based snapshots and further manage the snapshots automatically.

Autosnap is not just about taking the snapshot, but also managing the created snapshots, as of now you could configure autosnap to delete the snapshots based on filesystem used space.

However, as of October 2013, the wiki states that "The autosnap functionality is currently not included in upstream version of btrfs."
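
Until something like autosnap is merged, its used-space policy can be approximated with a small script. This is a sketch: the `/backup` paths and the 90% threshold are my own assumptions, not autosnap's actual interface.

```shell
#!/bin/sh
# Delete the oldest snapshots until used space drops below a threshold.
# /backup and /backup/snapshots are placeholder paths.

used_pct() { df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'; }

# date-based snapshot names (backup-YYYYmmdd-HHMMSS) sort chronologically,
# so the first name in sorted order is the oldest snapshot
oldest_snapshot() { sort | head -n 1; }

prune_snapshots() {
    while [ "$(used_pct /backup)" -gt 90 ]; do
        old=$(ls -1 /backup/snapshots | oldest_snapshot)
        [ -n "$old" ] || break          # nothing left to delete
        btrfs subvolume delete "/backup/snapshots/$old"
    done
}
```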

ignis

I had similar frustrations, so I ended up creating a few scripts which I'm calling snazzer. Together they offer snapshotting, pruning, measurement and transport via ssh (but as of today can send/receive to/from local filesystems as well). Measurements are just reports of sha512sum and PGP signatures of snapshot paths. It's not quite ready for release but I would love to hear feedback if anybody has time to review it at this early stage.

CLI-only at this point, but I've taken some time to make it easy to use on systems with many btrfs subvolumes - typically I have separate subvolumes for /var/cache, /home, etc. which may need to be excluded from snapshotting or have more/less aggressive pruning schedules.

I'm afraid the pruning algorithm makes its decisions purely on the presence of the set of snapshots and their dates; nothing is there to keep pruning until a disk usage constraint is met - which do you delete first? Reduce the number of hourlies first, or dailies? Perhaps drop the oldest, e.g. yearlies? Different deployments will have different priorities; and I can't know if this is the only backup tier (in which case you shouldn't drop the oldest backups in case of legal/insurance obligations), or just an intermediate one (in which case you probably have those yearlies archived somewhere safe elsewhere).

I'll be adding ZFS support and/or interoperability at some point. It's written mostly in POSIX-ish shell and Perl due to a strong desire for "zero" dependencies at the moment; I'll hopefully have a cleaner Python alternative implementation maintained in parallel at some point.

csirac2
  • unless your FS is very large and changes often, there's very little difference between keeping a snapshot from one month ago and only one per day from last week, compared to one per day for the whole month -- btrfs will need to store the difference between the current state and the one from a month ago anyway -- I keep just dailies, but because it's compressed and diffed I can keep them for half a year back easily - then dropping the oldest is guaranteed to free at least *some* space – Hubert Kario Apr 19 '15 at 19:18
  • Well, I have a non-trivial number of VMs to keep track of - some with large transient files (i.e. snapshots with unique extents) which as you've suggested can benefit from pruning of intermediate snapshots. So whilst it's true that pruning intermediates doesn't free as much disk as dropping the oldest, what can I say... keeping only the minimum number of snapshots around and doing so with COW filesystem like btrfs seems to be about as efficient as it gets, but I realize there's more to picking an appropriate solution than that :) – csirac2 Apr 20 '15 at 05:56
  • @csirac2 are you maintaining snazzer? I'm looking for this type of solution. I'm interested in snazzer if it is being actively maintained. GitHub doesn't seem to show recent activity... – MountainX Jun 07 '16 at 02:04
  • @MountainX When I didn't get much initial feedback on snazzer, I kind of lost enthusiasm. When I started writing it, there was really only OpenSUSE's snapper and a handful of shell/python scripts floating around for automating btrfs. By the time I got around to sharing it with the world, lots of other options had popped up, and I'd say btrbk seems to have a lot of momentum (lack of automated testing [maybe fixed now?] was concerning though). If I had to do it all again I probably would've collaborated with the sanoid author to add btrfs compatibility there. Interested to hear your thoughts. – csirac2 Jun 11 '16 at 11:31