Why is moving files between btrfs subvolumes an expensive operation?


From what I understand, btrfs subvolumes share the same file system "storage", so I was surprised to learn that moving files between different subvolumes is an expensive operation, just like moving between different filesystems (copy + delete).

I was especially surprised when someone suggested this workaround: reflink-copy the files between subvolumes, then delete the originals. This is said to be a cheap operation (only metadata is moved around). How is it that different subvolumes can share data blocks when using COW, but not in the supposedly easier operation of moving data?

m.alessandrini

Posted 2015-08-01T08:47:18.283

Reputation: 85

This doesn't answer your question, but you may be interested to know that from coreutils 8.24 (released July 3, 2015) onward, mv will try a reflink before falling back to a standard copy (changelog).

– Vincent Yu – 2015-08-01T10:10:06.473

@VincentYu thanks for the information. But shouldn't a move/rename operation be managed by the filesystem itself, instead of user-space utilities? From what I read, this move behaviour is the same even when done inside a single mount point, i.e. when the subvolume is not mounted at another directory (a case where the kernel perhaps could not recognize that it's the same file system). – m.alessandrini – 2015-08-01T11:20:45.603

@VincentYu just for information, and prompted by the new answer, I tried moving a large file between subvolumes on a more up-to-date system (Debian testing, kernel 4.6.0, coreutils 8.25-2), but nothing changed. – m.alessandrini – 2016-07-12T07:50:33.900

@m.alessandrini I just found that in order to be able to use cp --reflink between two subvolumes I was forced to mount the top level subvolume and issue the cp command inside that namespace, otherwise cp would exit with an error saying ...Invalid cross-device link – Dzamo Norton – 2018-05-29T12:33:56.147

@DzamoNorton the fact is that previously it did not work even in the configuration you describe (all subdirs of current dir), where reflink-copy worked. But I just tried and today the move is a zero-time operation, too (kernel 4.16), so I guess this has been addressed. – m.alessandrini – 2018-05-30T19:09:41.280

Answers

How is it that different subvolumes can share data blocks when using COW, but not in the supposedly easier operation of moving data?

mv uses the rename syscall to attempt the move. The btrfs kernel implementation of rename detects a cross-subvolume move and explicitly disallows it (even under the same mount point) by returning EXDEV, the same error used for a move between different filesystems:

/* we only allow rename subvolume link between subvolumes */
if (old_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest)
    return -EXDEV;

This probably has to do with subvolume inode accounting and the code paths these operations take. The reflink copy actually creates new metadata, accounted in the destination subvolume, while the data itself is shared via CoW. In theory, rename could probably "move" the metadata by doing something similar to what cp --reflink followed by rm of the source does; it is simply that no one has put in the effort to implement it.
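To make the difference concrete, here is a minimal user-space sketch of the workaround from the question: try rename first, and on EXDEV fall back to a reflink clone followed by deleting the source. It is an illustration, not what mv or the kernel does internally. The function name move_file is hypothetical, and the FICLONE request number is hard-coded on the assumption of a common Linux architecture such as x86 or ARM.

```python
import errno
import fcntl
import os
import shutil

# Assumption: FICLONE request number from <linux/fs.h>, i.e. _IOW(0x94, 9, int),
# as encoded on common Linux architectures (x86, ARM). Other architectures differ.
FICLONE = 0x40049409


def move_file(src, dst):
    """Move src to dst and return the strategy used (hypothetical helper)."""
    try:
        # Cheap path: a pure metadata operation inside one subvolume.
        os.rename(src, dst)
        return "rename"
    except OSError as exc:
        # EXDEV is what btrfs returns for a cross-subvolume rename.
        if exc.errno != errno.EXDEV:
            raise

    strategy = "reflink"
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's extents with dst (no data copied).
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError:
            # Not supported here: fall back to a full byte-for-byte copy.
            shutil.copyfileobj(fsrc, fdst)
            strategy = "copy"

    shutil.copystat(src, dst)  # carry over timestamps and permission bits
    os.unlink(src)             # drop the original, completing the "move"
    return strategy
```

Calling move_file("/mnt/sub_a/big.img", "/mnt/sub_b/big.img") (example paths, not from the thread) returns a string naming the path taken, so it is easy to check whether the cheap clone was available or a full copy was needed.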

Will Brown

Posted 2015-08-01T08:47:18.283

Reputation: 156

Thanks, I browsed the kernel code out of curiosity, and apart from not understanding it at first glance (obviously), I really don't understand the sentence in that comment. Do you know what it may mean? – m.alessandrini – 2016-07-12T07:46:34.790

Subvolumes appear as directory entries (a.k.a. "subvolume links") anywhere in the filesystem. The comment is saying that you can only rename (i.e. move) the subvolume directory entries themselves between subvolumes, not actual directories or files. – Will Brown – 2016-07-13T05:44:16.207