Does rsync --inplace write to the entire file, or just to the parts that need to be updated? (for btrfs+rsync backups)

21

11

I was reading several guides on how to combine btrfs snapshots with rsync to make an efficient backup solution with history. However, it all depends on whether rsync --inplace modifies only those portions of files that actually changed, or whether it overwrites the whole file sequentially. If it writes the whole file, then it seems that btrfs will always create a new copy of the file, which would make the idea much less efficient.

Petr Pudlák

Posted 2013-03-31T13:42:03.410

Reputation: 2 197

@PetrPudlák it does not "read" the file, as this would be inefficient. It splits the files into chunks, applies a quick hash, compares the hashes and transmits what's different. There is also a second, more in-depth comparison, and the server keeps track of the chunks, but it's not real "reading" as in loading the whole thing into memory: https://rsync.samba.org/~tridge/phd_thesis.pdf

So, as Gunther Piez commented, it does know exactly what to copy.

– runlevel0 – 2020-02-14T12:38:23.397

How would it even know if it can avoid writing to the entire file? Doesn't it need to read the entire file first, to figure out what has changed? – user541686 – 2013-04-01T03:16:08.680

2@Mehrdad yes, it does, but reading the whole file isn't a problem. If rsync reads the whole file and then seeks to and updates only those parts that need it, btrfs will copy only the updated blocks. But if rsync reads and writes the whole file, then it'll be a problem. – Petr Pudlák – 2013-04-01T14:36:49.447

2@Mehrdad rsync not only knows that it can avoid writing the entire file, it manages to do so without copying it completely over the net. Clever little program. – Gunther Piez – 2013-05-02T11:24:43.800

Answers

31

If you pass rsync two local paths, it defaults to "--whole-file" and does not use delta-transfer. So what you're looking for is "--no-whole-file". You also get delta-transfer if you request '-c'.

Here's how you can verify:

$ mkdir a b
$ dd if=/dev/zero of=a/1 bs=1k count=64
$ dd if=/dev/zero of=a/2 bs=1k count=64
$ dd if=/dev/zero of=a/3 bs=1k count=64
$ rsync -av a/ b/
sending incremental file list
./
1
2
3

sent 196831 bytes  received 72 bytes  393806.00 bytes/sec
total size is 196608  speedup is 1.00

Then touch a file and re-sync:

$ touch a/1
$ rsync -av --inplace a/ b/
sending incremental file list
1

sent 65662 bytes  received 31 bytes  131386.00 bytes/sec
total size is 196608  speedup is 2.99

You can verify that it re-used the inode with "ls -li", but notice that it sent a whole 64K bytes. Try again with --no-whole-file:

$ touch a/1
$ rsync -av --inplace --no-whole-file a/ b/
sending incremental file list
1

sent 494 bytes  received 595 bytes  2178.00 bytes/sec
total size is 196608  speedup is 180.54

Now you've only sent 494 bytes. You could use strace to further verify if any of the file was written, but this shows it at least used delta-transfer.

Note (see comments) that for local filesystems, --whole-file is assumed (see the man page for rsync). On the other hand, across a network --no-whole-file is assumed, so --inplace on its own will behave as --inplace --no-whole-file.
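The experiment above can also be repeated with --stats, which reports how much data was sent literally and how much was matched against the file already at the destination. This is only a minimal sketch, assuming rsync is installed. The temporary directory and the -I (--ignore-times) flag are additions of this sketch; -I is there so the modified file is not skipped by the size-and-mtime quick check.

```shell
#!/bin/sh
set -eu
# Skip quietly if rsync is not available on this machine.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/a" "$work/b"
dd if=/dev/zero of="$work/a/1" bs=1k count=64 2>/dev/null
rsync -a "$work/a/" "$work/b/"

# Overwrite 7 bytes in the middle of the source file without truncating it.
printf 'changed' | dd of="$work/a/1" bs=1 seek=32768 conv=notrunc 2>/dev/null

# -I forces a transfer even if size and mtime still look identical.
stats=$(rsync -a -I --inplace --no-whole-file --stats "$work/a/" "$work/b/")
printf '%s\n' "$stats" | grep -E '^(Literal|Matched) data:'

# The destination should now be byte-identical to the source.
cmp "$work/a/1" "$work/b/1" && echo "contents match"
```

A small "Literal data" figure next to a large "Matched data" figure indicates delta-transfer was used. Like the byte counts above, it says what was sent, not which blocks were physically rewritten on disk.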

dataless

Posted 2013-03-31T13:42:03.410

Reputation: 493

Why doesn't --inplace imply --no-whole-file? – Geremia – 2016-09-08T18:13:50.433

Isn't --no-whole-file default anyways? – Geremia – 2016-09-08T18:24:33.363

2@Geremia not if both paths are local. And my example shows that --inplace does not imply --no-whole-file for the version of rsync I was using in 2013, but you're welcome to repeat this experiment with your own version of rsync. – dataless – 2016-09-09T17:28:53.023

Well, inplace is not about 'scanning for same/differing blocks', it's just about overwriting the existing file right away, from offset 0. (Otherwise a temporary copy is built up, and only then is the old target file deleted and the temporary copy renamed. It's probably deemed "safer" to keep the old file as long as possible, in case the process gets interrupted. Of course this is worse for performance, peak storage consumption (think large files), possibly fragmentation...) – Frank Nocke – 2017-02-27T09:47:36.453

1I would assume that it's the other way round: --no-whole-file always implies --inplace, otherwise most of its performance gain would be gone. Couldn't find this documented, though... – Frank Nocke – 2017-02-27T09:48:28.843

When I try it across two different btrfs filesystems on the same machine, I find that --no-whole-file and --inplace --no-whole-file transfer as quickly, but the former uses a new inode whereas the latter does not. --inplace on its own uses the same inode, but copies a whole 64kB. – Diagon – 2017-12-10T03:15:44.693

When I try it across a network from a btrfs filesystem to a remote xfs filesystem, all three options copy only a couple of hundred bytes, and --no-whole-file changes the inode, but the other two options that include --inplace do not. – Diagon – 2017-12-10T03:28:56.557

This answer is not related to block level change tracking in snapshots, as per the question – Patrick – 2019-06-19T02:07:52.650

15

Here is the definitive answer, I guess, citing the relevant part of the manual:

   --inplace

          [...]

          This option is useful for transferring large files
          with  block-based  changes  or  appended data, and
          also on systems that are disk bound,  not  network
          bound.   It  can  also  help  keep a copy-on-write
                                               *************
          filesystem snapshot from diverging the entire con‐
          *******************
          tents of a file that only has minor changes.
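For the btrfs+rsync backup scheme from the question, the manual's remark suggests a pattern along these lines. This is only a sketch: the paths are hypothetical, the destination is assumed to be a btrfs subvolume, and the run/backup helpers and the DRY_RUN switch are made up for illustration. --no-whole-file is included because both paths are local here.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your own layout.
SRC=${SRC:-/home/}
DEST=${DEST:-/mnt/backup/current}          # assumed to be a btrfs subvolume
SNAPDIR=${SNAPDIR:-/mnt/backup/snapshots}  # where read-only snapshots go

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Update the working copy in place, then freeze it as a read-only snapshot.
backup() {
    run rsync -a --delete --inplace --no-whole-file "$SRC" "$DEST/" &&
    run btrfs subvolume snapshot -r "$DEST" "$SNAPDIR/$(date +%Y-%m-%d_%H%M%S)"
}
```

Calling backup with DRY_RUN=1 set prints the two commands without touching anything, which is a cheap way to check the paths before running it for real (as root).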

fuujuhi

Posted 2013-03-31T13:42:03.410

Reputation: 161

4

--inplace overwrites only regions that have changed. Always use it when writing to Btrfs.

Gabriel

Posted 2013-03-31T13:42:03.410

Reputation: 457

Does the same apply to ZFS? – ewwhite – 2014-07-04T06:39:02.397

@ewwhite: Since ZFS is COW (copy-on-write) like BTRFS, then yes. – Geremia – 2015-03-02T20:45:40.880

@PetrPudlák -vvv shows it skipping matched blocks – Tom Hale – 2019-10-06T10:33:36.707

And do you have any evidence that shows it doesn't overwrite other parts of files? – Petr Pudlák – 2013-10-25T13:00:07.640

3

rsync's delta-transfer algorithm determines whether the entire file is transmitted or just the parts that differ. Sending only the differences is the default behavior when rsyncing a file between two machines, to save bandwidth. This can be overridden with the --whole-file (or -W) option, which forces rsync to transmit the entire file.

--inplace deals with whether rsync, during the transfer, will create a temporary file or not. The default behavior is to create a temporary file. This gives a measure of safety in that if the transfer is interrupted, the existing file in the destination machine remains intact/untouched. --inplace overrides this behavior and tells rsync to update the existing file directly. With this, you run the risk of having an inconsistent file in the destination machine if the transfer is interrupted.
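The difference between the two update strategies can be illustrated without rsync at all. The sketch below uses only standard tools; the file names are made up, and cp/mv/dd merely stand in for what rsync does, so it shows the effect on the inode rather than rsync's actual code path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

printf 'old contents\n' > dest
printf 'new contents\n' > src
before=$(inode dest)

# Default-style update: build a temporary copy, then rename it over the target.
cp src dest.tmp
mv dest.tmp dest
after_rename=$(inode dest)

# In-place-style update: write into the existing file without replacing it.
printf 'newer things\n' > src
dd if=src of=dest conv=notrunc 2>/dev/null
after_inplace=$(inode dest)

echo "before=$before after_rename=$after_rename after_inplace=$after_inplace"
```

The rename leaves dest with a new inode, while the in-place write keeps the inode it had. On a copy-on-write filesystem, a file with a new inode shares nothing with the copy held in an older snapshot.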

Mike T.

Posted 2013-03-31T13:42:03.410

Reputation: 31

2

From the man page:

This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

This leads me to believe that it writes over the file in its entirety; I imagine it would be nearly impossible for rsync to work any other way.

Laxsnor

Posted 2013-03-31T13:42:03.410

Reputation: 151

2

After determining what parts need update, it could just seek to those parts and update them, instead of writing the entire file.

– Petr Pudlák – 2013-04-01T14:39:13.377
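The "seek to those parts and update them" idea can be sketched with dd and cksum. This is a toy, not rsync's algorithm: it assumes both files have the same size and compares blocks only at identical offsets, whereas rsync uses rolling checksums and can also match blocks that have moved. The block size and file names are arbitrary choices for the example.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src=$work/src
dst=$work/dst
bs=4096

# Two 64 KiB files that differ only inside block 5.
dd if=/dev/zero of="$src" bs="$bs" count=16 2>/dev/null
cp "$src" "$dst"
printf 'changed' | dd of="$src" bs=1 seek=$((5 * bs + 100)) conv=notrunc 2>/dev/null

# Checksum of block number $2 of file $1.
block_sum() { dd if="$1" bs="$bs" skip="$2" count=1 2>/dev/null | cksum; }

size=$(wc -c < "$src")
blocks=$(( (size + bs - 1) / bs ))
i=0
rewritten=""
while [ "$i" -lt "$blocks" ]; do
    if [ "$(block_sum "$src" "$i")" != "$(block_sum "$dst" "$i")" ]; then
        # Seek to the same offset in the destination and overwrite one block only.
        dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 conv=notrunc 2>/dev/null
        rewritten="$rewritten $i"
    fi
    i=$((i + 1))
done
echo "rewritten blocks:$rewritten"   # prints "rewritten blocks: 5"
```

Only one of the sixteen blocks is written, so on a copy-on-write filesystem the other fifteen could stay shared with an earlier snapshot.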

0

The theoretical work on in-place rsync is described in this paper.

Paper reference: D. Rasch and R. Burns. In-Place Rsync: File Synchronization for Mobile and Wireless Devices. USENIX Annual Technical Conference, FREENIX track, 91-100, USENIX, 2003.

From the link:

... We modified the existing rsync implementation to support in-place reconstruction.

Abstract: [...] We have modified rsync so that it operates on space constrained devices. Files on the target host are updated in the same storage the current version of the file occupies. Space-constrained devices cannot use traditional rsync because it requires memory or storage for both the old and new version of the file. Examples include synchronizing files on cellular phones and handheld PCs, which have small memories. The in-place rsync algorithm encodes the compressed representation of a file in a graph, which is then topologically sorted to achieve the in-place property. [...]

So this appears to be the technical details of what rsync --inplace is doing. According to the beginning of the paper:

We have modified rsync so that it performs file synchronization tasks with in-place reconstruction. [...] Instead of using temporary space, the changes to the target file take place in the space already occupied by the current version. This tool can be used to synchronize devices where space is limited.

As becomes clear from @dataless's answer, this implies that --inplace uses the same storage space, but it may still copy the whole file into that space. Specifically, when copying between local filesystems, rsync assumes the --whole-file option, whereas across a network it assumes the --no-whole-file option.

user92979

Posted 2013-03-31T13:42:03.410

Reputation: 131

1Um, so what's the answer? – Xen2050 – 2017-01-18T15:09:28.580

My apologies. I wasn't paying sufficient attention. With @dataless's answer, this should clear things up. – Diagon – 2017-12-10T03:53:20.560