128

Can anyone clarify the differences between the --checksum and --ignore-times options of rsync?

My understanding is as follows:

--checksum
If the file size and time match, it will do a checksum at both ends to see if the files are really identical.

--ignore-times
'Transfer' every file, regardless of whether file time is same at both ends. Since it will still use the delta-transfer algorithm, if a file actually is identical, nothing gets transferred.

That's the technical difference, but as far as I can tell, they are semantically the same thing.

So, what I'm wondering is:

  • What is the practical difference between the two options?
  • In what cases would you use one rather than the other?
  • Is there any performance difference between them?
Andy Madge
  • checksum ignores mod-time and size because it's doing a checksum of files on both ends. This is slow but is probably the most reliable way to make sure files have/have-not changed. If systems on both ends have a time sync or file-format difference, you can end up re-syncing the same files over and over. (windows->linux->windows for example) – Scott Aug 26 '22 at 18:15

5 Answers

135

Normally, rsync skips files when the files have identical sizes and times on the source and destination sides. This is a heuristic which is usually a good idea, as it prevents rsync from having to examine the contents of files that are very likely identical on the source and destination sides.

--ignore-times tells rsync to turn off the file-times-and-sizes heuristic, and thus unconditionally transfer ALL files from source to destination. rsync will then proceed to read every file on the source side, since it will need to either use its delta-transfer algorithm, or simply send every file in its entirety, depending on whether the --whole-file option was specified.

--checksum also modifies the file-times-and-sizes heuristic, but here it ignores times and examines only sizes. Files on the source and destination sides that differ in size are transferred, since they are obviously different. Files with the same size are checksummed (with MD5 in rsync version 3.0.0+, or with MD4 in earlier versions), and those found to have differing sums are also transferred.
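To make the three behaviours concrete, here is a small Python model of the "should this file go to the transfer stage?" decision as described above. It is only a sketch of the logic in this answer, not rsync's actual code: the names `FileState` and `is_candidate` are made up for illustration, mtimes are compared as whole seconds, and MD5 is assumed as the checksum (as in rsync 3.0.0+).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileState:
    """Hypothetical stand-in for one side's copy of a file."""
    data: bytes
    mtime: int  # modification time in whole seconds


def md5_hex(data: bytes) -> str:
    """Whole-file checksum, assuming MD5."""
    return hashlib.md5(data).hexdigest()


def is_candidate(src: FileState, dst: Optional[FileState], mode: str = "default") -> bool:
    """Return True if the file would be handed to the transfer stage.

    mode is one of "default", "ignore-times" or "checksum" (illustrative
    labels for: no option, --ignore-times, --checksum).
    """
    if dst is None:
        return True  # nothing on the destination yet
    if mode == "ignore-times":
        return True  # heuristic switched off: every file is a candidate
    if len(src.data) != len(dst.data):
        return True  # different sizes always mean different files
    if mode == "checksum":
        # Same size: times are ignored, contents are hashed on both sides.
        return md5_hex(src.data) != md5_hex(dst.data)
    # Default quick check: same size, so only the mtime decides.
    return src.mtime != dst.mtime


original = FileState(b"hello world", mtime=1000)
touched = FileState(b"hello world", mtime=2000)    # same bytes, new mtime
corrupted = FileState(b"hellO world", mtime=1000)  # changed bytes, same mtime

for mode in ("default", "ignore-times", "checksum"):
    print(mode, is_candidate(original, touched, mode), is_candidate(original, corrupted, mode))
```

The `corrupted` case is the one the default heuristic misses, and the `touched` case is the one where `--checksum` avoids handing an unchanged file to the transfer stage.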

In cases where the source and destination sides are mostly the same, --checksum will result in most files being checksummed on both sides. This could take a long time, but the upshot is that the barest minimum of data will actually be transferred over the wire, especially if the delta-transfer algorithm is used. Of course, this is only a win if you have very slow networks and/or very fast CPUs.

--ignore-times, on the other hand, will send more data over the network, and it will cause all source files to be read, but at least it will not impose the additional burden of computing many whole-file cryptographic hashes on the source and destination CPUs. I would expect this option to perform better than --checksum when your networks are fast and/or your CPUs are relatively slow.

I think I would only ever use --checksum or --ignore-times if I were transferring files to a destination where it was suspected that the contents of some files were corrupted, but whose modification times were not changed. I can't really think of any other good reason to use either option, although there are probably other use-cases.

Steven Monday
  • 15
    I've found `--checksum` useful along with `--itemize-changes` for verifying backups. Every now and again my backup scripts run a full compare this way after the current daily/weekly updates are complete. I get dropped an email marked urgent if `--itemize-changes` outputs anything unexpected, so I know there is a potential problem I should look into. – David Spillett Sep 04 '12 at 09:58
  • 15
    --checksum is useful when working in Git and switching between branches with changed files, which keeps changing the update times on files that you don't intend to send from a particular branch. – FriendlyDev Apr 20 '15 at 10:26
  • 7
    `--ignore-times` and especially `--checksum` are necessary if one of your "files" is a Truecrypt file container since by default the timestamp of the file is not updated. See https://productforums.google.com/forum/#!topic/drive/gnmDp3UXEgs and http://ask-leo.com/why_wont_my_truecrypt_volume_backup.html – Marcus Junius Brutus Nov 12 '16 at 21:47
  • Note: I did a quick experiment, and ctime is not compared, only mtime. On Mac, at least. This can be useful to know. It's why I have so many issues with Windows file systems, which report the same time (ctime) for atime, mtime, and ctime. – Edward Falk Jan 10 '17 at 21:57
  • Does `--checksum` checksum only the source filenames on the destination machine or all files in the destination directory? – Greg Oct 29 '17 at 12:55
  • 1
    @DavidSpillett - How effective are `--checksum` and `--itemize-changes` in validating backups? For example, do these flags validate whether data has been corrupted as a result of a bad sector or a write failure? – Motivated Jan 03 '20 at 18:36
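The backup-verification idea from David Spillett's comment above could be sketched roughly like this in Python. This is an assumption-laden outline, not his script: the function names are invented here, `--archive` and `--dry-run` are my additions (so that the check reports differences without changing the backup), and any non-empty output line is simply treated as "something differs".

```python
import subprocess
from typing import List, Tuple


def build_verify_command(src: str, dst: str) -> List[str]:
    """Build an rsync invocation that compares by checksum without copying.

    --dry-run is an assumption added for this sketch; drop it if the run
    should also repair the differences it finds.
    """
    return ["rsync", "--archive", "--checksum", "--dry-run",
            "--itemize-changes", src, dst]


def reported_items(rsync_stdout: str) -> List[Tuple[str, str]]:
    """Split each itemized line into (change flags, path)."""
    items = []
    for line in rsync_stdout.splitlines():
        if not line.strip():
            continue  # skip blank lines
        flags, _, path = line.partition(" ")
        items.append((flags, path))
    return items


def verify(src: str, dst: str) -> List[Tuple[str, str]]:
    """Run the comparison; an empty result means nothing was reported."""
    result = subprocess.run(build_verify_command(src, dst),
                            capture_output=True, text=True, check=True)
    return reported_items(result.stdout)
```

A caller could then raise an alert whenever `verify(...)` returns a non-empty list, which matches the "email me if anything unexpected is printed" approach in the comment.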
22

--checksum is also useful if you have been using another system to sync files that has not preserved timestamps. It will only transfer files that are different AND update all the timestamps on the receiving end so that they match.

Paulus
  • Will it not also do that if you do not provide the `--checksum` flag? – lucidbrot Jan 26 '20 at 18:22
  • 1
    Yes it would update the timestamps, but also possibly transfer many unnecessary files. The checksum is useful if you are running the rsync daemon on the other end, and have a very slow connection, and many files (multi gig source tree) – Paulus Jan 27 '20 at 13:14
  • Thanks! Excuse the further question: What would you recommend if I have source files that each are >1GiB and mediocre connection speed, and some newer timestamps are still exactly the same file? `-c` would compute all the checksums (right?) - Ideally it'd only compute the checksum for the files where the timestamps differ. Or does it do checksum checks on those files in normal mode (without the `-c` flag)? – lucidbrot Jan 27 '20 at 16:34
6

One detail: the checksum option checks a whole file on one end, then the whole file on the other end. If your files are somewhat big, this kind of kills parallelism.

Also, if you have huge files, you are likely to run into a timeout with --checksum, which does not happen with -I (--ignore-times).

slm
Francois
4

From info rsync in regards to the --checksum option - "Since this whole-file checksumming of all files on both sides of the connection occurs in addition to the automatic checksum verifications that occur during a file's transfer, this option can be quite slow."
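To get a feel for why whole-file checksumming is slow, here is a minimal Python sketch of hashing one file. It is not rsync's implementation; MD5 and the 1 MiB chunk size are assumptions, and `file_md5` is a name made up for this example. The point is that every byte of the file has to be read from disk, on both sides, before any transfer decision is made.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays constant."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file
            digest.update(chunk)
    return digest.hexdigest()
```

For a tree of many multi-gigabyte files, the total read time of a loop like this on each side is the cost the quoted documentation is warning about.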

LeoB
  • 1
    That sentence doesn't seem to be in my man pages... so, does that imply that the checksum option will use checksums to identify whether the files are identical, and if they're not it will transfer, thus generating checksums again as part of the transfer? The --ignore-times option just skips the check and assumes they've changed? Therefore performance-wise --ignore-times is a better way of achieving the same thing? I'm still struggling to see why there are 2 different options (apart from the fact that --checksum is more transparent) – Andy Madge Dec 09 '10 at 22:05
  • You should look at the latest documentation edit: http://gitweb.samba.org/?p=rsync.git;a=blobdiff;f=rsync.yo;h=e9d1e20d2230f28e67472bb6cf853c5e8bcc4c84;hp=55c372a8435d6b11336c4be7c7aa7484459b43e5;hb=c64ff141b83dfb2bba32079db8309df176988388;hpb=1b896f8d1e5ba86bfcadf9ef68ad1453e48e5fb9 – Aleksandr Levchuk Dec 09 '10 at 23:02
3

The --ignore-times option will probably result in all files being delta-encoded, and the delta-transfer algorithm (delta encoding) is at least as slow as the checksumming.

I don't know if rsync --ignore-times is smart enough to avoid the "automatic after-the-transfer verification" in the frequent case when the delta-transfer will result in nothing being transferred.

For --ignore-times:

  • In case rsync is not smart (or does not trust the delta encoding) then the checking (checksumming and encoding) will be done twice.
  • It could also be the case that delta encoding is much slower than the 128-bit MD4 checksumming.

Both --checksum and --ignore-times will be "quite slow" but --ignore-times is likely even slower (due to the 2 possibilities above).
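For context on what the delta encoding has to compute, below is a simplified Python version of the weak rolling checksum described in the rsync technical report. Treat it as an illustration under assumptions: real rsync differs in details (block sizes, an additional strong checksum per block, and its exact arithmetic), and the function names are invented for this sketch. It shows that sliding the window by one byte is cheap, but it still has to be done for every byte offset of the file.

```python
from typing import Tuple

M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the two running sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a window of length n one byte to the right in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


data = bytes(range(200)) * 3
n = 16
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    # The rolled value matches a full recomputation at every offset.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + n])
```

Whether this per-byte work ends up cheaper or more expensive than two whole-file hashes is exactly the open performance question in this answer, so it is worth measuring rather than assuming.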

Good question - please post if you find any performance differences in practice.

Aleksandr Levchuk