
The more I use rsync, the more I realise that it's the Swiss Army knife of file transfer. There are so many options. I recently found out that you can pass --remove-source-files and it'll delete a file from the source once it's been copied, which makes rsync more of a move than a copy program. :)

What are your favorite little rsync tips and tricks?

Amandasaurus

18 Answers

22

Try to use rsync version 3 if you have to sync many files! V3 builds its file list incrementally and is much faster and uses less memory than version 2.

Depending on your platform this can make quite a difference. On OS X, version 2.6.3 would take more than an hour (or crash) trying to build an index of 5 million files, while the version 3.0.2 I compiled started copying right away.

robcast
  • One thing to note there is that if you use some options (like `--delete-before` for instance) the old "build list first" behaviour is used as it is required for these options to work correctly - so if you don't see this behaviour check if the other options you are using are known to stop it being possible. This can be useful if you are using rsync interactively on a large tree and want to force the initial scan so the output of `--progress` is accurate (i.e. the "objects to compare" count will never rise as no new objects will be found after the initial scan). – David Spillett Aug 03 '12 at 14:33
21

Using --link-dest to create space-efficient snapshot-based backups: you appear to have multiple complete copies of the backed-up data (one per backup run), but files that don't change between runs are hard-linked instead of copied, saving space.

(Actually, I still use the rsync-followed-by-cp -al method, which achieves the same thing; see http://www.mikerubel.org/computers/rsync_snapshots/ for an oldish-but-still-very-good rundown of both techniques and related issues.)

The one major disadvantage of this technique is that if a file is corrupted due to disk error it is just as corrupt in all snapshots that link to that file, but I have offline backups too which would protect against this to a decent extent. The other thing to look out for is that your filesystem has enough inodes or you'll run out of them before you actually run out of disk space (though I've never had a problem with the ext2/3 defaults).

Also, never forget the very very useful --dry-run for a little healthy paranoia, especially when you are using the --delete* options.

David Spillett
  • Note that -n is the shortcut for --dry-run – ctennis Aug 23 '09 at 15:01
  • I prefer to stick with the long names, especially in scripts that others may end up maintaining. It makes it clearer what is intended without reference to the docs. – David Spillett Aug 24 '09 at 12:01
  • +1 I implemented a backup solution of many TB over many machines with the --link-dest method for hard-linked snapshots as described above - it worked perfectly. – matja Mar 06 '10 at 11:14
  • If you like --link-dest backups, check out [Dirvish](http://www.dirvish.org/) which uses rsync under the hood – hfs Aug 02 '12 at 08:13
  • Oh yikes, I'd never even considered that. After one year of daily link-dest, my backup drive is at 26% used inodes (ext4 with default options). – Izkata Oct 13 '17 at 04:25
16

If you need to update a website with some huge files over a slowish link, you can transfer the small files this way:

rsync -a --max-size=100K /var/www/ there:/var/www/

then do this for the big files:

rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/

rsync has lots of options that are handy for websites. Unfortunately, it does not have a built-in way of detecting simultaneous updates, so you have to add logic to cron scripts to avoid overlapping writes of huge files.

Bob
12
--time-limit

When this option is used rsync will stop after T minutes and exit. It's useful when rsyncing a large amount of data during the night (non-busy hours) and then stopping when people start using the network during the day (busy hours). (Note: --time-limit comes from rsync's optional patch set rather than a stock build; rsync 3.2.3 and later provide the equivalent --stop-after option natively, alongside --stop-at.)

--stop-at=y-m-dTh:m

This option allows you to specify the exact time at which rsync should stop.

Batch Mode

Batch mode can be used to apply the same set of updates to many identical systems.

jftuga
12

I use the --existing option when trying to keep a small subset of files from one directory synced to another location.

TCampbell
9

--rsh is mine.

I've used it to change the SSH cipher to something faster (--rsh="ssh -c arcfour"), and also to set up a chain of SSH hops (best combined with ssh-agent) to sync files between hosts that cannot talk to each other directly: rsync -av --rsh="ssh -TA userA@hostA ssh -TA -l userB" /tmp/foobar/ hostB:/tmp/foobar/

8

If you are wondering how far along a slow-running rsync has gotten, and didn't use -v to list files as they are transferred, you can find out which files it has open:

 ls -l /proc/$(pidof rsync)/fd/*

on a system which has /proc.

E.g. rsync hung for me just now, even though the remote system seemed to have plenty of space left. This trick helped me find an unexpectedly huge file I didn't remember, which wouldn't fit on the other end.

It also told me something else interesting - the other end had apparently given up, since there was also a broken socket link:

/proc/22954/fd/4: broken symbolic link to `socket:[2387837]'
nealmcb
6

--archive is a standard choice (though not the default) for backup-like jobs; it makes sure most metadata from the source files (permissions, ownership, etc.) is copied across.

However, if you don't want to use that, you'll often still want to include --times, which copies across the modification times of files. This makes the next rsync run (assuming you're doing this repeatedly) much faster, as rsync compares modification times and skips a file if it's unchanged. Surprisingly (to me at least), this option is not the default.

Andrew Ferrier
5

Mine is --inplace. Works wonders when the server for backups is running ZFS or btrfs and you make native snapshots.

Hubert Kario
4

The one I use the most is definitely --exclude-from which lets you specify a file containing things to be excluded.

I also find --chmod very useful because it lets you make sure that permissions end up in a desirable state even if your source is messed up.

innaM
4

Of course, there's also --delete which removes stuff from the target that cannot be found in the source.

innaM
4

--backup-dir=$(date +%Y.%m.%d) --delete - we are deleting, but keeping a copy... just in case.

2

cwRsync - Rsync for Windows: http://www.itefix.no/i2/node/10650

This version includes OpenSSH so you can transfer files over a secure channel.

jftuga
2
--partial

Keeps partially transferred files if the connection drops, so the transfer can resume instead of starting over.

--bwlimit=100

Limits bandwidth (in KBytes/second) - good for copying down large files and directories.

enkdr
1

Read the manual

Read the list of short options https://linux.die.net/man/1/rsync to get a feeling for what is possible. It is really impressive.

Experiment with basic use cases

Get familiar with some basic uses. Use --dry-run (-n) to get feedback on what rsync is about to do.

rsync -avn . /target/di

Archives file attributes (-a), gives verbose output (-v), and does a dry run (-n). The -a option is the short form of --archive and translates to -rlptgoD.

  • -r - recursive copy
  • -l - copy symlinks as symlinks
  • -p - set permissions to be the same as the source
  • -t - set mtime to be the same as the source. Use this to support fast incremental updates based on mtime.
  • -g - set group to be the same as the source
  • -o - set owner to be the same as the source
  • -D - if remote user is superuser this recreates devices and other special files

Selection of some cool options

Move

--remove-source-files This removes successfully transferred files from the source.

Update

--update This forces rsync to skip any files which exist on the destination and have a modified time that is newer than the source file.

Delete

--delete Delete files that do not exist in the source tree.

Backup

--backup Make a backup of modified or removed files.

--backup-dir=$(date +%Y.%m.%d) Specify a backup dir.

What to copy?

--min-size=1 Do not copy empty files.

--max-size=100K Copy only small files. Can be used to handle small and large files differently.

--existing Only update files that already exist on the target. Do not create new files on the target.

--ignore-existing Only copy files that do not exist on target.

--exclude-from Define excludes in a file.

Scheduling, Bandwidth and Performance

--time-limit Ends rsync after a certain time limit.

--stop-at=y-m-dTh:m Ends rsync at a specific time.

--partial Allows partial copies in case of interruptions.

--bwlimit=100 Limits bandwidth, specified in KBytes/second. A good option when transferring large files.

Output

  • -h output numbers in a human-readable format.
  • --progress display progress.
  • -i log change info.
  • --log-file= define a log file.
  • --quiet no output.
jschnasse
1

Don't repeat yourself:

 --ignore-existing       skip updating files that already exist on receiver
miguelfg
1

If you have rsync set up as a daemon on the server, you can browse the shared modules like any other directory listing, and see which paths are available.

sybreon
1

When using GlusterFS, we hit a bottleneck with the zero-size "T" (sticky-bit linkto) files: when resyncing a crashed brick or replica, we had to use --min-size=1 so those empty files from the crashed server would not be synced.