
The more I use rsync, the more I realise that it's the Swiss Army knife of file transfer. There are so many options. I recently found out that you can pass --remove-source-files and it'll delete a file from the source once it's been copied, which makes rsync more of a move than a copy program. :)

What are your favorite little rsync tips and tricks?

Amandasaurus

18 Answers

22

Try to use rsync version 3 if you have to sync many files! V3 builds its file list incrementally and is much faster and uses less memory than version 2.

Depending on your platform this can make quite a difference. On OS X, version 2.6.3 would take more than an hour (or crash) trying to build an index of 5 million files, while the version 3.0.2 I compiled started copying right away.

robcast
  • One thing to note there is that if you use some options (like `--delete-before` for instance) the old "build list first" behaviour is used as it is required for these options to work correctly - so if you don't see this behaviour check if the other options you are using are known to stop it being possible. This can be useful if you are using rsync interactively on a large tree and want to force the initial scan so the output of `--progress` is accurate (i.e. the "objects to compare" count will never rise as no new objects will be found after the initial scan). – David Spillett Aug 03 '12 at 14:33
21

Using --link-dest to create space-efficient snapshot-based backups: you appear to have multiple complete copies of the backed-up data (one per backup run), but files that don't change between runs are hard-linked instead of copied, saving space.

(Actually, I still use the rsync-followed-by-cp -al method, which achieves the same thing; see http://www.mikerubel.org/computers/rsync_snapshots/ for an oldish-but-still-very-good rundown of both techniques and related issues.)

The one major disadvantage of this technique is that if a file is corrupted due to disk error it is just as corrupt in all snapshots that link to that file, but I have offline backups too which would protect against this to a decent extent. The other thing to look out for is that your filesystem has enough inodes or you'll run out of them before you actually run out of disk space (though I've never had a problem with the ext2/3 defaults).

Also, never forget the very very useful --dry-run for a little healthy paranoia, especially when you are using the --delete* options.

David Spillett
  • Note that -n is the shortcut for --dry-run – ctennis Aug 23 '09 at 15:01
  • I prefer to stick with the long names, especially in scripts that others may end up maintaining. It makes it clearer what is intended without reference to the docs. – David Spillett Aug 24 '09 at 12:01
  • +1 I implemented a backup solution of many TB over many machines with the --link-dest method for hard-linked snapshots as described above - it worked perfectly. – matja Mar 06 '10 at 11:14
  • If you like --link-dest backups, check out [Dirvish](http://www.dirvish.org/) which uses rsync under the hood – hfs Aug 02 '12 at 08:13
  • Oh yikes, I'd never even considered that. After one year of daily link-dest, my backup drive is at 26% used inodes (ext4 with default options). – Izkata Oct 13 '17 at 04:25
16

If you need to update a website with some huge files over a slowish link, you can transfer the small files this way:

rsync -a --max-size=100K /var/www/ there:/var/www/

then do this for the big files:

rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/

rsync has lots of options that are handy for websites. Unfortunately, it does not have a built-in way of detecting simultaneous updates, so you have to add logic to cron scripts to avoid overlapping writes of huge files.

Bob
12
--time-limit

When this option is used rsync will stop after T minutes and exit. It's useful when rsyncing a large amount of data during the night (non-busy hours) and then stopping when people start using the network during the day (busy hours). (Note: --time-limit comes from rsync's optional patch set rather than a stock build; rsync 3.2.3 and later provide the equivalent --stop-after option natively, alongside --stop-at.)

--stop-at=y-m-dTh:m

This option allows you to specify the exact time at which rsync should stop.

Batch Mode

Batch mode can be used to apply the same set of updates to many identical systems.

jftuga
12

I use the --existing option when trying to keep a small subset of files from one directory synced to another location.

TCampbell
9

--rsh is mine.

I've used it to change the SSH cipher to something faster (--rsh="ssh -c arcfour"), and also to set up a chain of SSH hops (best combined with ssh-agent) to sync files between hosts that cannot talk to each other directly: rsync -av --rsh="ssh -TA userA@hostA ssh -TA -l userB" /tmp/foobar/ hostB:/tmp/foobar/

8

If you are wondering how far along a slow-running rsync has gotten, and didn't use -v to list files as they are transferred, you can find out which files it has open:

 ls -l /proc/$(pidof rsync)/fd/*

on a system which has /proc.

E.g. rsync hung for me just now, even though the remote system seemed to have plenty of space left. This trick helped me find an unexpectedly huge file I didn't remember, which wouldn't fit on the other end.

It also told me something else interesting - the other end had apparently given up, since there was also a broken socket link:

/proc/22954/fd/4: broken symbolic link to `socket:[2387837]'
nealmcb
6

--archive is a standard choice (though not the default) for backup-like jobs; it makes sure most metadata from the source files (permissions, ownership, etc.) is copied across.

However, if you don't want to use that, you'll often still want to include --times, which copies across the modification times of files. This makes the next rsync run (assuming you're doing this repeatedly) much faster, as rsync compares modification times and skips a file if it's unchanged. Surprisingly (to me at least), this option is not the default.

Andrew Ferrier
5

Mine is --inplace. Works wonders when the server for backups is running ZFS or btrfs and you make native snapshots.

Hubert Kario
4

The one I use the most is definitely --exclude-from which lets you specify a file containing things to be excluded.

I also find --chmod very useful because it lets you make sure that permissions end up in a desirable state even if your source is messed up.

innaM
4

Of course, there's also --delete which removes stuff from the target that cannot be found in the source.

innaM
4

--backup-dir=$(date +%Y.%m.%d) --delete - we are deleting, but keeping a copy... just in case.

2

cwRsync - Rsync for Windows: http://www.itefix.no/i2/node/10650

This version includes OpenSSH so you can transfer files over a secure channel.

jftuga
2
--partial

Keeps partially transferred files if the connection drops, so the transfer can resume instead of starting over.

--bwlimit=100

Limits bandwidth (in KBytes/second) - good for copying down large files and directories.

enkdr
1

Read the manual

Read the list of short options https://linux.die.net/man/1/rsync to get a feeling for what is possible. It is really impressive.

Experiment with basic use cases

Get familiar with some basic uses. Use --dry-run (-n) to get feedback on what rsync is about to do.

rsync -avn . /target/di

Archives file attributes (-a), gives verbose output (-v), and does a dry run (-n). The -a option is the short form of --archive and translates to -rlptgoD.

  • -r - recursive copy
  • -l - copy symlinks as symlinks
  • -p - set permissions to be the same as the source
  • -t - set mtime to be the same as the source. Use this to support fast incremental updates based on mtime.
  • -g - set group to be the same as the source
  • -o - set owner to be the same as the source
  • -D - if remote user is superuser this recreates devices and other special files

Selection of some cool options

Move

--remove-source-files This removes successfully transferred files from the source.

Update

--update This forces rsync to skip any files which exist on the destination and have a modified time that is newer than the source file.

Delete

--delete Delete files that do not exist in the source tree.

Backup

--backup Make a backup of modified or removed files.

--backup-dir=$(date +%Y.%m.%d) Specify a backup dir.

What to copy?

--min-size=1 Do not copy empty files.

--max-size=100K Copy only small files. Can be used to handle small and large files differently.

--existing Only update files that already exist on the target. Do not create new files on the target.

--ignore-existing Only copy files that do not exist on target.

--exclude-from Define excludes in a file.

Scheduling, Bandwidth and Performance

--time-limit Ends rsync after a certain time limit.

--stop-at=y-m-dTh:m Ends rsync at a specific time.

--partial Allows partial copies in case of interruptions.

--bwlimit=100 Limits bandwidth, specified in KBytes/second. A good option when transferring large files.

Output

  • -h output numbers in a human-readable format.
  • --progress display progress.
  • -i log change info.
  • --log-file= define a log file.
  • --quiet no output.
jschnasse
1

Don't repeat yourself:

 --ignore-existing       skip updating files that already exist on receiver
miguelfg
1

If you have rsync set up as a daemon on the server, you can browse the shared modules like any other directory listing, and see which paths are available.

sybreon
1

When using GlusterFS, we hit a bottleneck with the zero-size "T" (sticky-bit linkto) files: when resyncing a crashed brick or replica, we had to use --min-size=1 so those empty files from the crashed server would not be synced.