20

I'm trying to transfer thousands of small files from one server to another using the following command:

rsync -zr --delete /home/user/ user@10.1.1.1::backup

Currently the transfer takes a long time (I haven't timed it). Is there a way to make this faster? Should I be using another tool? Should I be using rsync over ssh rather than the rsync protocol?

Wesley
Noodles

5 Answers

15

You need to determine the bottleneck. It isn't rsync, and it probably isn't your network bandwidth. As @Zoredache suggested, it is most likely the huge number of IOPS generated by all the stat() calls; any syncing tool is going to need to stat the files. Run iostat while the sync is in progress to verify.
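A minimal way to do that check (device names and the 5-second interval are just examples; this is a diagnostic fragment, not part of the original answer):

```shell
# Run on each host while the sync is active.
# "%util" near 100% combined with low rkB/s / wkB/s suggests the disks are
# seek-bound by the stat()-heavy workload rather than limited by bandwidth.
iostat -x 5
```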

So the question becomes: how do I optimize stat()? Two easy answers:

  1. get a faster disk subsystem (on both hosts if need be), and
  2. tune your filesystem (e.g. for ext3, mount with noatime and enable dir_index).
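Those filesystem tweaks might look like the following on ext3/ext4 (the device `/dev/sda1` and mount point are placeholders, not taken from the question; this is an admin-config sketch, so adapt it to your system):

```shell
# /etc/fstab entry: skip access-time writes on every read
# /dev/sda1  /home  ext4  defaults,noatime  0  2

# Check whether dir_index (hashed b-tree directories) is already enabled:
tune2fs -l /dev/sda1 | grep dir_index

# Enable it, then rebuild indexes for existing directories
# (run e2fsck with the filesystem unmounted):
tune2fs -O dir_index /dev/sda1
e2fsck -fD /dev/sda1
```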

If by some chance disk IOPS aren't the limit, then you could experiment with splitting the directory tree into multiple distinct trees and running multiple rsyncs in parallel.

wittich
Mark Wagner
    Thanks, I'll look into dir_index and see how I get on (we already use noatime). It seems like the disk io is the bottleneck, but we're already running 15k SAS drives in RAID 5. The next step would be SSD, but our hosting company doesn't give us that option yet. – Noodles Mar 01 '12 at 22:42
7

Compression is not very useful for small files (say, less than 100 bytes); sometimes the compressed version can even be bigger than the original. Try the rsync command without the -z flag.
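Dropping -z from the command in the question, that would be:

```shell
# Same transfer as before, just without compression:
rsync -r --delete /home/user/ user@10.1.1.1::backup
```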

ssh is good for security, but will not make the transfer faster. In fact, it would make the transfer slower due to the need for encryption/decryption.

rsync may not seem fast the first time it is run because there is a lot of data to transfer. However, if you plan on running this command periodically, subsequent runs may be much faster since rsync is smart about not transferring files that have not changed.

unutbu
  • If you just use the `rsync` client, it will use SSH behind the scenes. You have to go out of your way to disable encryption when using rsync. See: https://stackoverflow.com/a/1821574/64911 – mlissner Jan 02 '18 at 20:24
4

If ext3 or ext4 filesystems are involved, check that both have the dir_index feature enabled! This tripled rsync throughput in my case.

See details in my answer at: https://serverfault.com/a/759421/80414

alfonx
2

What version of rsync are you using? Anything older than 3.0.0 (on either end) doesn't have the incremental file-list feature, which speeds up large transfers.

devicenull
1

Add -v --progress to your rsync command line

rsync works in two steps:

  1. walk all files on both sides to compare their size and modification date
  2. do the actual transfer

If you are rsyncing thousands of small files in nested directories, it may simply be that rsync spends most of its time descending into subdirectories and enumerating files.

If the time is not spent browsing, it may simply be due to the accumulated latency of starting each new file transfer.

Alex F