
I'm running a simple rsync command between two servers. Both servers have two bonded Ethernet interfaces. When I send a big file from one server to the other with rsync, I reach a 130M/s transfer rate.

But, and here is the problem: when I send a directory with lots of small files, the transfer rate is 1M/s at best.

I've checked the CPU load on both servers (8-core i7), and it is at 10% maximum.

Knowing that what slows the whole transfer down is opening and closing the files, and that this 'theoretically' happens on the CPU, I assumed this could easily be tuned. But I don't know how.

Any tip on how to make rsync use all CPUs?

Marc Riera

4 Answers


Your problem has almost nothing to do with the CPU.

Transferring big files is usually fast, since it can be done with sequential I/O.

Transferring lots of small files requires tons of horsepower on the storage side of things, since it requires random I/O. Low seek times, fast hard drives, lots of cache, and a filesystem designed for a huge number of files are a must. The CPU does not help much there, just as you are observing: the CPUs and the OS are simply waiting for disk I/O to finish.

All that a faster CPU or more cores can do is end up waiting for I/O faster. :-)
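The sequential-vs-random distinction above can be felt even locally. A rough sketch (assuming a POSIX shell with GNU coreutils; all paths are temporary placeholders) that writes the same total amount of data once as one file and once as many small files, where per-file open/create metadata work dominates:

```shell
set -e
work=$(mktemp -d)

# One 4 MB file, written as a single sequential stream.
t0=$(date +%s%N)
head -c $((4 * 1024 * 1024)) /dev/zero > "$work/big.bin"
t1=$(date +%s%N)

# 1000 files of 4 KB each: the same total size, but 1000x the
# open/create/close metadata operations.
mkdir "$work/small"
t2=$(date +%s%N)
for i in $(seq 1 1000); do
    head -c 4096 /dev/zero > "$work/small/f$i"
done
t3=$(date +%s%N)

echo "one big file:     $(( (t1 - t0) / 1000000 )) ms"
echo "1000 small files: $(( (t3 - t2) / 1000000 )) ms"
rm -rf "$work"
```

The absolute numbers depend heavily on the filesystem and cache state, but the many-small-files case is consistently the slower one for the same byte count.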

Janne Pikkarainen
  • +1. All files need to be transferred and created on the file system, etc. That is a lot of IO. Unless you have a huge IOPS budget (many fast discs), that will just overload the discs. – TomTom Oct 29 '10 at 11:16
  • Well put. +1 to you – gWaldo Oct 29 '10 at 12:32
  • Good explanation. I was just hoping to be wrong. :( This means I will have to go for a different kind of filesystem, and I have no time for it; I'll have to take it from my sleeping time. – Marc Riera Oct 29 '10 at 15:02

The latency of the many small random I/O operations adds up:

  • access and seek times of file system and hard disks
  • comparison times of rsync

In my experience, rsync is a very good tool for keeping things in sync, but not a very good tool for transferring all the data as fast as possible. Use it when bandwidth or storage capacity don't leave other options. If you can afford to tar all the files up and transfer them as one blob, you can expect better performance (in overall wall-clock time to complete the operation), if there are enough files.
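A minimal sketch of the "one blob" approach. Over the network you would pipe the stream through ssh instead, e.g. `tar -C /src -cf - . | ssh user@remotehost tar -C /dst -xf -` (the host and paths are placeholders); here both ends are local directories so the pipeline can be run anywhere:

```shell
set -e
src=$(mktemp -d)
dst=$(mktemp -d)

# Simulate a directory full of small files.
for i in $(seq 1 50); do
    echo "payload $i" > "$src/file$i.txt"
done

# One sequential stream instead of 50 per-file open/transfer/close cycles.
tar -C "$src" -cf - . | tar -C "$dst" -xf -

echo "transferred: $(ls "$dst" | wc -l) files"
```

The trade-off is that you lose rsync's incremental behavior: the whole tree is sent every time, which only pays off when most of the data has to move anyway.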

knitti

There is a lot of network/disk overhead when dealing with lots of small files using rsync. With small enough files, your speedup factor may be less than 1.

Pay attention to the speedup factor reported with -v. If your speedup factor is below 1 even when you know you're already in sync, you are experiencing quite a lot of overhead; the CPU is not the bottleneck.

Cakemox

What Janne said: you're I/O bound, not CPU bound. Launch top (or better, atop or htop) and notice how little CPU is actually used when transferring small files. Also note that your processes are in the 'D' state, waiting for data to become available to them.
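A quick way to spot that state without a full-screen tool (assuming a Linux box with procps `ps`; 'D' is uninterruptible sleep, which usually means waiting on disk I/O). An I/O-bound transfer will show its rsync processes here:

```shell
# List the names of processes currently in uninterruptible sleep.
dstate=$(ps -eo state=,comm= | awk '$1 == "D" {print $2}')
count=$(printf '%s\n' "$dstate" | grep -c . || true)
echo "processes in D state: $count"
```

On an idle machine the count is typically zero; run it during the small-file transfer to see rsync and the flusher threads appear.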

Additionally, I don't believe rsync is optimized for multi-core; most of what it does is sequential, and it would require very clever work to make it go faster in that respect.

It does, however, probably take advantage of up to two cores if you use ssh as the transport: ssh is spawned as a separate process and does all its encryption (and possibly compression) work outside the main rsync process. The rsync process itself has the somewhat CPU-intensive tasks: CRC calculation and MD5 hashing (I believe that's what it uses).

niXar