7

I encountered a situation where an app server misconfiguration led to the creation of around 5 TB of data in which each directory contains a huge number of small files. We are in the process of transferring the files and changing the application, but rsync fails to transfer the data. It fails even locally, between local drives. I managed to copy only 3.5 GB overnight! I tried changing the rsync switches and still had no luck. Here is what is currently running on the server, with no progress indication:

    rsync -avhWc --no-compress --progress source destination

Some suggested gigasync, but both its GitHub repository and its site are unavailable. Can anybody suggest a method to transfer the files? Appreciate any help.

h.safe
  • 131
  • 1
  • 7
  • 2
    Why is there no exact error message in your question? – Michal Sokolowski May 29 '18 at 07:30
  • 5
    Rather than copying the individual files, which is always slow and incurs a huge amount of overhead with large numbers of very small files, you could copy the whole block device, something along the lines of `dd`, `netcat`, possibly some compression and/or ssh: https://serverfault.com/q/51567/37681 – HBruijn May 29 '18 at 09:19
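
A minimal sketch of the block-device approach that comment describes, assuming the source volume is /dev/sdb1, the target listens on port 9000, and the target device is /dev/sdX (the devices, host name, and port are illustrative, not from the question):

    # On the receiving host: listen, decompress, and write the stream to the target device
    # (some netcat variants need "nc -l -p 9000" instead)
    nc -l 9000 | gunzip | dd of=/dev/sdX bs=1M status=progress

    # On the sending host: read the whole (unmounted) source device, compress, and stream it
    dd if=/dev/sdb1 bs=1M status=progress | gzip -1 | nc target.host 9000

This copies every block, used or not, so it only pays off when the filesystem is reasonably full and both sides can be taken offline for the copy.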

4 Answers

4

Try xargs+rsync:

 find . -type f -print0 | xargs -J % -0 rsync -aP % user@host:some/dir/

You can control how many files are passed as sources to each call of rsync with the -n option of xargs. For example, to copy 200 files per rsync invocation:

 find . -type f -print0 | xargs -n 200 -J % -0 rsync -aP % user@host:some/dir/

If it's too slow, you can run multiple copies of rsync in parallel with the -P option of xargs:

find . -type f -print0 | xargs -P 8 -n 200 -J % -0 rsync -aP % user@host:some/dir/

This will start 8 copies of rsync in parallel.
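
Note that -J is a BSD xargs flag. If the server runs Linux with GNU xargs (an assumption, since the question doesn't say), a roughly equivalent sketch wraps rsync in a small sh -c so the batched file names can be placed before the destination argument (user@host:some/dir/ is the same hypothetical destination as above):

    # 8 parallel rsyncs, 200 files per invocation; "$@" expands to the batch supplied by xargs
    find . -type f -print0 | xargs -0 -P 8 -n 200 sh -c 'rsync -aP "$@" user@host:some/dir/' _

Adding -R (--relative) to rsync preserves the ./path/to/file structure at the destination instead of flattening everything into one directory.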

Luca Gibelli
  • 2,611
  • 1
  • 21
  • 29
  • Eventually that is where I landed, basically piping the find into rsync, a bit different from yours; however, the issue is the time it takes to transfer... it is way too slow, and the server side shows 0 load... Here is what I used for the pipe: #find /local/data/path/ -mindepth 1 -ctime -0 -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/. – h.safe May 29 '18 at 12:11
  • added an example of how to parallelize rsync to make the copy faster – Luca Gibelli May 29 '18 at 12:30
  • uh... what am I missing? `-P same as --partial --progress`. What does that have to do with parallelizing? – Michael Aug 02 '20 at 04:28
  • the parallelization is done by xargs -P 8, the -P in rsync is useful but not required – Luca Gibelli Aug 02 '20 at 19:31
2

If this is a trusted/secure network, and you can open a port on the target host, a good way to reproduce a tree on another machine is the combination of tar and netcat. I'm not at a terminal so I can't write a full demonstration, but this page does a pretty good job:

http://toast.djw.org.uk/tarpipe.html

Definitely use compression. In the best case you can transfer the data at whatever rate the slowest of the three potential bottlenecks (read on the source, the network, write on the target) permits.
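
Since the answer links out rather than spelling out the pipe, here is a minimal sketch of the tar-over-netcat idea, assuming a hypothetical listening port 7000, source directory /local/data, destination directory /remote/data, and gzip for the compression (all illustrative, not taken from the linked page):

    # On the target host: listen, decompress, and unpack into the destination tree
    # (some netcat variants need "nc -l -p 7000" instead)
    nc -l 7000 | tar xzf - -C /remote/data

    # On the source host: pack the tree, compress, and stream it to the target
    tar czf - -C /local/data . | nc target.host 7000

Because tar streams the files as one continuous archive, it avoids much of the per-file overhead that makes rsync slow on millions of tiny files.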

Jonah Benton
  • 1,242
  • 7
  • 13
0

You haven't specified the server OS; have you considered robocopy? It's Windows-based, though. It supports multithreading, retries, and bandwidth limitation, and it is UNC-to-UNC capable. RoboCopy docs

A quick Google of rsync shows it exists for both Unix and Windows... maybe you are using Windows.
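
If the source does turn out to be Windows (an assumption), a hedged sketch of a multithreaded robocopy run between hypothetical UNC paths might look like this:

    rem Copy the whole tree with 32 threads, 2 retries, 5 s between retries, no per-file progress, and a log
    rem \\source\share, \\target\share and C:\robocopy.log are placeholders
    robocopy \\source\share \\target\share /E /MT:32 /R:2 /W:5 /NP /LOG:C:\robocopy.log

/MT is what helps most with many small files, since it keeps multiple copy threads busy instead of waiting on one file at a time.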

Alocyte
  • 121
  • 5
0

If you have ZFS, you can use ZFS-level replication to send the filesystem to a new destination.

If that is not an option, consider UDR+rsync, detailed here: Transfer large amount of small files
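
A minimal sketch of what ZFS-level replication might look like, assuming a hypothetical dataset tank/data and a remote host backup.host with a pool of the same name (none of these names come from the question):

    # Take a point-in-time snapshot of the dataset
    zfs snapshot tank/data@xfer1

    # Stream the snapshot to the remote pool over SSH
    zfs send tank/data@xfer1 | ssh backup.host zfs receive -F tank/data

    # Later, send only what changed since the first snapshot (incremental)
    zfs snapshot tank/data@xfer2
    zfs send -i tank/data@xfer1 tank/data@xfer2 | ssh backup.host zfs receive tank/data

Because the stream is generated at the block level, it never walks the directory tree file by file, which is exactly the cost that hurts rsync here.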

ewwhite
  • 194,921
  • 91
  • 434
  • 799
  • Are you talking about snapshotting to remote storage via SSH? If so, that is terribly slow... and only a means of providing disaster recovery – h.safe May 30 '18 at 14:41