Gzip huge directory into separate .gz files for ssh transfer

I have a directory of ~200,000 .npy files with a total size of ~100 GB. All files are stored directly below the main directory (i.e. there are no sub-directories). I need to transfer the directory and would like to do it by first compressing it into a smaller number of gzip files that I can then transfer over ssh. I naïvely tried to gzip the whole directory at once, which made my server freeze, requiring a hard reboot.

How can I gzip the directory of files into, say, 1000 .gz files that I can then easily transfer and unzip again?

I'd preferably like to do this in a manner where the maximum resource consumption on the server at any one point (primarily RAM/IO) is independent of the directory's characteristics (total size / number of files). I'm hoping to find a method that I can use with even larger directories without making my server freeze. The solution should preferably use bash or Python. Thanks!

pir

Posted 2016-12-03T07:57:21.553

Reputation: 221

When you tried to gzip the entire directory, what exactly did you do? – Daniel B – 2016-12-03T09:19:25.610

Answers

This appears to be a good match for rsync. It will transparently compress the contents, and it can be told to limit the bandwidth usage, which serves both to avoid clogging the network and to prevent high IO load on the originating server:

rsync -az --bwlimit=1m directory server:/destination/

-a tells rsync to copy file metadata such as modification times, ownership, and permissions, -z means use compression, and --bwlimit limits the bandwidth used over the network.

As an additional bonus, if you interrupt the operation for any reason and re-run it, rsync will automatically pick up where it left off. If you also need to delete extra files at the destination, add the --delete option.
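For instance, a variant of the command above that also mirrors deletions at the destination and prints each file as it is transferred (the source directory and destination path are just placeholders) could look like this:

rsync -azv --bwlimit=1m --delete directory server:/destination/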

user4815162342

Posted 2016-12-03T07:57:21.553

Reputation: 293

This is a good suggestion, but what if you don't have rsync installed on the destination server? – Alessandro Dotti Contra – 2016-12-03T13:25:37.817

@adc rsync is normally installed on Linux servers. If you somehow stumble on one that doesn't have it, I would suggest something like tar czf - directory | ssh remote 'cd destination && tar xf -'. If that runs too fast and causes high IO load on the origin server, add throttle -m 1 between the first tar and ssh. (You'll need to install the throttle utility, but only on the client.) – user4815162342 – 2016-12-03T13:32:33.127
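Spelled out, that throttled fallback would look roughly like the pipeline below; the remote host name and destination directory are placeholders, and tar's z flag is made explicit on the receiving end for clarity:

tar czf - directory | throttle -m 1 | ssh remote 'cd destination && tar xzf -'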

I agree rsync is part of nearly all default Linux server installations, but you never know for sure beforehand, as some system administrators like to remove everything not strictly needed. But this is just for the sake of discussion; we're drifting away from the original question. – Alessandro Dotti Contra – 2016-12-03T15:13:33.533

@adc True enough. Without rsync at my disposal, I'd go with the tar-based solution. If you want, I can post that as a separate answer. – user4815162342 – 2016-12-03T15:46:52.070

You can edit and expand your answer if you like; I second both your solutions. – Alessandro Dotti Contra – 2016-12-03T15:51:08.177

Looks good! Makes sense to use this approach instead of gzipping. However, I've tried running this and so far it's just stalled at the console. Do you know what a reasonable time would be for it to initialize and start the synchronization? – pir – 2016-12-04T07:09:22.300

@pir 200k is quite a lot of files; if unsure, add -v to see what rsync is doing. – user4815162342 – 2016-12-04T08:17:34.413