72

I currently have two CentOS servers. What is the quickest way to tar up the images directory and scp it over?

Is the approach I just suggested actually the quickest? Tarring is taking forever... I ran the command:

tar cvf imagesbackup.tar images

And I was going to just scp it over.

Let me know if there is a quicker way. I have remote/SSH access to both machines.

user1623521
Andrew Fashion

8 Answers

106

Instead of using tar to write to your local disk, you can write directly to the remote server over the network using ssh.

server1$ tar -zc ./path | ssh server2 "cat > ~/file.tar.gz"

Any string that follows your "ssh" command will be run on the remote server instead of the interactive logon. You can pipe input/output to and from those remote commands through SSH as if they were local. Putting the command in quotes avoids any confusion, especially when using redirection.

Or, you can extract the tar file on the other server directly:

server1$ tar -zc ./path | ssh server2 "tar -zx -C /destination"

Note the seldom-used -C option. It means "change to this directory first before doing anything."
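You can see the effect of -C in a purely local run, with no ssh in the middle (the directory names below are just for illustration):

```shell
# Build a throwaway source tree and an empty destination.
mkdir -p /tmp/demo-src/images /tmp/demo-dst
echo "pixel data" > /tmp/demo-src/images/photo1.jpg

# Archive relative to /tmp/demo-src, extract relative to /tmp/demo-dst.
# Each tar changes into its -C directory before doing anything, so the
# archive contains paths like "images/photo1.jpg", not absolute paths.
tar -cf - -C /tmp/demo-src images | tar -xf - -C /tmp/demo-dst
```

After the pipe finishes, the tree reappears under /tmp/demo-dst/images, which is exactly what happens on the remote side of the ssh version.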

Or, perhaps you want to "pull" from the destination server:

server2$ tar -zx -C /destination < <(ssh server1 "tar -zc -C /srcdir ./path")

Note that the <(cmd) construct is a bash feature and doesn't work on older or non-bash shells. It runs a program, sends its output to a pipe, and substitutes that pipe into the command as if it were a file.
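Here is a quick local illustration of the same construct, again with hypothetical throwaway paths (run explicitly under bash, since plain sh doesn't support it):

```shell
# Process substitution is a bash feature, so invoke bash explicitly.
mkdir -p /tmp/ps-src /tmp/ps-dst
echo "hello" > /tmp/ps-src/note.txt

# The extracting tar reads from a "file" (something like /dev/fd/63)
# that is really the output pipe of the archiving tar.
bash -c 'tar -x -C /tmp/ps-dst -f <(tar -c -C /tmp/ps-src note.txt)'
```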

I could just as easily have written the above as follows:

server2$ tar -zx -C /destination -f <(ssh server1 "tar -zc -C /srcdir ./path")

Or as follows:

server2$ ssh server1 "tar -zc -C /srcdir ./path" | tar -zx -C /destination

Or, you can save yourself some grief and just use rsync:

server1$ rsync -az ./path server2:/destination/

Finally, remember that compressing the data before transfer will reduce the amount of data sent over the network, but on a very fast connection it may actually make the operation take more time. This is because your computer may not be able to compress fast enough to keep up: if compressing 100MB takes longer than it would take to send 100MB uncompressed, then it's faster to send it uncompressed.

Alternately, you may want to consider piping to gzip yourself (rather than using the -z option) so that you can specify a compression level. It's been my experience that on fast network connections with compressible data, using gzip at level 2 or 3 (the default is 6) gives the best overall throughput in most cases. Like so:

server1$ tar -c ./path | gzip -2 | ssh server2 "cat > ~/file.tar.gz"
tylerl
  • Rsync worked beautifully - compresses on the fly, copies whole folders, resumes on broken link. All in one simple command. Love it. These are the options I found useful: z: compress r: recurse = copy subfolder v: verbose. My Rsync command example : rsync -azvr /src-path/ username@dest_server:/dest/path/ – Bastion Aug 09 '17 at 00:32
  • rsync is not necessarily the right tool in this *particular* use case. It is inefficient for copying many small files (e.g. 55GB of images) in one go, although its ability to skip downloading already-transferred files can obviously override that disadvantage depending on your use case. – Chris L. Barnes Mar 27 '20 at 16:59
70

I'd be tempted to rsync it over myself - it does compression and handles link loss well.

Chopper3
12

If you just tar them up and nothing else, you will waste tons of time for only a minimal speed gain.

Simply tarring up the files with the cvf switches will effectively cost the time it takes to read all 55GB of images and write them back to disk (in practice even more, since there is considerable overhead).

There is only one advantage to gain here: the per-file overhead of uploading many small files is reduced. You might get faster transfer times if you compress the images, but since I believe they are already in a compressed format, that won't help much; it's just more wasted computing time.

The biggest disadvantage of transferring a huge tar archive over the wire is that if something goes wrong, you may have to start over.

I would use that way:

md5sum /images/* > md5sum.txt
scp -r /images/* user@host:/images/

On the new server

md5sum /images/* > md5sum_new.txt

Then just diff the two checksum files. And since scp supports compression on the fly (the -C flag), there is no need for separate archives.
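The verification step looks like this in a purely local sketch (paths here are hypothetical; note the cd so the file names in both checksum lists match):

```shell
# Make an "original" directory and a "transferred" copy of it.
mkdir -p /tmp/orig /tmp/copy
echo "data" > /tmp/orig/a.img
cp /tmp/orig/a.img /tmp/copy/a.img

# Checksum each side relative to its own directory, so the
# file names in both lists are comparable.
( cd /tmp/orig && md5sum * > /tmp/md5sum.txt )
( cd /tmp/copy && md5sum * > /tmp/md5sum_new.txt )

# diff prints nothing and exits 0 when every checksum matches.
diff /tmp/md5sum.txt /tmp/md5sum_new.txt
```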

Edit

I'll keep the MD5 information, since it was useful to the OP. But one of the comments gave me new insight, and a bit of searching turned up this useful piece of information. (Note that the subject here is SFTP, not SCP directly.)

In contrast to FTP, SFTP does add overhead to the transfer of files. As a file is transferred between client and server, it is broken up into smaller chunks called "packets." For example, suppose each packet is 32KB. The SFTP protocol does a checksum on each 32KB packet as it is sent, and includes that checksum along with that packet. The receiver gets that packet, decrypts the data, and then verifies the checksum. The checksum itself is "stronger" than the CRC32 checksum. (Because SFTP uses a 128-bit or higher checksum, such as MD5 or SHA, and because this is done on each and every packet, very granular integrity checking is accomplished as part of the transfer.) Thus, the protocol itself is slower (because of the additional overhead), but the successful completion of a transfer means, de facto, that it has been transferred intact and there is no need for an additional check.

pacey
  • Thank you very much, what is the md5sum doing? and what is diff? Thank you, performing now! – Andrew Fashion Dec 02 '10 at 12:56
    md5sum (or md5) takes a checksum of the files. Diff looks for differences in the files (man diff). The checksum creates a string, a hash, that if the file is changed in transit...a bit flipped, an error...won't match when you take it again on the other side. For large files you have an increased chance of errors. That's why when you see sites that let you download .iso files they often have an MD5 checksum for you to compare your downloaded file to to makes sure it matches and isn't corrupt. – Bart Silverstrim Dec 02 '10 at 13:01
    scp is encrypted and guarantees integrity over the line. There is still a slight chance that the data was corrupt in memory or on disk of course, but that's pretty rare. – Ryan Bair Dec 02 '10 at 16:40
    Does the overhead of SFTP checksums actually matter in any practical sense? I can't imagine so. 4 bytes for every 32768 doesn't sound significant. That's 128 kB per GB. Calling that "slower" seems like an overstatement in anything except a boring theoretical sense. – underscore_d Oct 27 '15 at 19:16
8

On top of Pacey's md5sum suggestion, I'd use the following:

On the destination: nc -w5 -l -p 4567 | tar -xvf -

Then on the source: tar -cvf - /path/to/source/ | nc -w5 destinationserver 4567

It's still a tar/untar, and there's no encryption, but it goes directly to the other server. Start them both in tandem (-w5 gives you 5 seconds' grace) and watch it go. If bandwidth is tight, add -z to the tar on both ends.

SmallClanger
    I think it's the other way around first he has to execute on destination (to open the socket) and then on source (to dispatch) – Dimitrios Mistriotis Dec 02 '10 at 15:45
  • in place of destination server, do i just put root@1.1.1.1 ? – Andrew Fashion Dec 02 '10 at 16:12
  • Nope, just the IP. netcat isn't using a protocol other than TCP :) This command will also be the fastest of all the commands given above. There's exactly one read per file on the source, the exact minimum network traffic to transfer the files, and exactly one write per file on the destination. If you have spare CPU cycles, adding the -z flag (for compression) will speed it up further, as less network data has to be transferred. – Jeff McJunkin Dec 02 '10 at 17:10
  • @user36845 - True. I wasn't implying a chronology with the ordering above, but you're right, the socket will need to be opened first. I'll edit it to clarify. :) – SmallClanger Dec 02 '10 at 18:07
  • I'm unsure of why ssh/scp were capping out at 125MB/s to 133MB/s, but netcat can pipe that data at ~ 380MB/s easily (same link) – ThorSummoner May 17 '18 at 07:32
2

One point - not all hosts have rsync, and hosts may well have different versions of tar. For this reason, one could recommend as a first port of call the oft-neglected cpio.

You can cpio over ssh to do ad-hoc replication of file/directory structures between hosts. This way you have finer control over what gets sent, since you need to "feed" cpio, nom-nom. It's also more argument-portable: cpio doesn't change much, which is an important point if you are looking after multiple hosts in a heterogeneous environment.

Example copying /export/home and subdirs to remote host:

cd /export/home && find . -print | cpio -oaV | ssh 10.10.10.10 'cd /export/home && cpio -imVd'

The above would copy the contents of /export/home and any subdirs to /export/home on the remote host.

Hope this helps.

cachonfinga
  • He did mention it was two CentOS boxes, so they'd have rsync and file compatible versions of tar. Tools like rsync were created to replace tools like cpio :). You can't "resume" with cpio, at least without knowing where exactly you want to start from and filter your find as appropriate. Which is an unnecessary time overhead. Having said that, useful information for 'old' UNIX boxes :) – Rafiq Maniar Dec 02 '10 at 15:02
  • Yes, that cmmand lost me haha – Andrew Fashion Dec 02 '10 at 16:12
1

If you have ssh access, you have rsync access.

rsync -av -e ssh /storage/images/ user@[ip or domain name]:/storage/images/

or

rsync -av -e "ssh -l user" /storage/images/ [ip or domain name]:/storage/images/

If you receive an error like "rsync error: some files could not be transferred (code 23) at main.c(977) [sender=2.6.9]", check your user and groups between the servers; you might have a mismatch.

Use the rsync "-z" option if you want rsync to compress the transfer. This option will use more CPU but less bandwidth, so be aware of that.

There is a "--progress" option which will give you a percent transferred, which is kind of nice if you like that sort of thing.

quinnr
0

Are they on a shared network instead of needing the internet to transfer files? NFS or FTP might be a lot faster than the overhead of SCP, although you would lose the encryption during the transfer.

Tex
0

Or you can always use tar pipes:

(cd /path && tar -cjf - * ) | ssh user@host 'tar -xjf - -C /path'

'j' = bzip2, you can use 'z' for gzip or --lzma if your tar supports it.
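The same pipe works locally if you want to test it before involving ssh; here with -z (gzip) since bzip2 may not be installed everywhere, and with hypothetical throwaway paths:

```shell
# Throwaway source and destination directories.
mkdir -p /tmp/tp-src /tmp/tp-dst
echo "img" > /tmp/tp-src/pic.png

# Same shape as the ssh version, minus the ssh hop:
# archive the source dir to stdout, extract from stdin at the destination.
( cd /tmp/tp-src && tar -czf - . ) | tar -xzf - -C /tmp/tp-dst
```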

OneOfOne