Why is tar|tar so much faster than cp?

For recursively copying a directory, using tar to pack up a directory and then piping the output to another tar to unpack seems to be much faster than using cp -r (or cp -a).

Why is this? And why can't cp be made faster by doing it the same way under the hood?

Edit: I noticed this difference when trying to copy a huge directory structure containing tens of thousands of files and folders, deeply nested, but totalling only about 50MB. Not sure if that's relevant.
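
For concreteness, the two forms I'm comparing look roughly like this (exact flags aside; the destination directory has to exist already for the tar version):

    # plain recursive copy
    cp -a srcdir dstdir

    # pack with one tar, pipe into another tar that unpacks
    mkdir dstdir
    (cd srcdir && tar cf - .) | (cd dstdir && tar xf -)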

callum

Posted 2014-07-26T18:34:34.880

Reputation: 1,013

That's an interesting question. You can find some answers here: http://stackoverflow.com/questions/316078/ and here: http://unix.stackexchange.com/questions/66647/

– Teresa e Junior – 2014-07-26T19:00:44.510

Answers

cp does open-read-close, open-write-close in a loop over all the files, so reading from one place and writing to another are fully interleaved. tar|tar does the reading and the writing in separate processes, and in addition tar uses multiple threads to read (and write) several files 'at once', effectively allowing the disk controller to fetch, buffer and store many blocks of data at once. All in all, tar lets each component work efficiently, while cp breaks the problem down into disparate, inefficiently small chunks.
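
A rough way to check this on your own tree is simply to time both forms (directory names here are placeholders; run each against a cold cache, or at least alternate the order, so the page cache doesn't skew the comparison):

    mkdir dst_cp dst_tar
    time cp -a src/. dst_cp/
    time sh -c '(cd src && tar cf - .) | (cd dst_tar && tar xf -)'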

Pum Walters

Posted 2014-07-26T18:34:34.880

Reputation: 81

Can we really say that's true of all cp implementations? How do we know that's true? And why would cp be written in such an inefficient way? Any textbook implementation of a file copy reads a buffer of n bytes at a time, and writes them to disk before reading another n bytes. But you're saying cp always reads the whole file before writing the whole copy? – LarsH – 2017-03-13T02:54:52.600
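
For what it's worth, on a Linux box you can inspect the syscall pattern of your particular cp with strace and see for yourself how it interleaves buffered reads and writes on a single file, e.g.:

    strace -e trace=openat,read,write,close cp bigfile bigfile.copy 2>&1 | less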