Optimal Linux copy command for large number of files

3

This is a topic I have not been able to find a definitive answer on, or at least one with a good explanation of why one solution is better than another. Let's say I have two local drives, one with files to be copied and one empty. Progress feedback is not necessary, but optimal performance is, with a few caveats.

  1. The file structure from one point down must be consistent. For example, the files may be stored in the directory x where x is located at /my_drive_a/to_copy/files/x/ – however when I copy it to /my_drive_b/, I would like it to be structured only from /files/ down. So the result may look somewhat like /my_drive_b/files/x/.
  2. The files being transferred will not be the same each time, so a tool like rsync may not have an advantage over a tool like cp.
  3. The file count will be in the thousands, although all of them are small.
  4. The data must be copied, and retained on my_drive_a.

My initial thought would be simply doing cp -R /my_drive_a/to_copy/files/x/ /my_drive_b/files/x/. Again, having limited experience with copy commands in Linux, I am not sure whether this is an optimal way to copy such a large number of files.
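
Roughly, what I have in mind is the sketch below (assuming /my_drive_b/files/ does not exist yet; as far as I know cp will not create that intermediate directory on its own, so it is created first):

    # create the destination parent so the /files/-down structure is preserved
    mkdir -p /my_drive_b/files
    # recursively copy x into it, giving /my_drive_b/files/x/...
    cp -R /my_drive_a/to_copy/files/x /my_drive_b/files/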

sudosnake

Posted 2017-02-15T14:07:38.713

Reputation: 31

I would just go with rsync – Arkadiusz Drabczyk – 2017-02-15T14:10:32.677

@ArkadiuszDrabczyk Thanks for the feedback, why would you choose rsync? – sudosnake – 2017-02-15T14:11:30.687

1. I have bad experience with scp for copying a lot of data - I tried it once and it crashed. 2. If the connection is interrupted, rsync will not copy everything from the beginning, but only the files that haven't been copied yet. 3. rsync works both locally and over ssh, so you can use a single tool with the same options. – Arkadiusz Drabczyk – 2017-02-15T14:38:32.457
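
For reference, a local rsync run along the lines suggested here could look like the sketch below (paths taken from the question; note that rsync treats a trailing slash on the source specially):

    # no trailing slash on the source: rsync creates /my_drive_b/files/x/... itself
    rsync -a /my_drive_a/to_copy/files /my_drive_b/
    # with a trailing slash (files/) it would copy the contents of files/ directly into /my_drive_b/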

    "I am not sure if this is an optimal solution to copying such a large number of files." I think the "optimal" results, for max speed, depends on some factors. For instance, Reiserfs was known to support lots of little files quite well. So you may get different results depending on what file system (or OS) you use. Your best bet could be: stop trying to transfer lots of little files, but to place them into 1 archive file, probably tar is most widely compatible and supportive of Unix meta-data, and then transfer one file. Using Unix piping may be slick, though bothersome if problems occur.) – TOOGAM – 2017-02-15T14:53:00.260

Answers

1

Just go with cp. The coreutils are well optimized and will perform excellently. Apart from the --archive flag, consider also using --sparse=never if you expect there are no sparse files. This dumbs cp down a little and saves time.
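
With the paths from the question, that suggestion might translate to something like this sketch (-a is the short form of --archive):

    # copy the whole files/ tree to /my_drive_b/files/, preserving ownership,
    # permissions and timestamps, without probing for sparse files
    cp -a --sparse=never /my_drive_a/to_copy/files /my_drive_b/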

Why not rsync? It will try to analyze the files and sort them (see "SORTED TRANSFER ORDER" in man rsync), and it is very hard for it to print useful progress information without seriously hindering the whole process. While some of its options can be turned off, some of this work is obligatory and will result in a slower run.

Depending on the size of your data, it may be faster to copy the whole disk (e.g. /dev/sda) with a program like dd or ddrescue, but it's hard to tell when this option will be faster.
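
Very roughly, a whole-device copy might look like the sketch below (the device names are placeholders, both filesystems should be unmounted, the destination must be at least as large as the source, and note that this mirrors the source's entire layout rather than only the /files/-down structure asked for):

    # raw block-for-block copy of one device onto another (GNU dd)
    dd if=/dev/sdX of=/dev/sdY bs=4M status=progress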

styrofoam fly

Posted 2017-02-15T14:07:38.713

Reputation: 1 746