Copying a large number of files from one directory to another in Linux

I have a directory containing around 280,000 files. I want to move them to another directory.

If I use cp or mv then I get an error 'argument list too long'.

If I write a script like

for file in $(ls); do
   cp "$file" {destination}
done

then, because of the ls command, its performance degrades.

How can I do this?

Ritesh Sharma

What is the total size of all files? Maybe first tar these files? – None – 2010-02-10T14:16:02.843

See this question. – Nick Presta – 2010-02-10T14:20:52.670

Answers

Use rsync:

$ rsync -a {source}/ {destination}/

e.g.

$ rsync -a /some/path/to/src/ /other/path/to/dest/

(note the trailing /s)


Note: if it's a lengthy operation and you want to see some indication of progress during copying, you can either add the -v (verbose) option, which lists every file being copied, or use the --progress option for more succinct progress output.
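
For instance, a sketch of the same copy with progress reporting (the paths are the same placeholders as above):

$ rsync -a --progress /some/path/to/src/ /other/path/to/dest/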

Paul R

bash: /usr/bin/rsync: Argument list too long

Sorry Paul!!!!! – None – 2010-02-10T15:13:13.770

@Ritesh -- I'm guessing you specified some files or * as part of {source} - it should just be a directory, e.g. rsync -a /some/path/src/ /other/path/to/ -- note the trailing /s. – None – 2010-02-10T15:38:03.120

Yes Paul. I gave the path of the directory, but it didn't work! – None – 2010-02-11T04:08:28.463

@Ritesh - that doesn't seem possible - can you copy and paste the actual rsync command and resulting error message(s) from your terminal ? – Paul R – 2010-02-12T08:41:10.220

rsync will sometimes report "Argument list too long" when the actual problem is too little free space on the destination drive. – jk7 – 2018-08-17T23:51:05.773
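
If you run into that, a quick way to rule out a full destination before digging further is to check its free space (using the placeholder path from above):

$ df -h /other/path/to/dest/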

I am missing two things in the answers here, so I am adding yet another one.

Though this reminds me of adding yet another standard answer...

There are two problems here:

I have a directory containing around 280,000 files.

Most tools do not scale all that well to this number of files. Not just most Linux or Windows tools, but quite a lot of programs. And that might include your filesystem. The long-term solution would be 'well, do not do that then'. If you have different files, put them in different directories. If not, expect to keep running into problems in the future.

Having said that, let's move to your actual problem:

If I use cp or mv then I get an error 'argument list too long'

This is caused by expansion of * by the shell. The shell has limited space for the resulting argument list and runs out. This means any command with a * expanded by the shell will run into the same problem. You will either need to expand fewer arguments at a time, or use a different command.
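
To see how close you are to that limit on a given system, you can compare the kernel's argument-space limit with the size of the expansion (a rough sketch; printf is a shell builtin, so it does not hit the limit itself):

$ getconf ARG_MAX          # maximum combined size of arguments plus environment, in bytes
$ printf '%s ' * | wc -c   # approximate number of bytes the expanded * would occupy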

One alternate command used often when you run into this problem is find. There are already several answers showing how to use it, so I am not going to repeat all that. I am however going to point out the difference between \; and +, since this can make a huge performance difference and hook nicely into the previous expansion explanation.

find /path/to/search -name "*.txt" -exec command {} \;

This will find all files under /path/to/search/ and run a command on what it finds, but notice the quotes around the *. The quotes pass the * to find itself rather than letting the shell expand it; if we did not quote or escape it, the shell would try to expand it and we would get the same error.

Lastly, I want to mention something about {}. These brackets get replaced by the content found by find. If you end the command with a semicolon ; (one which you need to escape from the shell, hence the \; in the examples) then the results are passed one by one. This means that you will execute 280,000 mv commands, one for each file. This might be slow.

Alternatively, you can end with +. This will pass as many arguments as possible at the same time. If the argument limit allows 2,000 arguments at a time, then find /path -name "*filetype" -exec some_move {} + will call the some_move command about 140 times, each time with 2,000 arguments. That is more efficient (read: faster).
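
Applied to the copy in the question, the two forms might look like this (a sketch; the paths are placeholders, and cp's -t option is used so the destination can be given before the batched file arguments):

# one cp process per file -- roughly 280,000 invocations
find /path/to/source -maxdepth 1 -type f -exec cp {} /path/to/destination/ \;

# as many files per cp invocation as the argument limit allows
find /path/to/source -maxdepth 1 -type f -exec cp -t /path/to/destination/ {} +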

Hennes

Note that the suffix {} \+ is only valid when the curly braces are the last argument to the command. This does not work "by default" for commands like mv and cp, where we would really like to replace the first argument. These two functions come with a -t argument that allows you to specify the destination before the source. Note, the other options: -type f helps to omit the directory target, -maxdepth 1 avoids finding the files after they copy. Thus, find <source> -maxdepth 1 -name "<regex search>" -type f -exec cp -t <target> {} \+. Source: stackoverflow.com/a/5241733/1524650 – John Haberstroh – 2019-12-04T22:46:49.143

You don't need the ls; you can simply use

for file in *; do
    cp $file /your/dest
done

or you can do something like:

echo * | xargs -i cp {} /your/dest

wich

The first solution won't work because of the performance issue, but I should try the second one. I'll try it after some time. Thanks. – None – 2010-02-10T14:24:44.040

The first solution didn't work for me either. This one is the only one that worked. – Whitecat – 2015-06-19T20:45:42.940

The first solution lacks proper quoting, but other than that, it should work, and be better than the second. Proper quoting means double quotes around "$file" inside the loop. – tripleee – 2016-04-25T14:10:44.010
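
For reference, a quoted version of that first loop would look like this (a sketch, keeping the placeholder destination):

for file in *; do
    cp -- "$file" /your/dest    # quotes keep names with spaces intact; -- protects names starting with a dash
done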

How about when moving (instead of copying):

$ find {origin}/ -mindepth 1 -maxdepth 1 \( -name "*" -o -name ".*" \) -exec mv '{}' {destination}/ ';'

I think that will move everything while keeping the structure (subdirectories) and hidden files or directories, with no extra space consumed as with rsync + rm. And if {origin} and {destination} are on the same partition, it will be faster.

jaimealsilva

#!/bin/bash
# Timestamp for the archive name: year, month, day, hour, minute, second
d=$(date +%Y%m%d%H%M%S)
cd /path || exit 1
tar zcvf "/destination/backup_${d}.tar.gz" mydirectory_for_transfer

user31894

I think I should go for this. But one question still sticks in my mind, and that is performance. – None – 2010-02-10T14:29:54.903

I do not have a million files to test, so I can't answer for you about performance. You have to test it yourself on a development server. – user31894 – 2010-02-11T03:21:14.047

I like rsync for this, or:

find dir1 -type f -exec cp {} dir2 \;

Scott Carpenter

In my case, both cp and rsync were way too slow for copying about 4 million files from an HDD to an SSD, so here's how I went about it (all my files were .txt files in the same folder, so adjust the find to suit your case):

cd /path/to/source/folder
find . -name '*.txt' -print >/tmp/test.manifest
tar -c -T /tmp/test.manifest | (cd /path/to/destination/folder; tar xfp -)

I had to print the filenames to a temporary file because I hit the Argument list too long error. Using tar significantly improved my transfer speeds, although I assume that files which are less easily compressed may not perform as well.
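
A variant of the same idea that also survives filenames containing spaces or newlines, assuming GNU find and GNU tar (which accept null-delimited name lists):

cd /path/to/source/folder
find . -name '*.txt' -print0 >/tmp/test.manifest
tar -c --null -T /tmp/test.manifest | (cd /path/to/destination/folder; tar xfp -)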

Kelly

Assuming you want to move the files within the same filesystem, you could just rename the directory containing them and be done with it.
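
As a sketch, with hypothetical paths on the same filesystem; a rename only rewrites the directory entry, so it completes almost instantly no matter how many files the directory holds:

mv /data/old_dir /data/new_dir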

Tobu

Using tar:

(cd {origin}; tar cf - .)|(cd {destination}; tar xvf -)

Works to get things started when the origin is initially too big for rsync but the deltas are not.

James McGill
