Moving 2TB (10 mil files + dirs), what's my bottleneck?

Background

I ran out of space on /home/data and need to transfer /home/data/repo to /home/data2.

/home/data/repo contains 1M dirs, each of which contains 11 dirs and 10 files. It totals 2TB.

/home/data is on ext3 with dir_index enabled. /home/data2 is on ext4. Running CentOS 6.4.

I assume these approaches are slow because repo/ has 1 million dirs directly underneath it.


Attempt 1: mv is fast but gets interrupted

I could be done if this had finished:

/home/data> mv repo ../data2

But it was interrupted after 1.5TB was transferred. It was writing at about 1GB/min.

Attempt 2: rsync crawls after 8 hours of building file list

/home/data> rsync --ignore-existing -rv repo ../data2

It took several hours to build the 'incremental file list', and then the transfer ran at about 100MB/min.

I cancel it to try a faster approach.

Attempt 3a: mv complains

Testing it on a subdirectory:

/home/data/repo> mv -f foobar ../../data2/repo/
mv: inter-device move failed: '(foobar)' to '../../data2/repo/foobar'; unable to remove target: Is a directory

I'm not sure what this error is about (presumably mv has to copy-then-delete across devices, and it refuses to replace the destination directory that already exists from the earlier transfers), but maybe cp can bail me out...

Attempt 3b: cp gets nowhere after 8 hours

/home/data> cp -nr repo ../data2

It reads the disk for 8 hours and I decide to cancel it and go back to rsync.

Attempt 4: rsync crawls after 8 hours of building file list

/home/data> rsync --ignore-existing --remove-source-files -rv repo ../data2

I used --remove-source-files thinking it might make it faster if I start cleanup now.

It takes at least 6 hours to build the file list then it transfers at 100-200MB/min.

But the server was burdened overnight and my connection closed.

Attempt 5: THERE'S ONLY 300GB LEFT TO MOVE WHY IS THIS SO PAINFUL

/home/data> rsync --ignore-existing --remove-source-files -rvW repo ../data2

Interrupted again. The -W almost seemed to make "sending incremental file list" faster, which to my understanding shouldn't make sense. Regardless, the transfer is horribly slow and I'm giving up on this one.

Attempt 6: tar

/home/data> nohup tar cf - . |(cd ../data2; tar xvfk -)

Basically attempting to re-copy everything but ignoring existing files (the k flag tells the extracting tar to keep files that already exist). It has to wade through 1.7TB of existing files, but at least it's reading at 1.2GB/min.

So far, this is the only command which gives instant gratification.

Update: interrupted again, somehow, even with nohup..
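In hindsight, the likely culprit is that nohup only applies to the first command of a pipeline: the receiving tar in the subshell still gets the hangup when the ssh session drops, and the sending tar then dies on the broken pipe. Wrapping the whole pipeline in a single shell (or running it inside screen) should survive a disconnect. A rough sketch of the nohup variant (the log file name is just an example):

/home/data> nohup sh -c 'tar cf - . | (cd ../data2; tar xvfk -)' > tar-copy.log 2>&1 &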

Attempt 7: harakiri

Still debating this one

Attempt 8: scripted 'merge' with mv

The destination dir had about 120k empty dirs, so I ran

/home/data2/repo> find . -type d -empty -exec rmdir {} \;

Ruby script:

SRC  = "/home/data/repo"
DEST = "/home/data2/repo"

# List the top-level entries on each side (plain, uncolored output).
`ls #{SRC}  --color=never > lst1.tmp`
`ls #{DEST} --color=never > lst2.tmp`

# Entries present in SRC but missing from DEST show up as "< name" lines.
`diff lst1.tmp lst2.tmp | grep '<' > /home/data/missing.tmp`

t = `cat /home/data/missing.tmp | wc -l`.to_i
puts "Todo: #{t}"

# Manually `mv` each missing directory (paths quoted in case of odd names)
File.open('/home/data/missing.tmp').each do |line|
  dir = line.strip.gsub('< ', '')
  puts `mv "#{SRC}/#{dir}" "#{DEST}/"`
end

DONE.
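A cheap sanity check afterwards, just counting top-level entries instead of diffing 2TB (the expected counts follow from the layout above):

/home/data> find /home/data/repo -mindepth 1 -maxdepth 1 | wc -l     # expect 0 once everything has moved
/home/data> find /home/data2/repo -mindepth 1 -maxdepth 1 -type d | wc -l     # expect ~1 million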

Tim

Posted 2013-09-06T15:48:04.727

Reputation: 343

You are correct, it has to find and enumerate each directory, and 1 million dirs is going to be painful. – cybernard – 2013-09-06T16:19:03.213

Look at the bright side... if it were Windows, you couldn't even have a million subdirectories and still have an OS that works. :) – Jack – 2013-09-06T16:55:00.620

@Jack really? Does Windows have a limit? Is this not a relic from the FAT32 days (I haven't used Windows as a main OS since ~2001 so I am not really up to date on it)? – terdon – 2013-09-06T17:27:45.737

@Tim, why don't you just mv again? In theory mv will only delete a source file if the destination file has been completely copied, so it should work OK. Also, do you have physical access to the machine or is this done through an ssh connection? – terdon – 2013-09-06T17:28:42.850

@terdon - Windows doesn't have a limit, per se... but it has a point where it becomes unusable for all intents and purposes. Windows Explorer will take forever to display the file list, etc. – Jack – 2013-09-06T18:25:39.810

@Jack OK, but that will only affect that one directory right? Or will the entire system be affected? – terdon – 2013-09-06T18:27:45.943

@terdon - Just the one directory. See http://technet.microsoft.com/en-us/magazine/hh395477.aspx – Jack – 2013-09-06T18:43:05.680

@terdon - Wanted to use mv -f but tested it on a subdir and got mv: inter-device move failed: '(foobar)' to '../../data2/repo/foobar'; unable to remove target: Is a directory. And yes, I'm using ssh. – Tim – 2013-09-06T19:33:55.897

With that many files/directories you'd honestly be better off using dd (though for 2TB it'd take hours/days to finish) – justbrowsing – 2013-09-06T19:53:21.617

@justbrowsing - the problem now is that I need to merge/resume. Can dd do that? If some of the source files weren't deleted already, I'd just delete the destination dir and mv the source again. It would have taken only 24 hours had it not been interrupted. – Tim – 2013-09-06T20:20:59.687

No it can't. mv isn't forgiving; if you keep getting disconnected you could lose data and not even know it. As you said you are doing this over ssh, I highly recommend using screen and detach. Enable logging and keep track that way. If you are using verbose it'll just take longer. Also try iotop – justbrowsing – 2013-09-06T20:34:55.103

@justbrowsing - Good call on screen. I was wondering about verbose but I guess it's too late to restart tar right now. And iotop has been my favorite utility for the last few days :) – Tim – 2013-09-06T20:45:46.850

Is one of your directories mounted from a server? If so, I would recommend transferring directly with rsync dir1 server:dir2 or rsync server:dir1 dir2, run from whichever end is less likely to get disconnected. Nesting the command in a screen session helps avoid some disconnections. – meduz – 2013-09-10T09:56:01.633

Answers

6

Ever heard of splitting large tasks into smaller tasks?

/home/data/repo contains 1M dirs, each of which contains 11 dirs and 10 files. It totals 2TB.

rsync -a /source/1/ /destination/1/
rsync -a /source/2/ /destination/2/
rsync -a /source/3/ /destination/3/
rsync -a /source/4/ /destination/4/
rsync -a /source/5/ /destination/5/
rsync -a /source/6/ /destination/6/
rsync -a /source/7/ /destination/7/
rsync -a /source/8/ /destination/8/
rsync -a /source/9/ /destination/9/
rsync -a /source/10/ /destination/10/
rsync -a /source/11/ /destination/11/

(...)

Coffee break time.
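With a million directories directly under repo/, those commands would need to be generated rather than typed. One way to apply the same idea (a sketch only; the chunk size and temp paths are arbitrary) is to batch the top-level directory names and sync one batch at a time, so an interruption only costs the current batch:

mkdir -p /tmp/chunks /tmp/chunks.done
cd /home/data/repo
ls -f -1 | grep -vE '^\.{1,2}$' | split -l 10000 - /tmp/chunks/part.

for list in /tmp/chunks/part.*; do
    # -r is needed because --files-from disables the recursion implied by -a
    rsync -a -r --ignore-existing --files-from="$list" /home/data/repo/ /home/data2/repo/ \
        && mv "$list" /tmp/chunks.done/    # a restart only re-runs the unfinished batches
done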

Ярослав Рахматуллин

Posted 2013-09-06T15:48:04.727

Reputation: 9 076

The benefit I'm vaguely emphasizing is that you track the progress in small parts manually, so that resuming the task will take less time if some part is aborted (because you know which steps were completed successfully). – Ярослав Рахматуллин – 2013-09-18T01:08:11.810

This is basically what I ended up doing, except with mv. It's unfortunate there is no tool that meets mv and rsync halfway. – Tim – 2013-09-23T20:41:26.003

4

This is what is happening:

  • Initially rsync will build the list of files.
  • Building this list is really slow, due to an initial sorting of the file list.
  • This can be avoided by using ls -f -1 and combining it with xargs to build the set of files that rsync will use, or by redirecting the output to a file and passing that file list to rsync.
  • Passing this list to rsync instead of the folder will make rsync start working immediately (see the sketch after this list).
  • This trick of using ls -f -1 on folders with millions of files is described well in this article: http://unixetc.co.uk/2012/05/20/large-directory-causes-ls-to-hang/
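A minimal sketch of that approach, assuming the paths from the question and rsync's --files-from option (adjust to taste):

cd /home/data/repo
ls -f -1 | grep -vE '^\.{1,2}$' > /tmp/filelist.txt    # -f turns off the sorting that makes plain ls crawl here

# --files-from disables the recursion normally implied by -a, so -r is given explicitly
rsync -a -r --ignore-existing --files-from=/tmp/filelist.txt /home/data/repo/ /home/data2/repo/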

maki

Posted 2013-09-06T15:48:04.727

Reputation: 141

Can you give an example of how to use ls with rsync? I have a similar but not identical situation. On machine A I have rsyncd running and a large directory tree I want to transfer to machine B (actually, 90% of the directory is already at B). The problem is that I have to do this over an unstable mobile connection that frequently drops. Spending an hour on building the file list every time I restart is pretty inefficient. Also, B is behind NAT that I don't control, so it is hard to connect A -> B, while B -> A is easy. – d-b – 2015-02-04T09:49:25.413

Agree with @d-b. If an example could be given, that would make this answer much more useful. – redfox05 – 2019-04-08T15:13:14.670

1

Even if rsync is slow (why is it slow? maybe -z will help) it sounds like you've gotten a lot of it moved over, so you could just keep trying:

If you used --remove-source-files, you could then follow up by removing the empty directories. --remove-source-files removes all the files but leaves the directories behind.

Just make sure you DO NOT use --remove-source-files with --delete to do multiple passes.

Also for increased speed you can use --inplace

If you're getting kicked out because you're trying to do this remotely on a server, go ahead and run this inside a 'screen' session. At least that way you can let it run.
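Putting those suggestions together, a rough sketch (paths taken from the question; flags as in stock rsync):

screen -S bigmove      # run everything inside a named screen session
rsync -av --ignore-existing --remove-source-files --inplace /home/data/repo/ /home/data2/repo/

# follow-up pass: clear out the source directories once they are empty
find /home/data/repo -depth -type d -empty -delete

# detach with Ctrl-A d, reattach later with: screen -r bigmove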

Angelo

Posted 2013-09-06T15:48:04.727

Reputation: 111