Is there a way to tell if a file is done copying?

4

2

The scenario is this: Machine A has files I want to copy to Machine C. Machine A can't access C directly, but can access Machine B that can access Machine C. I am using scp to copy from Machine A to B, and then from B to C.

Machine B has limited storage space, so as files come in, I need to copy them to C and delete them from B. The second copy is much faster, so this is no problem with bandwidth.

I could do this by hand, but I am lazy. What I would like is to run a script on B or C that will copy each file to C as each one finishes. The scp job is running from A.

So what I need is a way to ask (preferably from a bash script) if file X.avi is "done" copying. Each of these files is a different size, and I can't really predict size or time of completion.

Edit: by the way, the file transfer times are about 1 hour from A to B and about 10 minutes from B to C, if the time scale matters at all.

Mike Cooper

Posted 2009-09-22T17:37:39.173

Reputation: 2 036

Answers

7

A common way of doing this is to first copy it to a temporary file name, preferably a hidden file. When the copy finishes, the script that is doing the copy then renames it to the non-hidden filename.

The script on machine B could then watch for non-hidden files.

The script on machine A would look something like this:

for file in * ; do
    # Copy under a hidden temporary name; rename only if the copy succeeded.
    scp "$file" "user@host:~/.${file}.tmp" &&
    ssh user@host "mv ~/'.${file}.tmp' ~/'${file}'"
done

Although this does not satisfy OP's desire to use the one-line

scp * user@host:~/

it accomplishes the same thing and also allows machine B to transfer each file as it finishes without waiting for the next file.
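The watching side on machine B is not spelled out above. A minimal sketch of it, assuming the files arrive in a dedicated directory and that finished files are exactly the non-hidden ones, might look like this. The names forward_finished and send_to_c, the incoming directory, and the user@C address are all hypothetical.

```shell
#!/usr/bin/env bash
# forward_finished DIR CMD [ARGS...]
# Runs "CMD ARGS... FILE" for every regular, non-hidden file in DIR and
# deletes the file only if that command succeeded.
forward_finished() {
    local dir="$1"
    shift
    local f
    for f in "$dir"/*; do          # a plain * glob skips dot-files
        [ -f "$f" ] || continue    # also skips the literal pattern when nothing matches
        if "$@" "$f"; then
            rm -- "$f"
        fi
    done
}

# Hypothetical use on B (not run here):
#   send_to_c() { scp "$1" user@C:incoming/; }
#   while sleep 60; do forward_finished "$HOME/incoming" send_to_c; done
```

Deleting only after the forwarding command succeeds means a failed copy to C leaves the file on B for the next pass.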

bmb

Posted 2009-09-22T17:37:39.173

Reputation: 487

The problem with this is that on machine A I want to do scp * user@host:~/ and the files being copied would more than fill machine B, so I can't move/rename files after they are copied from A. – Mike Cooper – 2009-09-22T18:25:54.803

Ah, so if I understand correctly, you're copying more than one file at once? That's where the problem comes from? – Josh – 2009-09-22T19:01:26.197

Yes, that is the problem. I can't store all the files on B, but this copy will take long enough that I don't want to sit and babysit it by watching for each file to finish and then copying it. Maybe I will go with your expanded version. – Mike Cooper – 2009-09-22T19:08:22.470

2

Does lsof on machine B show that scp has the file open? If so, you could watch lsof and see when scp closes the file. If not, you could watch the size of the file and, once it hasn't changed for a given period of time (5 minutes, for example), copy it from B to C.

A third option would be to copy the files from A to an "in_progress" directory on C. After the copy finishes on A, execute a mv command to move them out of the "in_progress" directory.
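The "size hasn't changed for a while" check can be done in plain bash. This is only a sketch: the function name is_stable is made up, and it assumes wc is available on the machine doing the checking.

```shell
#!/usr/bin/env bash
# is_stable FILE [WAIT_SECONDS]
# Succeeds if FILE's size is the same before and after WAIT_SECONDS
# (default 300, i.e. the 5 minutes suggested above). Fails if the size
# changed or the file cannot be read.
is_stable() {
    local file="$1"
    local wait_s="${2:-300}"
    local before after
    before=$(wc -c < "$file") || return 1
    sleep "$wait_s"
    after=$(wc -c < "$file") || return 1
    [ "$before" -eq "$after" ]
}

# Hypothetical use (not run here):
#   is_stable X.avi && scp X.avi user@C:incoming/ && rm X.avi
```

A stalled transfer looks the same as a finished one to this check, so the wait should be comfortably longer than any pause you expect on the A to B link.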

Josh

Posted 2009-09-22T17:37:39.173

Reputation: 7 540

Sadly lsof doesn't seem to exist on machine B. About your third option, that won't work because if I let the copy finish before I do anything, I will vastly overwhelm my allowed space on B (something like 10:1). So whatever solution I have has to work during the copy. – Mike Cooper – 2009-09-22T18:24:51.280

The wait 5 minutes idea is a good one (I thought of something similar) but I'm not quite sure how it would be done. Any ideas? – Mike Cooper – 2009-09-22T18:27:25.050

Yeah, I could write a quick and dirty ruby script for you -- does machine B have ruby? – Josh – 2009-09-22T19:00:24.000

Nope. It isn't under my control and doesn't seem to have any of the things I try and use. I was hoping bash might have something built in to do this. Though I might be able to run the script from my local machine... I will experiment with that. – Mike Cooper – 2009-09-22T19:07:00.163

I'll see if I can code something like that in bash alone. My ruby prowess exceeds my bash prowess however. stackoverflow.com might be able to assist with such a script. – Josh – 2009-09-22T19:13:07.047

Actually I think I have figured it out, see my answer. – Mike Cooper – 2009-09-22T20:26:06.107

2

I just thought of another, completely unrelated option. Doesn't use scp at all. Please let me know if this would work:

  1. on B, create a fifo pipe somewhere: mkfifo /tmp/xfer

  2. on A, don't use scp, instead, tar -cz files | ssh B 'cat > /tmp/xfer'

  3. on C, run ssh B 'cat /tmp/xfer' | tar -xz

This way, data isn't stored on B, it just passes through the pipe. The downside to this is, you can only have one copy going at a time...

You'll need to make sure the process on C respawns each time it finishes.
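To see how the pipe behaves before involving three machines, the same three steps can be tried on a single machine. This is a local stand-in only: both ends run on one host, the ssh hops are left out, and the function name fifo_transfer and the temporary paths are invented for the example.

```shell
#!/usr/bin/env bash
# fifo_transfer SRC_DIR DST_DIR
# Streams the contents of SRC_DIR into DST_DIR through a named pipe.
fifo_transfer() {
    local src="$1"
    local dst="$2"
    local dir writer status
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/xfer" || return 1          # step 1: create the fifo
    tar -cz -C "$src" . > "$dir/xfer" &     # step 2: writer blocks until a reader opens the pipe
    writer=$!
    tar -xz -C "$dst" < "$dir/xfer"         # step 3: reader drains the pipe
    status=$?
    wait "$writer"                          # reap the writer
    rm -rf "$dir"
    return "$status"
}
```

Nothing is stored in the fifo itself; the writer simply waits until a reader is attached, which is why the reading side has to be restarted for each transfer.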

Josh

Posted 2009-09-22T17:37:39.173

Reputation: 7 540

The way you explain this it would seem to work... but it seems a bit "dark magic" for my liking. I think I found another solution that works the original way, but I will definitely keep this snippet for later use. – Mike Cooper – 2009-09-22T19:32:21.563

This way isn't really "dark magic", it's just using pipes which is a standard OS facility. But I do like your other solution better, it's less code and probably easier to maintain. – Josh – 2009-09-22T19:55:08.150

Oh, no, I just mean that to me it is magic-y because I have never played with fifo pipes and they aren't used quite as often as other things. Something to learn now, though. Also, maintenance isn't much of an issue. This will only be set up for about a day. – Mike Cooper – 2009-09-23T07:18:44.827

2

After thinking about the answers posted (in particular @Josh's idea of watching modified times), I tried manipulating B's files from C. B is anaemic as far as available tools go, so nothing that seemed able to do the job was there. I came upon this solution. The idea is not mine; I found it in Google searches before asking this question. I had discarded it earlier, since machine B did not have the find utility.

First, mount the appropriate directory on B onto C, so it appears as a local file system. I used sshfs for this (awesome tool, by the way). This will let me use C's utilities instead of B's.

Secondly, the command find /the/folder/* -mmin +5 will match all files modified over 5 minutes ago. So the command find /the/folder/* -mmin +5 -exec mv {} /the/other/folder \; will move all files that were modified over 5 minutes ago to the other folder (which is actually on C, rather than sshfs-mounted from B).

Finally, I set up a cron job to run the above command every 5 minutes today and tomorrow. The line in my crontab looks like this:

*/5 * 22,23 9 * find /the/folder/* -mmin +5 -exec mv {} /the/other/folder \;

Hopefully this will work. The next file has yet to complete, so I can't comment on if it really works when combined with the cron script, but I made some files by hand and seeded them and they moved fine. cross my fingers

Edit: This is working. The original version had some errors, which are now corrected.
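For anyone who wants to try the find step on scratch directories first, here is a small variation wrapped in a function. It is a sketch, not the exact command from the crontab: the name move_settled is made up, and it uses -maxdepth 1 -type f (supported by GNU and BSD find, but not required by POSIX) instead of the /the/folder/* glob, so unlike the glob it also picks up dot-files and ignores subdirectories.

```shell
#!/usr/bin/env bash
# move_settled SRC_DIR DST_DIR [MINUTES]
# Moves regular files directly inside SRC_DIR that were last modified
# more than MINUTES minutes ago (default 5) into DST_DIR.
move_settled() {
    local src="$1"
    local dst="$2"
    local minutes="${3:-5}"
    find "$src" -maxdepth 1 -type f -mmin +"$minutes" -exec mv {} "$dst" \;
}

# Hypothetical use (not run here), with SRC being the sshfs mount of B:
#   move_settled /the/folder /the/other/folder 5
```

The threshold has to be longer than any stall in the incoming transfer, since a file whose copy has paused for that long will be moved while still incomplete.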

Mike Cooper

Posted 2009-09-22T17:37:39.173

Reputation: 2 036

That sounds like an awesome solution – Josh – 2009-09-22T19:54:10.257

1

No need for mkfifo. On machine B, run this:

ssh A 'tar -cz files' | ssh C 'tar -xz'

You might find tar's -C option useful.

If you need to initiate the copying on machine A, just do:

tar -cz files | ssh B "ssh C 'tar -xz'"

Watch out for proper quoting, though.

tuomassalo

Posted 2009-09-22T17:37:39.173

Reputation: 423

0

The copy will either run as another process, or you could force it to, using a subshell. Then, you could use ps to "watch" the process and see when it disappears.
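A rough sketch of that polling, assuming a ps that accepts -p and a process ID you already know (the name wait_for_pid is made up), could be:

```shell
#!/usr/bin/env bash
# wait_for_pid PID [INTERVAL_SECONDS]
# Polls ps until no process with the given PID exists any more.
wait_for_pid() {
    local pid="$1"
    local interval="${2:-5}"
    while ps -p "$pid" > /dev/null 2>&1; do
        sleep "$interval"
    done
}

# Hypothetical use (not run here):
#   scp X.avi user@host:~/ &
#   wait_for_pid "$!" && echo "copy process has exited"
```

This only reports that the process has gone away, not whether the copy succeeded, and it tracks one scp process rather than the individual files inside a multi-file copy.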

Also, I believe that in *nix, you can delete the file while it's being copied. The system won't delete it until the copy program closes it. Of course, if the copy doesn't succeed, you lose the file, so, not the best idea.

gbarry

Posted 2009-09-22T17:37:39.173

Reputation: 694

I don't need to copy once the entire copy is done, because I am copying many files. I need to know when each individual file is done. – Mike Cooper – 2009-09-22T19:04:57.627