How to ensure the most recent version of a file is completely downloaded while minimizing network transfer

2

I'm looking for a shell command that robustly ensures a file is completely downloaded, but that avoids redownloading anything unnecessarily. Here's pseudocode of what I'm hoping for:

If file doesn't exist, download it.
If file exists:
    use HTTP HEAD to get timestamp and size of remote file.
    if remote timestamp is newer, delete local file and download remote file
    if timestamps are the same:
        if remote size is greater than local size:
            resume download
        if remote size is equal to local size:
            do nothing
        if remote size is less than local size:
            do nothing but issue a warning because this is weird
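
For reference, the pseudocode above can be sketched in shell roughly as follows. This is only a sketch under several assumptions: the function names (decide_action, remote_header, sync_file) are made up for illustration, GNU stat and GNU date are assumed to be available, and the server is assumed to send Last-Modified and Content-Length in response to a HEAD request. It also treats "remote is not newer" the same as "timestamps are equal", because a partially downloaded file usually carries the time of the interrupted transfer rather than the remote timestamp.

```shell
# decide_action LOCAL_MTIME LOCAL_SIZE REMOTE_MTIME REMOTE_SIZE
# Times are epoch seconds, sizes are bytes. Pass an empty LOCAL_MTIME when
# the local file does not exist. Prints one of:
#   download | redownload | resume | nothing | warn
decide_action() {
    local lmtime="$1" lsize="$2" rmtime="$3" rsize="$4"
    if [ -z "$lmtime" ]; then
        echo download
    elif [ "$rmtime" -gt "$lmtime" ]; then
        echo redownload
    elif [ "$rsize" -gt "$lsize" ]; then
        echo resume
    elif [ "$rsize" -eq "$lsize" ]; then
        echo nothing
    else
        echo warn
    fi
}

# remote_header URL NAME
# Prints the value of one response header from a HEAD request. With
# redirects there are several header blocks, so the last match wins.
remote_header() {
    curl -sIL "$1" | tr -d '\r' |
        awk -F': ' -v h="$2" 'tolower($1) == tolower(h) { v = $2 } END { print v }'
}

# sync_file URL FILE
# Gathers local and remote metadata, then acts on the decision.
sync_file() {
    local url="$1" file="$2" lmtime="" lsize=0 lastmod rmtime rsize
    if [ -e "$file" ]; then
        lmtime=$(stat -c %Y "$file")   # GNU stat: mtime in epoch seconds
        lsize=$(stat -c %s "$file")    # GNU stat: size in bytes
    fi
    lastmod=$(remote_header "$url" Last-Modified)
    rsize=$(remote_header "$url" Content-Length)
    if [ -z "$lastmod" ] || [ -z "$rsize" ]; then
        echo "error: server did not report Last-Modified/Content-Length" >&2
        return 1
    fi
    rmtime=$(date -d "$lastmod" +%s) || return 1   # GNU date parses the HTTP date
    case $(decide_action "$lmtime" "$lsize" "$rmtime" "$rsize") in
        download)   curl -fLR -o "$file" "$url" ;;
        redownload) rm -f "$file" && curl -fLR -o "$file" "$url" ;;
        resume)     curl -fLR -C - -o "$file" "$url" ;;
        nothing)    : ;;
        warn)       echo "warning: local $file is larger than the remote file" >&2 ;;
    esac
}
```

Calling it would look like sync_file "$url" "$file", with both variables set by the caller. Keeping the decision in decide_action, separate from the network calls, means the branching can be checked without touching a server.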

wget is able to resume interrupted transfers using the -c option, but I have to track somewhere that it got interrupted so I can know to pass in that option, and it requires running the command over again.

wget -N makes sure to avoid downloading a file if the timestamp of the remote file is not newer than the local one. But it doesn't know if transfers were interrupted and thus won't do anything when called again on an interrupted transfer.

curl -C - will download a file if not present and resume it if only partially downloaded. But if a completely downloaded file is present, it gives me errors about the server not supporting byte ranges.

I suppose I could implement my pseudocode myself, but this seems like a common enough need that I have to ask: is there an existing way of doing this?

Josh Hansen

Posted 2017-02-03T18:11:15.760

Reputation: 123

Answers

2

You can use the -c flag of wget even if you haven't started downloading the file. This should work for your purpose:

while ! wget -qc "$url"; do :; done

The loop repeats until wget exits successfully. If a transfer is interrupted partway through, the next iteration picks up where the previous one left off.
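
One caveat with an unbounded loop is that a permanent failure (a 404, say) keeps it spinning forever. A bounded variant might look like the sketch below; the retry helper is a hypothetical name, not part of wget, and the attempt count and delay are arbitrary.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_TRIES times, sleeping
# DELAY_SECONDS between attempts. Returns 1 if every attempt failed.
retry() {
    local max="$1" delay="$2" n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (url is assumed to be set by the caller):
#   retry 5 10 wget -qc "$url"
```

Because -c is passed on every attempt, each retry still resumes rather than starting over.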

janos

Posted 2017-02-03T18:11:15.760

Reputation: 2,449

1 This seems to work nicely. Not sure if it respects the timestamps as I outlined, but it's basically what I need. – Josh Hansen – 2017-02-10T22:05:34.847