Download a file via HTTP only if it changed since the last update

20

6

I need to download a file from an HTTP server, but only if it has changed since the last time I downloaded it (e.g. via the If-Modified-Since header). I also need to use a custom name for the file on my disk.

What tool can I use for this task on Linux?


wget -N cannot be used because -N cannot be used with -O.

cweiske

Posted 2015-04-30T19:55:29.060

Reputation: 1 010

Why not download the file and then rename it? – Julian Knight – 2015-04-30T22:50:57.490

.. because the tool still needs to be able to check if the HTTP resource changed since the last download? This will be hard if the file has been renamed and thus does not exist anymore at the place the tool expects it. – cweiske – 2015-05-01T07:33:14.367

Sorry, I rushed that comment, see my answer. – Julian Knight – 2015-05-01T08:05:40.483

Answers

26

Consider using curl instead of wget:

curl -o "$file" -z "$file" "$uri"

man curl says:

-z/--time-cond <date expression>

(HTTP/FTP) Request a file that has been modified later than the given time and date, or one that has been modified before that time. The date expression can be all sorts of date strings or if it doesn't match any internal ones, it tries to get the time from a given file name instead.

If $file doesn't necessarily pre-exist, you'll need to make the use of the -z flag conditional, using test -e "$file":

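if test -e "$file"
then zflag="-z $file"
else zflag=
fi
curl -o "$file" $zflag "$uri"

(Note that we don't quote the expansion of $zflag here, as we want it to undergo word splitting into 0 or 2 tokens. Quote characters stored inside the variable are not interpreted after expansion, so this form only works when $file contains no whitespace or glob characters.)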

If your shell supports arrays (e.g. Bash), then we have a safer and cleaner version:

if test -e "$file"
then zflag=(-z "$file")
else zflag=()
fi
curl -o "$file" "${zflag[@]}" "$uri"
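For shells without arrays, the positional parameters can serve as the argument list, which keeps file names with spaces intact. The following is a minimal sketch rather than a definitive implementation: the function name fetch_if_newer and its argument order are made up for illustration, and the extra flags are my additions. --fail makes curl exit non-zero on HTTP errors (400 and above) instead of saving the server's error page, and --location follows redirects.

```shell
# fetch_if_newer URI FILE   (hypothetical helper, not part of curl)
# Downloads URI to FILE, sending a time condition if FILE already exists.
fetch_if_newer() {
    # Plain POSIX sh has no "local", so these two variables are global.
    uri=$1
    file=$2
    # Reuse the positional parameters as a portable argument list.
    set -- --fail --silent --show-error --location -o "$file"
    # Add "-z FILE" only when a previous download exists.
    if [ -e "$file" ]; then
        set -- "$@" -z "$file"
    fi
    curl "$@" "$uri"
}
```

Usage would look like fetch_if_newer "$uri" "$file"; a non-zero exit status then indicates a transfer or HTTP error.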

Toby Speight

Posted 2015-04-30T19:55:29.060

Reputation: 4 090

7

The wget switch -N downloads the file only if it has changed, so one approach is to run wget with plain -N, which fetches the file when needed but leaves it under the wrong name. Then create a hard link with the ln -P command to link it to a "file" with the correct name. The link shares the original's inode and therefore has the same metadata.

The only limitation is that hard links cannot cross file system boundaries.
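The steps above can be sketched as follows, assuming a wget that supports -N and -P (--directory-prefix) and a cache directory on the same file system as the target. The helper name fetch_and_link and its arguments are invented for illustration, and the sketch assumes the URL ends in the name wget saves the file under (no query string, no redirect to a different name).

```shell
# fetch_and_link URI CACHE_DIR TARGET   (hypothetical helper)
fetch_and_link() {
    uri=$1
    cache_dir=$2
    target=$3
    # Assumption: the last path component of the URL is the saved file name.
    remote_name=${uri##*/}
    mkdir -p "$cache_dir" || return
    # -N: download only if the remote copy is newer; -P: save into cache_dir.
    wget -N -P "$cache_dir" "$uri" || return
    # Recreate the hard link on every run so the custom name always
    # refers to the file wget currently has in the cache.
    ln -f "$cache_dir/$remote_name" "$target"
}
```

Recreating the link each time is a cautious choice: it stays correct whether wget rewrites the cached file in place or replaces it with a new one.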

Julian Knight

Posted 2015-04-30T19:55:29.060

Reputation: 13 389

For many purposes, a symbolic link may be adequate - unless inode identity actually matters for the asker. – Toby Speight – 2017-03-29T10:03:11.753

wget is the better tool for this job. It checks timestamp AND the file size, which curl (7.38.0) doesn't. Also, wget terminates with non-0 on 4xx/5xx, whereas curl doesn't really care about server-codes by default. – schieferstapel – 2017-05-16T22:26:36.267

4

A Python 3.5+ script that wraps the curl command:

import argparse
import pathlib

from subprocess import run
from itertools import chain

parser = argparse.ArgumentParser()
parser.add_argument('url')
parser.add_argument('filename', type=pathlib.Path)
args = parser.parse_args()

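# Build the curl argument list; "-z FILE" is added only if the file exists.
# subprocess.run documents args as a sequence, so turn the chain into a list.
run(list(chain(
    ('curl', '-s', args.url),
    ('-o', str(args.filename)),
    ('-z', str(args.filename)) if args.filename.exists() else (),
)))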

sirex

Posted 2015-04-30T19:55:29.060

Reputation: 141

This is awesome! TIL chain :) – John Oxley – 2017-05-02T06:37:45.300

1

An approach similar to the date check (curl --time-cond) is to compare file sizes, i.e. download only if the local file has a different size than the remote file.

This is useful, for example, when a download failed partway through: the local file then has a newer date than the remote file but is actually corrupted, so it must be downloaded again:

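# Size of the local file in bytes, or 0 if it does not exist yet
local_file_size=$([[ -f "${FILE_NAME}" ]] && wc -c < "${FILE_NAME}" || echo "0")
# Header names are case-insensitive (HTTP/2 sends them in lowercase)
remote_file_size=$(curl -sI "${FILE_URL}" | awk 'tolower($1) == "content-length:" { print $2 }' | tr -d '\r')

if [[ "$local_file_size" -ne "$remote_file_size" ]]; then
    curl -o "${FILE_NAME}" "${FILE_URL}"
fi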

The curl -z / --time-cond option (suggested in another answer) will not download the remote file in this case, because the local file has a newer date, but this size-check script will.
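Because header names are case-insensitive (HTTP/2 sends them in lowercase) and a request that follows redirects prints several header blocks, the size check can be split into two small helpers that can be tried offline. This is a minimal POSIX sh sketch; the function names content_length and needs_download are made up for illustration, and it assumes the server reports a Content-Length at all.

```shell
# content_length: read raw HTTP response headers on stdin and print the
# value of the last Content-Length header, matched case-insensitively.
content_length() {
    tr -d '\r' |
        awk -F': *' 'tolower($1) == "content-length" { len = $2 }
                     END { if (len != "") print len }'
}

# needs_download FILE REMOTE_SIZE: succeed (exit 0) if FILE is missing
# or its size in bytes differs from REMOTE_SIZE.
needs_download() {
    if [ ! -f "$1" ]; then
        return 0
    fi
    # Arithmetic expansion strips any padding some wc versions print.
    local_size=$(($(wc -c < "$1")))
    [ "$local_size" -ne "$2" ]
}
```

A caller could then combine them as remote_size=$(curl -sIL "$FILE_URL" | content_length) followed by needs_download "$FILE_NAME" "$remote_size" && curl -o "$FILE_NAME" "$FILE_URL".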

Noam Manos

Posted 2015-04-30T19:55:29.060

Reputation: 771