Downloading only images using curl or wget?


UPDATE: I've found that using this Bash script fixes the problem of GIF files being saved with a .jpg extension.


I'm attempting to download images from a forum whose URL uses the following format:

http://www.someforum.com/attachment.php&id=XXX

I wrote a bash script that uses wget to retrieve these images:

for i in {1..10}
do
    wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg"
done

It works and downloads the images. However, if there isn't an image, it still downloads the resulting HTML page and saves it as imageXX.jpg.

Curl does the same:

for i in {1..10}
do
    curl --cookie cookies.txt "http://www.someforum.com/attachment.php&id=${i}" -o "image${i}.jpg"
done

Is there any way to reject results whose content type is not image/*? Right now I am assuming that the images are JPEG; it would be nice to detect the MIME type and use the appropriate file extension.

Finally, wget reports a 500 response code when an image isn't found. If I can keep only the 200 responses, this may yield a solution.

Bash, Ruby, Python answers are acceptable.

Ashley

Posted 2012-03-18T00:27:49.000

Reputation: 163

Answers


wget returns a non-zero exit code on error; it specifically sets exit status == 8 if the remote issued a 4xx or 5xx status. So, you can modify your bash loop to unlink the file if wget doesn't exit with success:

for i in {1..10}
do
    wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg" || rm "image${i}.jpg"
done

Similarly, curl has a --fail option: when the HTTP status is >= 400, it does not save the response body and returns exit status 22.
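A minimal sketch of the curl variant, assuming the same cookies.txt and URL pattern as in the question. The helper name fetch_image is hypothetical, and the rm on failure is only a defensive cleanup in case a partial file was left behind.

```shell
#!/usr/bin/env bash

# fetch_image URL OUTFILE  (hypothetical helper, not part of curl)
# Downloads URL to OUTFILE and returns non-zero if the transfer fails.
fetch_image() {
    local url=$1 out=$2
    # --fail: exit non-zero (22) on HTTP status >= 400 instead of saving the error page
    # --silent --show-error: no progress meter, but still print error messages
    if ! curl --fail --silent --show-error --cookie cookies.txt "$url" -o "$out"; then
        rm -f -- "$out"   # defensive: remove anything left behind by a failed transfer
        return 1
    fi
}

# Usage (commented out so that sourcing this file makes no network requests):
# for i in {1..10}; do
#     fetch_image "http://www.someforum.com/attachment.php&id=${i}" "image${i}.jpg"
# done
```

Wrapping the call in a function keeps the loop body short and gives one place to adjust the curl options.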

dbenhur

Posted 2012-03-18T00:27:49.000

Reputation: 231

This works, thank you. I'm going to leave the question open for a while to see if anybody comments regarding MIME/Content-Type as I'm still (understandably) getting GIF files saved as .jpg – Ashley – 2012-03-18T11:18:20.767

Ah, I didn't realize you also had the problem of files being saved with the wrong extension. I would have suggested file --mime to detect the real type and map it to the appropriate extension. – dbenhur – 2012-03-18T17:43:23.737
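A rough sketch of the post-processing idea from the comment above, using file --mime-type (a variant of --mime that prints only the type). The function names ext_for_mime and fix_extension are hypothetical, the list of types is an assumption that can be extended, and the file name is assumed to already carry an extension such as image3.jpg.

```shell
#!/usr/bin/env bash

# ext_for_mime MIME: print the extension for a known image type,
# or return non-zero for anything else (for example text/html).
ext_for_mime() {
    case $1 in
        image/jpeg) echo jpg ;;
        image/gif)  echo gif ;;
        image/png)  echo png ;;
        *)          return 1 ;;
    esac
}

# fix_extension FILE: rename FILE so its extension matches the detected type.
# Files that are not a known image type are left untouched and reported via
# a non-zero return status.
fix_extension() {
    local f=$1 mime ext
    mime=$(file --brief --mime-type -- "$f") || return 1
    ext=$(ext_for_mime "$mime") || return 1
    # Only rename when the current extension differs from the detected one.
    [ "${f##*.}" = "$ext" ] || mv -- "$f" "${f%.*}.$ext"
}

# Usage (commented out): fix every downloaded file in the current directory.
# for f in image*.jpg; do fix_extension "$f"; done
```

Because file inspects the content rather than the name, this also flags an HTML error page that was saved as .jpg: fix_extension returns non-zero for it.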

I was hoping to avoid post-processing, but having looked through the entire wget manual, it seems to be the only way. – Ashley – 2012-03-18T22:57:24.547
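For curl rather than wget, one possible way to pick the extension at download time is the --write-out variable %{content_type}. This is only a sketch: it assumes the forum sends an accurate Content-Type header, and the helper name fetch_by_type and the list of accepted types are hypothetical.

```shell
#!/usr/bin/env bash

# fetch_by_type URL BASENAME  (hypothetical helper)
# Saves URL as BASENAME.jpg, BASENAME.gif or BASENAME.png depending on the
# Content-Type header; any other type, or an HTTP error, saves nothing.
fetch_by_type() {
    local url=$1 base=$2 tmp ctype ext
    tmp=$(mktemp) || return 1
    # The body goes to the temporary file; --write-out prints the
    # Content-Type of the response on stdout once the transfer is done.
    if ! ctype=$(curl --fail --silent --cookie cookies.txt \
                      --output "$tmp" --write-out '%{content_type}' "$url"); then
        rm -f -- "$tmp"
        return 1
    fi
    # Drop any parameters such as "; charset=..." before matching.
    case ${ctype%%;*} in
        image/jpeg) ext=jpg ;;
        image/gif)  ext=gif ;;
        image/png)  ext=png ;;
        *)          rm -f -- "$tmp"; return 1 ;;
    esac
    mv -- "$tmp" "$base.$ext"
}

# Usage (commented out so that sourcing this file makes no network requests):
# for i in {1..10}; do
#     fetch_by_type "http://www.someforum.com/attachment.php&id=${i}" "image${i}"
# done
```

If the server mislabels attachments, checking the saved file with file --mime remains the more reliable option.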