UPDATED:
I've found that using this Bash script fixes the problem of GIF files being saved with a .jpg extension.
I'm attempting to download images from a forum whose URL uses the following format:
http://www.someforum.com/attachment.php&id=XXX
I wrote a Bash script that uses wget to retrieve these images:
for i in {1..10}
do
wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg"
done
It works and downloads the images. However, if there isn't an image, it still downloads the resulting HTML and saves it as imageXX.jpg.
Curl does the same:
for i in {1..10}
do
curl --cookie cookies.txt "http://www.someforum.com/attachment.php&id=${i}" -o "image${i}.jpg"
done
Is there any way to reject results whose Content-Type is not image/*? Right now I am assuming that the images are JPEGs; it would be nice to detect the MIME type and use the appropriate file extension.
Finally, wget reports a 500 response code when an image isn't found; if I can keep only the 200 responses, this may yield a solution.
Bash, Ruby, Python answers are acceptable.
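To make the two filters I'm asking about concrete (status 200 and an image/* Content-Type), here is a rough sketch of what I have in mind using curl's --write-out option. The function names ext_for_content_type and fetch_image are made up for illustration, and the sketch assumes the forum sends an accurate Content-Type header, which I have not verified.

```bash
#!/usr/bin/env bash
# Sketch only: ext_for_content_type and fetch_image are hypothetical helpers.

# Map a Content-Type value to a file extension; fails for non-image types.
ext_for_content_type() {
  local type
  # Drop parameters such as "; charset=..." and lowercase the rest.
  type="$(printf '%s' "${1%%;*}" | tr '[:upper:]' '[:lower:]')"
  case "$type" in
    image/jpeg) echo jpg ;;
    image/gif)  echo gif ;;
    image/png)  echo png ;;
    *) return 1 ;;
  esac
}

# Download one attachment; keep it only if the server answered 200
# with an image Content-Type (assumes that header is trustworthy).
fetch_image() {
  local id="$1" tmp meta code ctype ext
  tmp="$(mktemp)" || return 1
  # --write-out prints the status code and Content-Type after the transfer.
  meta="$(curl --silent --cookie cookies.txt --output "$tmp" \
    --write-out '%{http_code} %{content_type}' \
    "http://www.someforum.com/attachment.php&id=${id}")"
  code="${meta%% *}"
  ctype="${meta#* }"
  if [ "$code" = "200" ] && ext="$(ext_for_content_type "$ctype")"; then
    mv "$tmp" "image${id}.${ext}"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage: for i in {1..10}; do fetch_image "$i"; done
```

Writing to a temporary file first means a 500 page or an HTML error body never ends up with an image name.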
This works, thank you. I'm going to leave the question open for a while to see if anybody comments regarding MIME/Content-Type, as I'm still (understandably) getting GIF files saved as .jpg. – Ashley – 2012-03-18T11:18:20.767
Ah, I didn't realize you also had the problem of files being saved with the wrong extension. I would have suggested file --mime to map the appropriate real extension too. – dbenhur – 2012-03-18T17:43:23.737
I was hoping to avoid post-processing, but having looked through the entire wget manual, it seems as though it's the only way. – Ashley – 2012-03-18T22:57:24.547
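For anyone following the file --mime suggestion from the comments, a minimal post-processing sketch might look like the following. The function name fix_extension is hypothetical, only three image types are mapped, and it relies on the file utility recognising the content correctly.

```bash
#!/usr/bin/env bash
# Sketch only: fix_extension is a hypothetical helper, not part of wget or curl.

# Rename a downloaded file so its extension matches its detected MIME type.
fix_extension() {
  local path="$1" mime ext target
  # -b suppresses the filename; --mime-type prints only e.g. "image/gif".
  mime="$(file -b --mime-type "$path")" || return 1
  case "$mime" in
    image/jpeg) ext=jpg ;;
    image/gif)  ext=gif ;;
    image/png)  ext=png ;;
    *) return 1 ;;   # not a recognised image; leave the file untouched
  esac
  target="${path%.*}.${ext}"
  # Skip the rename when the extension is already correct.
  [ "$path" = "$target" ] || mv -- "$path" "$target"
}

# Usage: for f in image*.jpg; do fix_extension "$f"; done
```

A non-zero return status flags files that are not images (for example saved HTML error pages), so the caller can decide whether to delete them.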