How to download all images from a website using wget?

I need help with wget. I want to download all images from a stock image website such as https://pixabay.com/, but when I enter the command in the terminal nothing is downloaded: no jpg, no zip.

I used this command:

wget -r -A jpg https://pixabay.com/

I sometimes use jpg or zip depending on the website. I have also tried other websites:

http://www.freepik.com/
http://www.freeimages.com/
http://all-free-download.com/

Nothing downloads from any of them.

ali haider

Posted 2017-06-15T07:16:38.690

Reputation: 21

Answers

First of all, it seems they don't want you to download their pictures in bulk. Please keep this in mind before acting.

Technically you could download the pictures, but the pages reference them through custom tags/attributes. You can check those custom attributes by downloading the HTML source (a quick way to inspect them is sketched after the list below). Unfortunately, wget does not (yet) support arbitrary custom tags. Basically you have two options:

  1. Extend wget with this feature, as suggested at https://unix.stackexchange.com/questions/258835/wget-follow-custom-url-attributes
  2. Download the source and write your own post-processor.
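
If you want to see which attributes the pages actually use, a rough sketch is to fetch the source with a browser User-Agent and look at the <img> tags. The User-Agent string below is just an example; any common browser string should do:

wget -q -O - --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" "https://pixabay.com/" | grep -o '<img[^>]*>' | head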

In the second case, you have to download the index page and extract the image URLs. Keep in mind that they don't want you to use wget, so they block its User-Agent string; you have to fake another one, e.g. Mozilla. If you are on Linux, something like this will list the picture URLs:

wget -O - --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" "https://pixabay.com/en/photos/?q=cats&hp=&image_type=&cat=&min_width=&min_height=" | grep -o 'https://cdn.pixabay[^" ]*'

Then you just have to feed that list back into wget and you are done:

..... | xargs wget
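
Putting it together, here is a minimal sketch of the whole pipeline; the search URL and query are only examples, and the cdn.pixabay URL pattern is assumed to still match what the page emits:

UA="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"
# fetch the search page, extract the CDN image URLs, deduplicate them,
# then download each one with the same faked User-Agent
wget -q -O - --user-agent="$UA" "https://pixabay.com/en/photos/?q=cats" \
  | grep -o 'https://cdn.pixabay[^" ]*' \
  | sort -u \
  | xargs -n 1 wget --user-agent="$UA"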

Edit: @vera's solution is also nice; however, it seems to download only a fraction of the pictures in the case of an image search. [Sorry, not enough points to comment :)]

Gote Guru

Posted 2017-06-15T07:16:38.690

Reputation: 131

Here is the working command:

wget -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0" -nd -r --level=1  -e robots=off -A jpg,jpeg -H http://pixabay.com/
  • -U "..." : The website returns HTTP error 403 (Forbidden) because it only allows a given list of User-Agents to access its pages. You have to supply the User-Agent of a common browser (Firefox, Chrome, ...). The one given here is a working example.
  • -nd (--no-directories), from the man page: "Do not create a hierarchy of directories when retrieving recursively."
  • -e robots=off: do not honour the robots.txt exclusion rules
  • -H: enable retrieving files across hosts (here pixabay.com and cdn.pixabay.com are considered different hosts)

If there is some rate-limiting mechanism, add the option --wait 1.
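
For example, a sketch of the same command with the wait added (whether one second is enough depends on the site):

wget -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0" -nd -r --level=1 -e robots=off -A jpg,jpeg -H --wait 1 http://pixabay.com/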

vera

Posted 2017-06-15T07:16:38.690

Reputation: 760