Download ALL Folders, SubFolders, and Files using Wget

I have been using Wget, and I have run across an issue. I have a site that has several folders and subfolders within it. I need to download all of the contents within each folder and subfolder. I have tried several methods with Wget, and when I check the result, all I can see in the folders is an "index" file. I can click on the index file and it will take me to the files, but I need the actual files.

Does anyone have a command for Wget that I have overlooked, or is there another program I could use to get all of this information?

site example:

www.mysite.com/Pictures/ (within the Pictures dir, there are several folders...)

www.mysite.com/Pictures/Accounting/

www.mysite.com/Pictures/Managers/North America/California/JoeUser.jpg

I need all files, folders, etc.....

Horrid Henry

Posted 2013-10-07T16:05:46.357

Reputation: 231

Have you read the documentation for wget, specifically for using it recursively?

– Moses – 2013-10-07T16:38:38.933

There's also an article in the documentation here that seems relevant.

– Moses – 2013-10-07T16:39:28.093

Answers

I'm going to assume you haven't tried this:

wget -r --no-parent http://www.mysite.com/Pictures/

or, to retrieve the content without downloading the "index.html" files:

wget -r --no-parent --reject "index.html*" http://www.mysite.com/Pictures/
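
If you would also rather not have wget create a local www.mysite.com/ directory tree, a possible variant (same example URL, using wget's standard -nH and --cut-dirs options) is:

# -nH skips the www.mysite.com/ host directory; --cut-dirs=1 also strips the leading Pictures/ component
wget -r --no-parent -nH --cut-dirs=1 --reject "index.html*" http://www.mysite.com/Pictures/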

Reference: Using wget to recursively fetch a directory with arbitrary files in it

Felix Imafidon

Posted 2013-10-07T16:05:46.357

Reputation: 526

I used a similar command but only got an index.html file! – shenkwen – 2019-06-25T20:55:05.787

Thanks, I have run that command several times, but I did not let the command finish all the way to the end. I got sidetracked, let the command actually finish, and it copied ALL folders first, then it went back and copied ALL of the files into the folders. – Horrid Henry – 2013-10-07T16:46:22.183

Just goes to show you, if I had patience, I would have had this done 2 weeks ago... LOL. :) Thanks again. – Horrid Henry – 2013-10-07T16:47:01.187

@Horrid Henry, Congratulations! – Felix Imafidon – 2013-10-07T17:02:13.667

I use wget -rkpN -e robots=off http://www.example.com/

-r means recursive download.

-k means convert links, so links in the downloaded pages point to the local copies instead of back to example.com.

-p means get all page requisites, i.e. the images, CSS, and JavaScript files needed to make the pages display properly.

-N turns on timestamping, so a file is only re-downloaded if the remote copy is newer than the local one.

-e executes the option that follows as if it were part of .wgetrc; it needs to be there for robots=off to work.

robots=off means ignore the robots.txt file.

I also had -c in this command, so if the connection dropped it would continue where it left off when I re-ran the command. I figured -N would go well with -c.
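
Putting all of that together with -c, the full invocation (against the hypothetical example.com host) would look something like:

# resume interrupted downloads (-c) on top of the recursive, link-converting, timestamped mirror
wget -rkpN -c -e robots=off http://www.example.com/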

Tim Jonas

Posted 2013-10-07T16:05:46.357

Reputation: 506

Could you add a couple of sentences to your answer to explain what these parameter settings do? – fixer1234 – 2014-12-20T09:47:28.123

Sorry, sure, I'll add them now – Tim Jonas – 2014-12-20T10:27:45.133

I have updated my answer – Tim Jonas – 2014-12-20T10:36:35.853

Thanks. So should -c be part of your command example or added optionally after an incomplete download? Also, the -e is so that the command takes precedence over any that may be in .wgetrc? And is that a typo for -r (recursive vs. reclusive)? – fixer1234 – 2014-12-20T18:27:53.660

Yes, that is correct. Yes, -e will execute the command as if it were part of .wgetrc; I added it there as robots=off did not seem to work without it. – Tim Jonas – 2014-12-23T11:18:42.263
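
In other words, if you wanted the setting to be permanent rather than per-invocation, the same effect should be achievable with a line in ~/.wgetrc instead of the -e flag (a sketch based on the wgetrc settings documented in the wget manual):

# ~/.wgetrc
robots = off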

wget -m -A "*" -pk -e robots=off www.mysite.com/
This will download all types of files locally, point to them from the HTML files, and ignore the robots file.
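
A rough breakdown of those flags (the comments below are my reading of the wget man page, not part of the original answer):

# -m             mirror mode: equivalent to -r -N -l inf --no-remove-listing
# -A "*"         accept list; "*" accepts every file type (quoted so the shell does not expand it)
# -p             fetch page requisites (images, CSS, JavaScript)
# -k             convert links for local browsing
# -e robots=off  ignore robots.txt
wget -m -A "*" -pk -e robots=off http://www.mysite.com/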

Abdalla Mohamed Aly Ibrahim

Posted 2013-10-07T16:05:46.357

Reputation: 341