Download the directory structure of a website without its files?

2

I googled for this but couldn't find an example of how to use, e.g., wget to download the directory structure of a website without downloading any of its files.

At this point, I just need to understand how a website is organized. I'll mirror the site later.

OverTheRainbow

Posted 2016-04-06T09:25:35.433


Do you mean the directory structure as in where the images, css, and js are stored? – Paul – 2016-04-06T09:48:22.220

Yes, the directory structure, e.g. /dir1, /dir2, etc. – OverTheRainbow – 2016-04-06T09:55:13.863

This isn't clear. Do you mean the directory structure of the assets that make up the site, such as where the images and css etc files reside, or do you mean the tree structure that describes how the pages link together (which is not related to directory structure). What for example would be in "dir1" in this case? – Paul – 2016-04-06T13:28:03.737

This looks very relevant: how to crawl the list of web pages on a particular website. http://stackoverflow.com/questions/857653/get-a-list-of-urls-from-a-site

– simpleb – 2016-04-06T14:32:07.600

Answers

2

At the command prompt type:

wget -r --spider www.your-website.com

Alternatively, use the -l option to limit how deep wget recurses, where depth is the maximum depth level:

wget -r --spider -l depth www.your-website.com

Recursive retrieval options:

-r
--recursive
    Turn on recursive retrieving. The default maximum depth is 5.

--spider
    Don't download anything.

-l depth
--level=depth
    Specify recursion maximum depth level depth.
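Once the crawl finishes, the directory structure can be read off the log. A minimal sketch, assuming the log was saved with wget's -o option and that lines follow the typical GNU wget format of a timestamp followed by the URL being checked (the exact format may vary between wget versions; the sample log below is hypothetical):

```shell
# Hypothetical sample of a "wget -r --spider -o spider.log" log.
cat > spider.log <<'EOF'
--2016-04-06 09:25:35--  http://www.your-website.com/index.html
--2016-04-06 09:25:36--  http://www.your-website.com/dir1/page.html
--2016-04-06 09:25:37--  http://www.your-website.com/dir2/sub/page.html
--2016-04-06 09:25:38--  http://www.your-website.com/dir1/other.html
EOF

# Extract each URL, strip the trailing filename, and keep the
# unique directory paths.
grep -oE 'https?://[^ ]+' spider.log \
  | sed -E 's|[^/]*$||' \
  | sort -u
```

For the sample log above this prints the site root plus /dir1/ and /dir2/sub/, which is the kind of /dir1, /dir2 listing the question asks about.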

karel


but this does download the files, too. – xtofl – 2018-10-08T07:52:28.070

The --spider option of wget doesn't download anything; it only creates the empty directory structure of the URL, without saving any files into the directories. – karel – 2018-10-08T09:50:14.193

Somehow, I read over that argument. My bad. – xtofl – 2018-10-09T06:49:11.933