wget - only getting .listing file in every sub-dir



If I use the command

wget --no-remove-listing -P ...../debugdir/gnu/<dir>/ ftp://<ftp-site>/gnu/<dir>/

I get the .listing file of that directory, but I have to step through each sub-directory in turn to get the whole structure. Is there a way to get the .listing file from all (sub)directories with one command?

Also, I have noticed that the file index.html is automatically generated after every access. Is there a way to suppress this behavior?

The thing is that I always found Bash processing slow, but after some profiling I found that the largest delay is in getting each .listing file from the successive sub-directories.

Example: checking for specific file extensions in the GNU tree takes about 320 seconds, of which 290 seconds are spent in the above wget command.


Posted 2012-05-11T21:22:35.853




If you are looking to build an index of an FTP site, that is, to list all of the subdirectories and files on the site without actually retrieving them, you can do this:

wget -r -x --no-remove-listing --spider ftp://ftp.example.com/


  • -r => recursive (i.e., visit subdirectories)
  • -x => force mirror subdirectories to be created on client
  • --no-remove-listing => leave ".listing" files in each subdirectory
  • --spider => visit but do not retrieve files

This will create a sparse directory tree on the client with the same structure as on the server, containing only ".listing" files that show the contents (the result of "ls -l") of each directory. If you want to digest that into a single list of path-qualified file names (like you would get from "find . -type f"), then do this at the root of that sparse directory tree:

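# Convert the CRLF line endings of the downloaded listings to LF.
find . -type f -exec dos2unix {} \;
# For each non-directory entry, print ISO date, size and path-qualified name.
# Needs GNU awk (gensub) and GNU date (-d); close(C) releases each date pipe.
( find . -maxdepth 999 -name .listing -exec \
awk 'NF >= 9 && $1 !~ /^d/ {C="date +\"%Y-%m-%d %H:%M:%S\" -d \"" $6 " " $7 " " $8 "\""; \
C | getline D; close(C); printf "%s\t%12d\t%s%s\n", D, $5, gensub(/[^/]*$/,"","g",FILENAME), $9}' \
{} \; 2>/dev/null ) | sort -k4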

which will give you output like

2000-09-27 00:00:00       261149    ./README
2000-08-31 00:00:00       727040    ./foo.txt
2000-10-02 00:00:00      1031115    ./subdir/bar.txt
2000-11-02 00:00:00      1440830    ./anotherdir/blat.txt
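
For the use case in the question (checking for specific file extensions), the listings can also be filtered directly, without running "date" for every entry. This is only a minimal sketch under some assumptions: the helper name list_by_ext is made up for illustration, the .listing files are in "ls -l" format with the file name in the ninth field, and file names contain no spaces. It strips a trailing carriage return itself, so it does not depend on dos2unix having been run.

```shell
# list_by_ext ROOT EXT
# Print the path-qualified names of regular files whose name ends in .EXT,
# as recorded in the .listing files found under ROOT.
# Hypothetical helper: assumes "ls -l" style listings, names without spaces.
list_by_ext() {
    root=$1
    ext=$2
    find "$root" -name .listing -exec \
        awk -v ext="$ext" '
            { sub(/\r$/, "") }                  # tolerate CRLF line endings
            $1 ~ /^-/ {                         # regular files only
                n = length($9) - length(ext) - 1
                if (n >= 0 && substr($9, n + 1) == "." ext) {
                    dir = FILENAME
                    sub("[^/]*$", "", dir)      # drop ".listing", keep the directory
                    print dir $9
                }
            }' {} + | sort
}
```

Run at the root of the sparse tree, "list_by_ext . gz" would print one line per listed file ending in ".gz". Unlike the command above, it skips symbolic links as well as directories.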

NB: the "-maxdepth 999" option is not necessary in this use case. I left it in because the invocation I was testing had an additional constraint: limiting the depth of the tree that was reported. For example, if you scan a site that contains full source trees for several projects, you might only want an outline of the projects and their top-level directories. In that case, you would give an option like "-maxdepth 2".
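
A depth-limited outline along those lines could be sketched as follows. The function name outline_dirs is hypothetical, and the sketch assumes the same "ls -l" style .listing files with the directory name in the ninth field and no spaces in names. It prints only the directory entries found in listings that are no deeper than the given depth.

```shell
# outline_dirs ROOT DEPTH
# Print the directories recorded in the .listing files that lie at most
# DEPTH levels below ROOT (the value is passed straight to find -maxdepth).
# Hypothetical helper: assumes "ls -l" style listings, names without spaces.
outline_dirs() {
    root=$1
    depth=$2
    find "$root" -maxdepth "$depth" -name .listing -exec \
        awk '
            { sub(/\r$/, "") }                  # tolerate CRLF line endings
            $1 ~ /^d/ && $9 != "." && $9 != ".." {
                dir = FILENAME
                sub("[^/]*$", "", dir)          # drop ".listing", keep the directory
                print dir $9 "/"
            }' {} + | sort
}
```

With a depth of 2, "outline_dirs . 2" reads the listing at the root and the listings one level below it, so it reports the first two levels of directories.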

