FTP download list of absolute paths

5

1

I have a list of (a couple thousand) absolute paths to files on a remote server that I need to download to my PC.

I also need the directory structure to remain intact for those files.

Example:

/* UNIX Server File-System  */
/home/username/
    stuff/
    more-stuff/
    data/
    file1.txt

/* Local Windows File-System After Transfer  */
C:\Users\username\Documents\home\username\
    stuff\
    more-stuff\
    data\
    file1.txt

Ideally, I would use some type of FTP to get those files to my PC. However, I am unaware of a program or CLI command that can download files from a list like this. I need to get specific files from specific directories; I can't just download whole directories.

My Question: How can I use a list of absolute paths to automatically download the files to my localhost? (while keeping the directory structure intact)

Additionally, I have these paths in a PHP array, so I can export the list as JSON, CSV, XML, plain text, etc.
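For example, exported as a plain-text list it would simply be one absolute path per line. Using the example tree above (the second entry is made up just to show a nested path):

/home/username/file1.txt
/home/username/data/some-report.csv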

Nicholas Summers

Posted 2016-01-21T19:38:34.400

Reputation: 203

Can you use rsync? – Hastur – 2016-01-28T16:20:49.250

Answers

3

If you can use rsync, something like this will do:

rsync -av --files-from=/path/yourlist.txt / remote:/backup

where:

  • /path/yourlist.txt is your list of files, one full pathname per line
  • / is the source path prepended to each filename in the list (use / since the entries are full pathnames)
  • remote:/backup is the remote host name and the destination path on it
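Note that the command above copies from the local / to remote:/backup. For the direction in the question (download from the server to your PC) a rough sketch could be the following, assuming rsync runs on the Windows side too (e.g. under Cygwin or WSL) and you have SSH access to the server; username@server and the destination path are only placeholders:

rsync -av --files-from=yourlist.txt username@server:/ /cygdrive/c/Users/username/Documents/

Since --files-from implies --relative, an entry like /home/username/file1.txt ends up as home/username/file1.txt under the destination, which is the layout you asked for.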

You can read more by searching for --files-from in man rsync(1):

--files-from=FILE

Using this option allows you to specify the exact list of files to transfer (as read from the specified FILE or - for standard input). It also tweaks the default behavior of rsync to make transferring just the specified files and directories easier:
  • The --relative (-R) option is implied, which preserves the path information that is specified for each item in the file (use --no-relative or --no-R if you want to turn that off).
  • The --dirs (-d) option is implied, which will create directories specified in the list on the destination rather than noisily skipping them (use --no-dirs or --no-d if you want to turn that off).
  • The --archive (-a) option’s behavior does not imply --recursive (-r), so specify it explicitly, if you want it.
  • These side-effects change the default state of rsync, so the position of the --files-from option on the command-line has no bearing on how other options are parsed (e.g. -a works the same before or after --files-from, as does --no-R and all other options).

... and there is more in the man page.

Hastur

Posted 2016-01-21T19:38:34.400

Reputation: 15 043

This is exactly what I was looking for! – Nicholas Summers – 2016-02-01T16:30:51.117

Take a good look at the help... there are many interesting options, e.g. to avoid downloading the same files again, etc. – Hastur – 2016-02-01T16:41:06.257

Definitely the best solution. – MariusMatutiae – 2016-02-02T07:26:05.270

5

wget has the functionality you are looking for. From the manpage:

-i file
--input-file=file
    Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.)

In other words: wget -i filelist.txt.

The file list doesn't necessarily have to be plain text either, as wget supports HTML via the --force-html switch. If all you have is a list of directories/files, you can set the base URL on the command line with the --base switch.
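Note that plain wget -i drops every file into the current directory. A sketch that also recreates the remote path locally, assuming the list holds full URLs such as ftp://username:password@server/home/username/file1.txt (host and credentials are placeholders):

wget -x -nH -i filelist.txt

-x (--force-directories) makes wget rebuild the directory hierarchy from each URL, and -nH (--no-host-directories) drops the leading hostname directory.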

Jarmund

Posted 2016-01-21T19:38:34.400

Reputation: 5 155

He said FTP; can you give an FTP example with wget? – barlop – 2016-01-26T14:30:12.143

@barlop The syntax is the same regardless of protocol. The only difference would be the contents of the filelist.txt, referencing the ftp protocol with the ftp:// prefix – Jarmund – 2016-01-26T14:49:06.250

And the user/pass; it's worth including them in the URL, e.g. "ftp://username:password@blah.xyz" – barlop – 2016-01-28T02:47:34.920

This does not fully solve the problem; the OP stated ...while keeping the directory structure intact. By using wget this way you are placing all files in the same directory. – MariusMatutiae – 2016-02-01T05:01:58.690

2

However, I am unaware of a program or CLI command that supports getting a list of files.

I am not sure why this should be a stumbling block. You can run ftp with a script as its source,

cd Target_Directory
ftp -v -s:script.ftp ftp.myhost.net 1> /absolute/path/to/my/logfile 2>&1

where Target_Directory is where you want to place the files about to be downloaded, and script.ftp is a script file like this:

USER MyUserId
MyPassword
cd SOURCE_DIR
binary
prompt n
mget the_first_file_I_need
mget the_second_file_I_need
bye

This is fine for a single site. How about many sites? You can create a script file, call it script_main, with the following lines:

cd Target_Directory_1
ftp -v -s:script_1.ftp ftp.myhost_1.net 1>> /absolute/path/to/my/logfile 2>&1
cd Target_Directory_2
ftp -v -s:script_2.ftp ftp.myhost_2.net 1>> /absolute/path/to/my/logfile 2>&1
....

and so on. You can prepare the script_N.ftp files by parsing the information you have into properly separated files.
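To keep the directory structure as well, the generated scripts can pair cd on the server with lcd on your PC before each download. This assumes the local directory tree has been created beforehand (for instance with mkdir in a small batch file); the paths and file names below just reuse the examples from the question and the script above:

cd /home/username
lcd C:\Users\username\Documents\home\username
get file1.txt
cd /home/username/data
lcd C:\Users\username\Documents\home\username\data
get the_second_file_I_need

lcd changes the local directory only, so each retrieved file lands in the matching local folder.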

MariusMatutiae

Posted 2016-01-21T19:38:34.400

Reputation: 41 321

2

aria2 might be a possibility.

From the documentation:

-d, --dir=<DIR>

The directory to store the downloaded file.

-i, --input-file=<FILE>

Downloads the URIs listed in FILE. You can specify multiple sources for a single entity by putting multiple URIs on a single line separated by the TAB character. Additionally, options can be specified after each URI line. Option lines must start with one or more white space characters (SPACE or TAB) and must only contain one option per line. Input files can use gzip compression.

With only the global -d option, this would require having a separate input file per directory.
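Alternatively, since the quoted text says that options may follow each URI line, the dir option could presumably be given per entry, so a single input file might suffice; a sketch with placeholder host, credentials and file names:

ftp://username:password@server/home/username/file1.txt
  dir=C:\Users\username\Documents\home\username
ftp://username:password@server/home/username/data/another-file.txt
  dir=C:\Users\username\Documents\home\username\data

Running aria2c -i uris.txt would then store each file in the directory given below its URI.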

harrymc

Posted 2016-01-21T19:38:34.400

Reputation: 306 093