
I'm downloading files recursively from a remote directory using wget. Whoever created the folders and files used special characters such as è or Ó. When I download a single file by specifying the full path and filename, the file is saved with its name intact, but when I download the whole folder with all its files and directories using the -r option, the filenames are not encoded or decoded correctly.

From what I've gathered, the filename is sent as ASCII in the request, and both my machine and the server have UTF-8 set as the encoding in their environment, so that should not be the issue either.

When wget creates the file, the è character (which I will use as the example) in the filename is saved as the octal character code \350, and it appears as a è. This only happens when I download the files recursively; if I download the same file using its complete URL, the filename appears correctly.
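To make the \350 observation concrete, here is a minimal Python 3 sketch (not something wget does or exposes, just an illustration). It assumes the stray byte comes from an ISO-8859-1 (Latin-1) name, which is consistent with octal 350 being 0xE8 but is not confirmed by anything the server reports. The sample name `Vall\350s` is only a shortened stand-in for the real filename.

```python
# Illustration only: how the single byte \350 (octal) relates to the letter e-grave.
raw = b"Vall\350s"  # octal 350 == 0xE8; shortened stand-in for the saved name

# Interpreted as ISO-8859-1 (assumption), 0xE8 is U+00E8, the e-grave character.
as_latin1 = raw.decode("latin-1")

# On its own the same byte is not valid UTF-8, so a UTF-8 terminal
# has no correct way to display it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A UTF-8 filename would carry the two bytes \303\250 (0xC3 0xA8) instead.
utf8_bytes = as_latin1.encode("utf-8")

print(valid_utf8, utf8_bytes)
```

If that assumption holds, the recursive download is writing the name in a single-byte encoding, while the single-file download ends up with the two-byte UTF-8 form.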

I've spent a fair number of hours looking through Q&As here and there, and I've tried everything I've seen, from setting --local-encoding and --remote-encoding to UTF-8 to using --restrict-file-names=nocontrol, etc.

Ports 21 and 22 are closed, so I can't download the files through SCP or FTP. Most likely any other protocol would give the same error, but I'm open to less common ones that I could use.

The main problem this causes is that when I try to copy the downloaded files to a backup folder, some of them fail with a "file not found" error because the filename is mangled. For now I'm using --restrict-file-names=ascii and keeping the names in ASCII as a workaround, but I need to change the encoding to UTF-8. I also can't install any applications such as convmv on the machine (orders from the boss).
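Since convmv is off the table, one possible direction is a small script that renames the files after the download. The following is only a sketch under stated assumptions: that a Python 3 interpreter is available on the machine, and that every name that is not valid UTF-8 is really ISO-8859-1. The function name `latin1_names_to_utf8` and its `dry_run` parameter are hypothetical, made up for this example.

```python
import os


def latin1_names_to_utf8(root, dry_run=True):
    """Rename entries under root whose names are not valid UTF-8.

    Assumes (unverified) that such names are ISO-8859-1 encoded.
    Returns a list of (old_path, new_path) pairs as bytes.
    """
    renamed = []
    # Work with bytes paths and walk bottom-up, so files and subdirectories
    # are renamed before the directory that contains them.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")  # already valid UTF-8: leave untouched
                continue
            except UnicodeDecodeError:
                pass
            new_name = name.decode("latin-1").encode("utf-8")
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling it with the default `dry_run=True` only returns the planned renames, so the list can be inspected before running it again with `dry_run=False`. It does not check whether the target name already exists.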

This is the command I've been using to download the files: wget --keep-session-cookies --cookies=on --no-check-certificate --restrict-file-names=nocontrol --convert-links --no-parent -r <URL>

This is how the file name is saved when downloading a single file vs. all files recursively:

OT14-004 CEIP Pins del Vallès.vsd

OT14-004 CEIP Pins del Vallès.vsd

I'm using a machine with CentOS Linux release 7.0.1406 (Core) and GNU Wget 1.14 built on linux-gnu.

Nagarz
  • 1
    This seems to be a [known issue](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411290) in wget. You can try the alternatives listed [here](http://askubuntu.com/questions/233882/how-to-download-link-with-unicode-using-wget). – ngn Jul 17 '15 at 14:34
  • 1
    The solutions there use `curl`, which as far as I know does not work recursively, and `--restrict-file-names=nocontrol`, which as I have mentioned I already tried. – Nagarz Jul 17 '15 at 16:48

0 Answers