Why does `wget` download index.html instead of a direct file?

5

I'm just trying to download this file, but wget always gets redirected to the main page and ends up downloading index.html instead of the file I want:

http://tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip

Do you guys know how to download it correctly? I tried --user-agent with several values (Firefox on Linux, IE on Windows, anything you can think of), but it doesn't work.

This is the output (wget messages are in Spanish); it is the same with --user-agent set:

jaheaga@jaheaga:~$ wget http://www.tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip
--2012-04-13 19:40:07--  http://www.tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip
Resolviendo www.tweaking.com... 199.119.100.39
Conectando con www.tweaking.com[199.119.100.39]:80... conectado.
Petición HTTP enviada, esperando respuesta... 302 Found
Ubicación: http://tweaking.com [siguiente]
--2012-04-13 19:40:08--  http://tweaking.com/
Resolviendo tweaking.com... 199.119.100.39
Reutilizando la conexión con www.tweaking.com:80.
Petición HTTP enviada, esperando respuesta... 302 Moved Temporarily
Ubicación: http://www.tweaking.com [siguiente]
--2012-04-13 19:40:08--  http://www.tweaking.com/
Reutilizando la conexión con www.tweaking.com:80.
Petición HTTP enviada, esperando respuesta... 200 OK
Longitud: no especificado [text/html]
Grabando a: “tweaking.com_windows_repair_aio.zip.1”

    [ <=>                                                                            ]     46.913       234K/s   en 0,2s    

2012-04-13 19:40:09 (234 KB/s) - “tweaking.com_windows_repair_aio.zip.1” guardado [46913]

Jaheaga

Posted 2012-04-13T23:25:49.873

Reputation: 51

What errors do you get? – Nifle – 2012-04-13T23:31:35.377

The link is not working at all. At least, for me. How about uploading it to somewhere? And use the direct link from there? – Apache – 2012-04-14T00:00:10.743

It gives me the main page, but go to http://tweaking.com/files/setups/ and you can check it; that link behaves weirdly. – Jaheaga – 2012-04-14T00:18:47.450

BTW: I am curious. What's the reason for downloading the file with wget instead of inside the browser? I mean you definitely used a browser to find the download url :) – zpea – 2012-04-14T00:30:55.657

Duly noted. It is for a batch script I use to fix really broken Windows computers. – Jaheaga – 2012-04-14T00:38:33.993

Ah ok, makes sense. And the question looks much better now, thanks. (Removed my comment). – zpea – 2012-04-15T02:26:55.257

In case you would like to have English output next time: just execute export LANGUAGE=en_US:en LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 once before all the other commands, and they will print everything in (US) English with US number/date formats etc. (This setting applies only to the current shell and its subshells; everything is back to normal when you close it or open another one.) – zpea – 2012-04-15T02:39:44.210
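If changing the whole shell session is more than you need, the same idea can be applied to a single command. The sketch below is only an illustration: run_english is a made-up helper name, and it assumes that setting LC_ALL=C for one command is enough to get untranslated (English) messages, which should hold for gettext-based tools such as wget.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): run one command with the
# POSIX "C" locale so that its messages are printed untranslated.
# The assignment applies only to that command, not to the calling shell.
run_english() {
    LC_ALL=C "$@"
}

# Example (needs network access, so it is left commented out):
# run_english wget http://www.tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip
```

This leaves the locale of the current shell untouched, so nothing has to be reset afterwards.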

Answers

10

The user-agent is a good start, but it is not sufficient in this case. Another HTTP header that is often checked is 'Referer' [sic!]. See Wikipedia: HTTP Referer.

wget has a --referer=url option to specify the referring page. Analysing the traffic of a successful download in Wireshark shows that the browser on a test system of mine sent the following request:

GET /files/setups/tweaking.com_windows_repair_aio.zip HTTP/1.1
Host: www.tweaking.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://www.tweaking.com/content/page/windows_repair_all_in_one.html

In this case it even seems you don't need to fake a User-Agent.

wget --referer=http://www.tweaking.com/content/page/windows_repair_all_in_one.html  http://www.tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip

Does the trick.
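Since this is meant for a batch script, it may be worth checking that what arrived is really a ZIP archive and not the redirected HTML page (wget saved index.html under the .zip name in the question's output). Here is a minimal sketch of that idea. The function names looks_like_zip and fetch_with_referer are made up for illustration, and the check only assumes that a ZIP file starts with the two bytes "PK".

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file starts with the ZIP
# magic bytes "PK". An HTML page saved under a .zip name fails this check.
looks_like_zip() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Hypothetical wrapper around wget.
#   $1 = referring page, $2 = file URL, $3 = output file name
fetch_with_referer() {
    wget --referer="$1" -O "$3" "$2" || return 1
    if ! looks_like_zip "$3"; then
        echo "$3 is not a ZIP archive (probably the redirect target)" >&2
        return 1
    fi
}

# Example (needs network access, so it is left commented out):
# fetch_with_referer \
#     http://www.tweaking.com/content/page/windows_repair_all_in_one.html \
#     http://www.tweaking.com/files/setups/tweaking.com_windows_repair_aio.zip \
#     tweaking.com_windows_repair_aio.zip
```

The wrapper returns a non-zero status both when wget itself fails and when the saved file is not a ZIP, so the calling script can stop instead of trying to unpack an HTML page.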

zpea

Posted 2012-04-13T23:25:49.873

Reputation: 1 363