Using wget and manually passing Range header

4

1

I would like to download a byte range of a file that I specify explicitly. As far as I know:

wget --header="Range: bytes=1024-2048" http://www.example.com/file.tmp

should work. Yet it fails, and with debug mode on the output ends with the following messages:

Registered socket 300 for persistent reuse.
Disabling further reuse of socket 300.
Closed fd 300

Why does it give that error and retry, and how can I fix it?

The following are the actual full logs of the process.

Manually assigned resumable download

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
Setting --server-response (serverresponse) to 1
Setting --page-requisites (pagerequisites) to 1
Setting --recursive (recursive) to 1
Setting --tries (tries) to 1
Setting --header (header) to Range: bytes=10024-
DEBUG output created by Wget 1.11.4 on Windows-MinGW.

Enqueuing http://www.example.com/file.tmp at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.example.com/file.tmp at depth 0
Queue count 0, maxcount 1.
--2012-01-11 07:02:46--  http:/www.example.com/file.tmp
Resolving www.example.com... seconds 0,00, 127.0.0.1
Caching www.example.com => 127.0.0.1
Connecting to www.example.com[127.0.0.1]:80... seconds 0,00, connected.
Created socket 300.
Releasing 0x0036a108 (new refcount 1).

---request begin---
GET /file.tmp HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: www.example.com
Connection: Keep-Alive
Range: bytes=10024-

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 206 Partial Content
Server: nginx/0.7.65
Date: Wed, 11 Jan 2012 05:03:57 GMT
Content-Type: application/vnd.ms-powerpoint
Content-Length: 37651672
Last-Modified: Tue, 01 Nov 2011 21:18:50 GMT
Connection: keep-alive
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Content-Range: bytes 10024-37661695/37661696

---response end---

  HTTP/1.1 206 Partial Content
  Server: nginx/0.7.65
  Date: Wed, 11 Jan 2012 05:03:57 GMT
  Content-Type: application/vnd.ms-powerpoint
  Content-Length: 37651672
  Last-Modified: Tue, 01 Nov 2011 21:18:50 GMT
  Connection: keep-alive
  Expires: Thu, 31 Dec 2037 23:55:55 GMT
  Cache-Control: max-age=315360000
  Content-Range: bytes 10024-37661695/37661696
  Registered socket 300 for persistent reuse.
  Disabling further reuse of socket 300.
  Closed fd 300
  Giving up.

Wget supported resumable download (command: -c)

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
Setting --server-response (serverresponse) to 1
Setting --continue (continue) to 1
Setting --http-keep-alive (httpkeepalive) to 1
DEBUG output created by Wget 1.11.4 on Windows-MinGW.

--2012-01-11 07:12:51--  http://www.example.com/file.tmp
Resolving www.example.com... seconds 0,00, 127.0.0.1
Caching www.example.com => 127.0.0.1
Connecting to www.example.com[127.0.0.1]:80... seconds 0,00, connected.
Created socket 300.
Releasing 0x0003a0b0 (new refcount 1).

---request begin---
GET /file.tmp HTTP/1.0
Range: bytes=557172-
User-Agent: Wget/1.11.4
Accept: */*
Host: www.example.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 206 Partial Content
Server: nginx/0.7.65
Date: Wed, 11 Jan 2012 05:14:01 GMT
Content-Type: application/vnd.ms-powerpoint
Content-Length: 37104524
Last-Modified: Tue, 01 Nov 2011 21:18:50 GMT
Connection: keep-alive
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Content-Range: bytes 557172-37661695/37661696

---response end---

  HTTP/1.1 206 Partial Content
  Server: nginx/0.7.65
  Date: Wed, 11 Jan 2012 05:14:01 GMT
  Content-Type: application/vnd.ms-powerpoint
  Content-Length: 37104524
  Last-Modified: Tue, 01 Nov 2011 21:18:50 GMT
  Connection: keep-alive
  Expires: Thu, 31 Dec 2037 23:55:55 GMT
  Cache-Control: max-age=315360000
  Content-Range: bytes 557172-37661695/37661696
Registered socket 300 for persistent reuse.
Length: 37661696 (36M), 37104524 (35M) remaining [application/vnd.ms-powerpoint]
Saving to: `file.tmp'

 1% [                                       ] 622.314      149K/s              ^

Umur Kontacı

Posted 2012-01-11T05:16:45.537

Reputation: 363

The logs say 10024- whereas your command is 1024-2048 for the range - is that what happens or is there a typo? – Paul – 2012-01-11T05:59:35.593

Have you checked if the used server supports requesting byte ranges at all? – Robert – 2012-01-11T19:04:07.307

@Robert Of course. If you look at the log of the normal resumed download (-c), you can see that it does. – Umur Kontacı – 2012-01-12T03:36:37.670

@Paul, I changed the command when I wrote it here; the actual range was 10024-. I have tried many ranges, though, so I guess the problem is somewhere else – Umur Kontacı – 2012-01-12T03:38:20.820

Why is wget setting the HTTP protocol to 1.0 (instead of 1.1) for a range request? – None – 2013-01-19T02:13:49.747

Answers

7

Depending on how you look at it, this is either a bug or a missing feature.

Headers specified with --header are only sent by Wget; Wget does not interpret them itself.

In src/http.c of the tarball of Wget 1.13.4, there's a sanity check for partial content:

  if ((contrange != 0 && contrange != hs->restval)
      || (H_PARTIAL (statcode) && !contrange))
    {
      /* The Range request was somehow misunderstood by the server.
         Bail out.  */
      xfree_null (type);
      CLOSE_INVALIDATE (sock);
      xfree (head);
      return RANGEERR;
    }

The if condition covers two cases:

  • If there's a content range set, its start offset must coincide with hs->restval, the byte offset at which Wget expects the download to resume.

  • If partial content is being sent, content range must be specified by the server.

The second case doesn't cause any problems, since Wget parses the server response correctly. The first, however, does: Wget never interpreted the client-specified Range header, so hs->restval does not match the offset the server reports.
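
The effect of the first condition can be reproduced with the values from the logs in the question. This is only a shell sketch of the C check, not Wget code: range_check is a hypothetical helper, and it assumes that hs->restval is 0 when -c is not used and equals the size of the local partial file (557172 bytes) when it is.

```shell
#!/bin/sh
# Hypothetical re-statement of Wget's sanity check.
# $1 = contrange (start offset from the Content-Range response header)
# $2 = restval   (offset Wget itself expects; assumed 0 without -c)
# $3 = HTTP status code
range_check() {
    if { [ "$1" -ne 0 ] && [ "$1" -ne "$2" ]; } ||
       { [ "$3" -eq 206 ] && [ "$1" -eq 0 ]; }; then
        echo "RANGEERR"
    else
        echo "OK"
    fi
}

# Range passed via --header: the server starts at 10024, Wget expects 0.
range_check 10024 0 206         # prints RANGEERR

# Range generated by -c: both sides agree on 557172.
range_check 557172 557172 206   # prints OK
```

With the manually supplied header the two offsets disagree, so Wget closes the socket and gives up even though the server answered with a valid 206 response.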

To solve this problem, if you're willing to compile your own version of Wget, you can change the above source code to the following:

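  /* Partial content without a Content-Range header is still an error.  */
  if (H_PARTIAL (statcode) && !contrange)
    {
      xfree_null (type);
      CLOSE_INVALIDATE (sock);
      xfree (head);
      return RANGEERR;
    }
  /* Otherwise trust the server and adopt its start offset.  */
  if (contrange != 0 && contrange != hs->restval)
    hs->restval = contrange;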

Now Wget will deduce the resume offset from the content range the server reports instead of bailing out.
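
As a rough illustration of the patched logic (again a shell sketch with a hypothetical helper name, patched_check, not Wget code), the offset reported by the server simply replaces the one Wget assumed:

```shell
#!/bin/sh
# Hypothetical model of the patched check.
# $1 = contrange, $2 = restval, $3 = HTTP status code
patched_check() {
    # A 206 response without a Content-Range is still rejected.
    if [ "$3" -eq 206 ] && [ "$1" -eq 0 ]; then
        echo "RANGEERR"
        return 1
    fi
    restval=$2
    # A mismatch no longer aborts; the server's offset wins.
    if [ "$1" -ne 0 ] && [ "$1" -ne "$2" ]; then
        restval=$1
    fi
    echo "restval=$restval"
}

# Values from the failing request in the question (restval assumed 0).
patched_check 10024 0 206   # prints restval=10024
```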

You could also give cURL a try. It has a built-in --range switch.
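
A minimal sketch of the cURL route, assuming curl is installed. fetch_range and range_len are hypothetical helper names, and file.part is an arbitrary output file name. Byte ranges are inclusive, so a request for 1024-2048 should produce 1025 bytes if the server honours it.

```shell
#!/bin/sh
# Number of bytes covered by an inclusive range START-END.
range_len() {
    echo $(( $2 - $1 + 1 ))
}

# Fetch bytes START..END of URL into OUT.
# $1 = start, $2 = end, $3 = URL, $4 = output file
fetch_range() {
    # --range sends the Range header; -f makes HTTP errors fatal;
    # -sS hides the progress meter but still prints error messages.
    curl -fsS --range "$1-$2" -o "$4" "$3"
}

range_len 1024 2048   # prints 1025

# Needs network access, so the actual download is left commented out:
# fetch_range 1024 2048 http://www.example.com/file.tmp file.part
```

Comparing the size of file.part against range_len is a quick way to see whether the server returned the requested slice or the whole file.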

Dennis

Posted 2012-01-11T05:16:45.537

Reputation: 42 934

It took me some time to figure out your answer. I haven't actually compiled my own version, but it makes sense. cURL already handles this better. Thanks. – Umur Kontacı – 2013-01-19T12:22:51.337

Thanks for pointing me to cURL! I completely forgot about it, works like a charm! Switch in question is --continue-at <offset>. Cheers! – mr.b – 2013-03-02T23:53:07.083