Bulletproofing downloads of large PDFs against browser misconfiguration?

Question

I have a site running Apache whose main purpose in life is to serve up large (10-30 MB) PDF files. I get emails fairly frequently from users saying that they're having problems downloading the files:

"will start to download, but the download doesn't complete, it freezes at about 25%."
"It seems to find the page, but just spins and spins ... I let it go 5 min. No data. HOWEVER: When I chose "download" I got it in seconds."
"It somehow starts loading the pdf at around 10%, both in Chrome and Firefox."

Since I can't reproduce the problems, and they don't seem to occur for the vast majority of users, it's very hard to figure out what's up. Presumably the users have misconfigured their browsers or plugins somehow, or maybe in some cases it's a usability issue with their browser or plugin. Getting users to report specific error messages or behavior, describe their configuration, etc., is like pulling teeth.

I've seen some other questions that describe similar problems, but they seem specific to IIS, whereas I'm running Apache:

This bug doesn't seem to match the (vague) reports I've gotten:

'IE cannot display the webpage' when opening PDFs

Are there any techniques for bulletproofing my setup so that users don't experience so many of these problems? Browser detection in JavaScript with an appropriate message? Warning users against particular browser/plugin combinations, or automatically detecting those combinations? Right now, I can't even tell which lines to look at in my Apache log file to see whether any error is recorded on the server side. Possibly all of this becomes more complicated than you'd expect for serving up a plain old static file, because Adobe Reader tries to be tricky -- although these PDFs are not optimized.

If anyone would like to try to reproduce the error, a pdf for which users have reported problems is here: http://www.lightandmatter.com/sr/sr.pdf [It may now be impossible to reproduce the behavior because I've implemented Håkan Lindqvist's answer.]

I can reproduce the "starting at 10%" in Chrome, but it's purely a display issue and nothing you can prevent. I have a feeling you're on a wild goose chase to figure out the cause, as *anything* could happen: bad Internet connections, faulty routing, client-side proxies, ISP proxies, etc. — Nathan C, Jun 24 '14 at 19:22
@NathanC: Interesting. I can't reproduce this with Chromium on Linux. Do you have Chrome set up to use Chrome PDF Viewer? — Ben Crowell, Jun 24 '14 at 19:38
@BenCrowell I did a "save link as" in Chrome (Windows 7). I noticed it started at ~2 MB, but that's probably because it already downloaded that much by the time it displayed it. — Nathan C, Jun 25 '14 at 12:18

score 1 · Accepted Answer · answered Jun 24 '14 at 20:29

To figure out whether there is anything you can do to improve this, I think you will really want to find out which browser / PDF viewer plugin combination(s) the problem occurs with, and try to find a way of reproducing it.

Chrome and Firefox are mentioned in the question, but at least Chrome comes with its own PDF viewer. However, it's entirely possible to use the Acrobat Reader plugin or similar with either of these browsers, so just knowing the browser doesn't really tell you what software was used.

On the other hand, if the goal is to have users simply download the files and you do not want to deal with the oddities of various plugins, you may want to consider instructing the browser not to open the file but just download it.

This would be done by setting Content-Disposition: attachment in the HTTP response.
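A minimal sketch of how this might look in Apache, assuming mod_headers is enabled and the directives are placed either in the server configuration or in an .htaccess file where `AllowOverride FileInfo` is in effect. The `\.pdf$` pattern is an illustrative choice, not something taken from the question:

```apache
# Ask browsers to save PDFs instead of handing them to an in-browser viewer.
# Assumes mod_headers is loaded; the filename pattern is illustrative.
<IfModule mod_headers.c>
    <FilesMatch "\.pdf$">
        Header set Content-Disposition "attachment"
    </FilesMatch>
</IfModule>
```

After reloading the configuration, the response headers for any matching file should include the `Content-Disposition: attachment` line.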

Of course, depending on how your users are used to working with these files, this could also cause confusion, but I imagine just having your browser save a file and then opening it locally should be less prone to error.

Nice idea, have now implemented it on my server. It cuts out the layers of software that were causing problems and makes everything dead simple. — Ben Crowell, Jun 24 '14 at 23:17

score 0 · Answer 2 · edited Apr 13 '17 at 12:14

If you can replicate this, take a look at the download status and the request and response headers; they will give you a clue.

**Response Headers**
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Length: 9531692
Content-Range: bytes 11278-9542969/9542970
Content-Type: application/pdf
Date: Tue, 24 Jun 2014 21:08:45 GMT
Etag: "1b78005-919d3a-4f550c11dff40"
Keep-Alive: timeout=15, max=100
Last-Modified: Mon, 24 Mar 2014 02:11:33 GMT
Server: Apache/2.2.16 (Debian)

**Request Headers**
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Host: www.lightandmatter.com
If-Range: "1b78005-919d3a-4f550c11dff40"
Range: bytes=11278-
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0

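Download Status - 206 Partial Content. By definition, this status is only sent in reply to a request in which the client asked for part of the file; the server does not decide on its own to send a partial response.

So the client is requesting a range, in this case `bytes=11278-`, and the response headers confirm that the server accepts byte ranges (`Accept-Ranges: bytes`) and is returning the requested range (`Content-Range`).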
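To see the same exchange from the server side, one option is a separate access log for PDF requests that records the status, bytes sent, and the client's `Range` header. This is a rough sketch only, assuming mod_setenvif and mod_log_config are available and that the directives go in the main server or virtual-host configuration (`CustomLog` is not allowed in .htaccess). The log path and the `is_pdf` and `pdfdebug` names are made up for illustration:

```apache
# Tag requests for PDF files (the variable name is hypothetical).
SetEnvIf Request_URI "\.pdf$" is_pdf

# %>s = final status (e.g. 200 or 206), %b = body bytes sent,
# %{Range}i = the client's Range request header,
# %X = connection status when the response ended (X = aborted before completion).
LogFormat "%h %t \"%r\" %>s %b \"%{Range}i\" \"%{User-Agent}i\" %X" pdfdebug

# Write only the tagged requests to a separate file (the path is hypothetical).
CustomLog /var/log/apache2/pdf_debug.log pdfdebug env=is_pdf
```

Entries ending in `X`, or with a byte count well short of the file size on a non-range request, would point at downloads that were cut off rather than at a server error.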

~~One thing does confuse me: there is a hyphen after the digit 8.~~

Originally posted here, but a possible (untested) solution is to add the following to the .htaccess file.

# Disable byte ranges for PDF files (the Header directive requires mod_headers)
<Files "*.pdf">
    Header set Accept-Ranges none
</Files>

`bytes=11278-` means from byte 11278 until the end (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1). If the client requests a specific range, it's entirely expected to get status 206 on success. It's not clear that there is a problem in what you have posted. — Håkan Lindqvist, Jun 24 '14 at 21:53
If the status is indeed a success status, I am happy to retract this answer. — Cold T, Jun 24 '14 at 21:56
206 is success, it means that the server is returning part of the requested resource (based on the `Range` request header). I'm not ruling out that a buggy client could trip over itself by requesting the wrong part of a resource compared to what it actually needs or something like that but the request/response above is not obviously bad. It could be that forcing the client to get the whole resource at once instead of doing "clever stuff" may work around problems. — Håkan Lindqvist, Jun 24 '14 at 22:03
