Can a webpage track that I have downloaded the page source/webarchive?

Question

Specifically on Safari. I understand that accessing a webpage via GET is already traceable back to the requester (me), but if I download the page itself, will it send another request or use a specific header to indicate that I have downloaded it?

So just to be clear, any time you hit a webpage (via your terminal or in a browser), you “download” it. Are you referring to one specific way of downloading, or do you mean saving the page and associated files locally after you have downloaded it? — llorrac, Feb 13 '18 at 10:41
@llorrac I'm referring to the specific way of saving a page's webarchive. Would the site's admin know that I'm specifically downloading it that way, or is it indistinguishable from other forms of GET-ing the webpage? — isopach, Feb 13 '18 at 10:52

score 3 · Accepted Answer · answered Feb 13 '18 at 11:44

When an HTTP client wants a resource, it issues a GET request. Whether the client then saves the response to disk or just keeps it in memory while you view it in the browser is irrelevant to the server. That is not a concern that HTTP is designed to deal with.
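To make this concrete, here is a minimal sketch using only Python's standard library. It is not Safari, and the handler, file name, and page body are made up for illustration. It starts a throwaway local server that records what it receives, then fetches the same page twice: once keeping the response in memory, once writing it to disk. Under these assumptions the server records two indistinguishable requests.

```python
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path

seen = []  # what the server observed for each request


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that records each GET exactly as the server sees it."""

    def do_GET(self):
        seen.append((self.command, self.path, sorted(self.headers.items())))
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Bind to a free port on localhost and serve in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/page.html"

# Ignore any proxy configured in the environment so the request goes direct.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# "View" the page: fetch it and keep it in memory only.
with opener.open(url) as resp:
    viewed = resp.read()

# "Save" the page: fetch it and write the bytes to disk.
saved_path = Path(tempfile.mkdtemp()) / "page.html"
with opener.open(url) as resp:
    saved_path.write_bytes(resp.read())

server.shutdown()
server.server_close()

# The server has no way to tell which of the two responses was saved.
print(seen[0] == seen[1])
```

What the client does with the bytes after the response arrives never reaches the server, which is why the two recorded requests compare equal.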

So saving a page for offline viewing shouldn't be easily distinguishable from just visiting it. The browser might fetch some of the resources again if they are not cached locally, but to the server that would look no different than if you had just reloaded the page.

That said, it might be possible to detect a "download" by doing some careful traffic analysis, looking at timing or whatever. But that would require quite some work and would be highly dependent on what browser you are using, and I find it hard to believe anyone would ever think it was worth the effort.
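As a rough idea of what such analysis could look like, here is a hypothetical sketch. The log format, the `repeated_fetches` helper, and the 5-second window are all assumptions, not anything a real site is known to run. It flags clients that request the same resource again shortly after the first fetch.

```python
from collections import defaultdict


def repeated_fetches(log, window=5.0):
    """Return {client: [paths]} for paths a client requested again within `window` seconds.

    `log` is an iterable of (client, path, timestamp_in_seconds) tuples.
    """
    last_seen = {}
    flagged = defaultdict(list)
    for client, path, ts in sorted(log, key=lambda entry: entry[2]):
        key = (client, path)
        if key in last_seen and ts - last_seen[key] <= window:
            flagged[client].append(path)
        last_seen[key] = ts
    return dict(flagged)


# Made-up access log entries for illustration.
log = [
    ("10.0.0.1", "/index.html", 0.0),
    ("10.0.0.1", "/style.css", 0.2),
    ("10.0.0.2", "/index.html", 1.0),
    ("10.0.0.1", "/style.css", 3.1),  # same client fetches the stylesheet again
]
print(repeated_fetches(log))  # {'10.0.0.1': ['/style.css']}
```

Note that a heuristic like this cannot tell a save apart from a plain reload, since both produce the same repeated requests.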

So no, I wouldn't worry about the owner of the page knowing that I specifically downloaded it as opposed to just viewed it.

score 2 · Answer 2 · answered Feb 13 '18 at 08:39

There are several ways to track requests; Google Analytics is one of them.

When you download a web page, the request can obviously be tracked by the page owner if they want to. If you download it from a web archive or through a proxy, it will be difficult to trace the request back to you (your IP). However, if the page contains analytics scripts, they will still be executed when you load the copy from the web archive.
