
I noticed that I can save a webpage as a .webarchive file from Safari on my iPhone while I’m offline. What I did was open the website, log in, go to the specific page, and, once it had finished loading, turn off my internet connection. Then, with the connection completely off, I tried:

  1. AirDropping the page as a webarchive to another device
  2. Saving it to Files as a webarchive

Both worked. I also confirmed that I can open those webarchive files offline.

Can the website owner know or track that I’ve saved their webpage as a webarchive, even if I did the “download” part without an internet connection? One concern is that I am registered on their website and have to log in to view the content (I wonder if that somehow increases the chance of exposure).

UndercoverDog
Tocchi

2 Answers

2

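When you make an HTTP request to a website, your browser already downloads the webpage and its related files; that's how browsing works. The files are therefore already on your iPhone, and saving them as a .webarchive file only writes out what the browser has already received. So no, they can't know that you saved it as a .webarchive file.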

luisschwab
  • Does this also apply if I’m saving while online, or only when I’m offline? Are there any differences? I think someone said it can be tracked via analytics scripts, but I'd assume that requires being online. https://security.stackexchange.com/a/179686/282634 – Tocchi Sep 10 '22 at 19:30
  • Well, if you're offline there's no client-server communication, so it couldn't happen. And when online, I don't think it would make sense to make another GET request to download the page, since you already have all the files from the first GET request. If you were to use cURL or wget, maybe this timing approach could be used to detect it, but I find it unlikely (assuming you changed the User-Agent header). – luisschwab Sep 10 '22 at 21:32
  • Okay, thanks a lot for your answer! – Tocchi Sep 10 '22 at 21:58
  • Please accept it as an answer if you're satisfied with it. – luisschwab Sep 11 '22 at 11:52
0

Can the website owner know/track I’ve downloaded/saved their webpage as webarchive even if I did the “download“ part without the internet connection?

An HTML file can reference remote content in its source. If an image, script, CSS file, etc. points to a remote resource, loading your local .webarchive will make a request to that resource. Even if all of the scripts and resources are downloaded or inline, a script that makes a remote request (AJAX/XHR) could still send it.


I did a quick test with a live website that includes something like:

<script src="http://some.site/script.js"></script>

I then saved the document as a .webarchive via Safari. The .webarchive format is binary, but you can still open the file in Safari and view its source. When I viewed the source, the <script> tag still pointed to the remote resource.

Therefore, if you open the .webarchive file, the browser will load the remote resource specified in the "src" attribute.

You can also verify this with a network capture, either in the browser's developer tools or at the network level.
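
Another way to check, without opening the file in a browser at all, is to inspect the archive itself. The following is only a minimal sketch in Python, not something from my test above. It assumes the .webarchive is a binary property list whose main HTML sits under "WebMainResource" and whose saved resources are listed under "WebSubresources" (with "WebResourceData", "WebResourceURL" and "WebResourceTextEncodingName" entries). The function names and the file name page.webarchive are made up for illustration. It lists the src/href URLs in the main HTML that are not stored in the archive; it only sees static references and says nothing about requests a script might make at runtime.

```python
import plistlib
from html.parser import HTMLParser


class RemoteRefParser(HTMLParser):
    """Collects (tag, url) pairs for src/href attributes with absolute URLs."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(
                ("http://", "https://", "//")
            ):
                self.refs.append((tag, value))


def remote_refs_in_html(html):
    """Return every (tag, url) pair that points at an absolute URL."""
    parser = RemoteRefParser()
    parser.feed(html)
    return parser.refs


def unarchived_refs(archive_bytes):
    """Return (tag, url) pairs in the main HTML that the archive does not store.

    Assumed layout (not verified against every Safari version):
    WebMainResource -> WebResourceData holds the HTML bytes,
    WebSubresources -> list of dicts, each with a WebResourceURL.
    """
    archive = plistlib.loads(archive_bytes)
    main = archive["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    html = main["WebResourceData"].decode(encoding, errors="replace")
    stored = {sub["WebResourceURL"] for sub in archive.get("WebSubresources", [])}
    # Plain <a> links are skipped: they are only fetched when clicked.
    return [
        (tag, url)
        for tag, url in remote_refs_in_html(html)
        if tag != "a" and url not in stored
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_archive.py page.webarchive
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            for tag, url in unarchived_refs(f.read()):
                print(tag, url)
```

Anything this prints is a candidate for a network request when the archive is opened; an empty result does not prove the page is silent, for the reason given above.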


There are offline archiving tools that will download all resources, but keep in mind that even if you have a local copy of a script, the tool will not magically rewrite or disable remote calls made inside that script (fetch(), etc.).

If you want to create an offline copy with nothing that could be sent back, I would recommend saving the page as a PDF or copying the content you want into Word, etc., since then it's just text and images.

Otherwise, if you do need some scripts to run, I would open the file in a browser running in a sandbox with no network access. If network access is really necessary for the page to function, you are out of luck, but a sandbox may be enough for your needs.

--

Bottom line: they won't know you saved the document, but they might see network traffic when you open it.

Eric G