Can the website owner know or track that I’ve downloaded/saved their webpage as a webarchive, even if I did the “download” part without an internet connection?
An HTML file can reference remote content in its source. If there is an image, script, CSS file, etc. that points to a remote resource, the browser will request it when your local .webarchive is loaded. Even if all of the scripts and resources are downloaded or inlined, a script that makes a remote request (AJAX/XHR) could still send it.
I did a quick test with a live website that includes something like:
<script src="http://some.site/script.js"></script>
I then saved the document as a .webarchive via Safari. The .webarchive format is binary, but you can still open it in Safari and view the source. When I view the source, the <script> tag still points to the remote resource.
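If you would rather inspect the saved markup without opening the archive in a browser (which is what triggers the requests), here is a minimal sketch. It assumes the .webarchive is a property list with a WebMainResource entry holding WebResourceData and, optionally, WebResourceTextEncodingName. Those key names are an assumption on my part about how Safari lays out the archive, and main_html_from_webarchive is just a name I made up for the example.

```python
import plistlib


def main_html_from_webarchive(archive_bytes):
    """Return the main page's HTML from the raw bytes of a .webarchive.

    Assumes the archive is a property list whose "WebMainResource" dict
    carries the page bytes under "WebResourceData" (assumed key names).
    """
    archive = plistlib.loads(archive_bytes)  # auto-detects binary vs XML plist
    main = archive["WebMainResource"]
    # Fall back to UTF-8 if the archive does not record an encoding.
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    return main["WebResourceData"].decode(encoding, errors="replace")
```

You would call it with something like open("page.webarchive", "rb").read(), where the file name is a placeholder, and then search the returned text for remote URLs.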
Therefore, if you open the .webarchive file, the browser will load the remote resource specified in the src attribute.
You can verify this with a network capture, either in the browser's developer tools or at the network level.
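As a static complement to a network capture, you could also scan the page source for tags that point at remote URLs. The sketch below is only a rough heuristic using Python's standard html.parser. The attribute list and the names RemoteRefFinder, is_remote, and find_remote_refs are my own choices for illustration, and it can both over-report (for example a link tag that is never fetched) and miss things (URLs inside CSS or built by scripts).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Attributes that commonly make a browser fetch something on page load.
FETCHING_ATTRS = {"src", "href", "poster", "data"}
# Tags whose href is only followed when the user clicks, not on load.
CLICK_ONLY_TAGS = {"a", "area"}


def is_remote(url):
    """True for http(s) URLs and protocol-relative URLs (flagged conservatively)."""
    parsed = urlparse(url.strip())
    if parsed.scheme in ("http", "https"):
        return True
    return not parsed.scheme and bool(parsed.netloc)


class RemoteRefFinder(HTMLParser):
    """Collects (tag, attribute, url) triples that reference remote resources."""

    def __init__(self):
        super().__init__()
        self.remote_refs = []

    def handle_starttag(self, tag, attrs):
        if tag in CLICK_ONLY_TAGS:
            return
        for name, value in attrs:
            if name in FETCHING_ATTRS and value and is_remote(value):
                self.remote_refs.append((tag, name, value))


def find_remote_refs(html_text):
    finder = RemoteRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote_refs


if __name__ == "__main__":
    page = '<script src="http://some.site/script.js"></script><img src="local.png">'
    for tag, attr, url in find_remote_refs(page):
        print(f"<{tag}> {attr} -> {url}")
```

An empty result does not prove the page is silent, so the network capture remains the authoritative check.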
There are offline archiving tools that will download all resources, but keep in mind that even if you have a local copy of a script, this will not magically rewrite or disable remote calls that the script itself makes (fetch(), etc.).
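To get a rough idea of whether a locally saved script might still phone home, you could grep it for the usual network APIs. This is a hedged sketch, not a real analysis: the pattern list and the name find_network_calls are mine, and minified or obfuscated code (or tricks like setting the src of a new image) can easily slip past it.

```python
import re

# Names of browser APIs that can send data over the network.
# This list is illustrative, not exhaustive.
NETWORK_APIS = {
    "fetch": re.compile(r"\bfetch\s*\("),
    "XMLHttpRequest": re.compile(r"\bXMLHttpRequest\b"),
    "sendBeacon": re.compile(r"\bsendBeacon\s*\("),
    "WebSocket": re.compile(r"\bWebSocket\b"),
}


def find_network_calls(js_source):
    """Return (line_number, api_name) pairs for each suspicious line."""
    hits = []
    for line_no, line in enumerate(js_source.splitlines(), start=1):
        for api_name, pattern in NETWORK_APIS.items():
            if pattern.search(line):
                hits.append((line_no, api_name))
    return hits
```

A hit only means the script is capable of making a request; whether it actually does depends on the code path that runs.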
If you want to create an offline copy with nothing that could be sent back, I would recommend saving it as a PDF or copying the content you want into Word, etc., since then it's just text and images.
Otherwise, if you do need some scripts to run, I would run them in a sandboxed browser with no network access. If network access is really necessary for the page to function, you are out of luck, but otherwise this may be enough for your needs.
--
Bottom line: They won't know you saved the document, but they might see network traffic when you open it.