I mirrored an e-commerce site using wget. The site appears to use Cloudflare to handle its web traffic.
What's interesting is that after about 90% of the mirroring was done, wget started to report a lot of error messages. I then tried to open the site in a regular browser, but was greeted with a 403 error and a message from Cloudflare: "The request was blocked". OK, fair enough, they probably don't want people downloading 1.5 million pages from them (which is what I had done by that point).
However:
- When I access the site with Tor Browser on the same machine that I ran wget on, I get the same error message.
- When I access the site from my second computer (both machines are connected to the same WiFi), it works fine in both a regular browser and Tor Browser.
Has Cloudflare somehow managed to fingerprint the machine I run wget on in a way that also lets them identify it through Tor? How much information does wget reveal when it connects to a web server?
The hardware is a fairly common 15" MacBook Pro, so nothing extraordinary there.
Tor Browser is running with its default settings.
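
One way to check the HTTP-level part of the "how much does wget reveal" question is to point wget at a local server that records the request headers it receives. The following is only a minimal sketch using Python's standard library; `HeaderLogger`, `start_server` and `captured` are illustrative names, not part of wget or Cloudflare. It shows HTTP request headers only, not the source IP address or TLS handshake details that a remote server would also see.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# One dict of request headers (lower-cased names) per GET request received.
captured = []


class HeaderLogger(BaseHTTPRequestHandler):
    """Hypothetical handler that records the headers of every GET request."""

    def do_GET(self):
        # Lower-case the header names so lookups do not depend on client casing.
        captured.append({name.lower(): value for name, value in self.headers.items()})
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def start_server(port=0):
    """Start the logger on 127.0.0.1 in a background thread.

    port=0 lets the OS pick a free port; the chosen port is available
    as server.server_address[1].
    """
    server = HTTPServer(("127.0.0.1", port), HeaderLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To try it, run the file in an interactive session (for example `python -i` followed by the file name), call `start_server(8000)`, fetch `http://127.0.0.1:8000/` with wget from another terminal, and then inspect `captured[-1]` to see the headers wget sent.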