2

I mirrored an e-commerce site using wget. The site seems to use Cloudflare to handle its web traffic.

What's interesting is that after about 90% of the mirroring was done, wget started to receive a lot of error messages. I then tried to open the site in a regular browser, but was greeted with a 403 error and a message from Cloudflare: "The request was blocked". OK, fair enough, they probably don't want people to download 1.5 million pages from them (which is what I had downloaded by then).

However

  1. When I access the site with Tor Browser on the same machine that runs wget, I get the same error message.
  2. When I access the site from my second computer (both machines are connected to the same WiFi), in both a regular browser and Tor Browser, it works fine.

Has Cloudflare somehow managed to fingerprint the machine I run wget on in a way that makes it possible for them to also identify my machine through Tor? How much information does wget reveal when it connects to a web server?

The hardware is a fairly common MacBook Pro 15", so nothing extraordinary there.

Tor Browser is running with its default settings.

hensti
  • 2
    I'd guess you got hit with a temporary throttle. I believe wget will only reveal your IP address + any data you send through it. – Nate Aug 30 '18 at 19:26
  • Yes, that is what confuses me. How can Cloudflare detect that Tor Browser and wget are run from the same machine? – hensti Aug 31 '18 at 23:12

2 Answers

2

Cloudflare is notoriously unfriendly towards Tor users. Most Cloudflare-hosted sites become quite patchy when accessed through Tor, as Cloudflare rates Tor users as high risk.

It's possible that your scraping, or the site admin, triggered "I'm Under Attack" mode, which increases Cloudflare's vigilance while it's active.

Lie Ryan
1

Are you certain that your Wget and Tor Browser errors are actually related? What have you done to rule out a coincidence?

Wget

Wget sends a GET request, so the server sees your IP address and your User-Agent string. Unless overridden, the default form is:

User-Agent: Wget/version (os)

User-Agent: Wget/1.19.5 (linux-gnu)

Wget doesn't execute JavaScript, which is the vector for hardware-based fingerprinting.
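If you want to see for yourself what a server learns from a Wget-style request, here is a minimal Python sketch. It is not Wget itself: it starts a throwaway server on 127.0.0.1, sends one request with the User-Agent shown above, and returns the client address and headers the server received. The names `EchoHandler` and `fetch_as_wget` are made up for this example, and Python's `urllib` adds a few headers of its own that the real Wget may not send.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # filled in by the handler: one (client_ip, headers) tuple per request


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records what the server can observe."""

    def do_GET(self):
        # Header names are lower-cased so lookups do not depend on client casing.
        headers = {name.lower(): value for name, value in self.headers.items()}
        seen.append((self.client_address[0], headers))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_as_wget(user_agent="Wget/1.19.5 (linux-gnu)"):
    """Send one GET with a Wget-style User-Agent to a local server."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler makes sure the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        opener.open(request, timeout=5).close()
    finally:
        server.shutdown()
        server.server_close()
    return seen[-1]


if __name__ == "__main__":
    ip, headers = fetch_as_wget()
    print("client address:", ip)
    for name, value in headers.items():
        print(f"{name}: {value}")
```

The point is only that the server side gets an address plus whatever headers the client chooses to send. Nothing about the hardware is included in a plain request like this.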

Fingerprinting

Cloudflare may have temporarily blocked your non-Tor browser because it shared an IP with a flagged web scraper. Subsequent blocks are then, for some reason, applied based on that first browser fingerprint rather than the IP. The block does not affect your second device because its fingerprint doesn't match the initial one. This is merely a theory. It's a clumsy system, but it would explain why your second device is unaffected.

There is no reliable way they could tie your Wget client to your Tor Browser without an intermediate step. As an experiment, try getting blocked again but, immediately afterwards, connect with Tor Browser only, without first using a normal browser on the same IP as the one used for Wget.

I suspect it's just a bad coincidence, but if your concern is absolutely critical, there is a way to be sure. Inspect the HTTP requests sent by all five clients (Wget, plus the regular browser and Tor Browser on each machine) to see what information they send to the target server.

In the unlikely event that they do have hardware-based detection that is effective against Tor users, it's probably based on JavaScript fingerprinting that links your non-Tor browser and Tor Browser. Wget isn't culpable beyond flagging your IP for fingerprint collection.

You will likely need to decrypt your HTTPS traffic to inspect those requests. I've never done this with requests made through Tor, but it can be done.
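Once you have captured the request headers from each client, comparing them is straightforward. The sketch below is one way to do it, assuming you have each capture as a plain dictionary. `header_diff` is a made-up helper, and the two sample captures are hypothetical placeholders rather than real traffic. It only compares HTTP headers, nothing below that layer.

```python
def header_diff(a, b):
    """Return {header: (value_in_a, value_in_b)} for every header that differs.

    Header names are compared case-insensitively; a header missing from one
    side shows up with None on that side.
    """
    a = {name.lower(): value for name, value in a.items()}
    b = {name.lower(): value for name, value in b.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Hypothetical captures, for illustration only (not real traffic).
wget_headers = {
    "User-Agent": "Wget/1.19.5 (linux-gnu)",
    "Accept": "*/*",
}
browser_headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept": "*/*",
    "Accept-Language": "en-US",
}

if __name__ == "__main__":
    for name, (left, right) in header_diff(wget_headers, browser_headers).items():
        print(f"{name}: {left!r} vs {right!r}")
```

Running the same comparison for each pair of clients shows which fields, if any, are identical between Wget and Tor Browser on the blocked machine but different on the second one.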

Inerva