
I run several web crawlers, and one particular website appears to block their traffic temporarily after a while. The odd part is that all of the clients share the same external IP address (they reach the internet through the same gateway), yet the site blocks only specific machines on my network. In other words, it's not a simple IP address block.

How can this happen? What kind of rule (on a web server, a firewall, etc.) would produce this behavior? Can such rules be based on the MAC address or other machine-specific data?

Doug
  • [Look](http://img2.wikia.nocookie.net/__cb20140302230327/cardfight/images/1/1a/Come_to_the_DarkSide_cookies.jpg). Just as one option. – Sven Dec 02 '14 at 17:42
  • Cookies are not an issue. They don't persist between sessions, and the block always occurs after several sessions. But re-checking that this behavior is implemented correctly is probably a good idea. – Doug Dec 02 '14 at 18:20
  • User-Agent string? – Oleg Dec 03 '14 at 12:23
  • All the HTTP headers are identical regardless of the machine. – Doug Dec 03 '14 at 16:04

1 Answer


This is perhaps too obvious a statement, but the blocking behavior has to be based on information the remote server can actually see. That includes:

  • The source IP address

  • The source TCP port, which should be ephemeral and change with each new connection (requests sent over a reused keep-alive connection share one port)

  • Characteristics of the client's TCP/IP stack, which can probably be fingerprinted passively

  • The URL of the requested resource

  • The content of the HTTP request header, including cookies, the User-Agent string, and fingerprinting the "Accept:" header
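To illustrate the header-based item in the list above, here is a minimal sketch of how a server could derive a fingerprint from request headers. It is not known how the site in question does this; the function name `header_fingerprint`, the set of tracked headers, and the sample header values are all assumptions made for illustration. The point is that two clients sending the same header values in a different order can still be told apart.

```python
import hashlib


def header_fingerprint(headers):
    """Return a short hash of the request headers (hypothetical helper).

    `headers` is a list of (name, value) pairs in the order they arrived.
    Every header name is recorded, so ordering differences change the hash;
    values are included only for a few commonly fingerprinted headers.
    """
    tracked = {"user-agent", "accept", "accept-encoding", "accept-language"}
    parts = []
    for name, value in headers:
        key = name.lower()
        parts.append(key)
        if key in tracked:
            parts.append(value.strip())
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


# Illustrative data only: same values, different header order on machine_b.
machine_a = [("Host", "example.com"), ("User-Agent", "crawler/1.0"), ("Accept", "*/*")]
machine_b = [("Host", "example.com"), ("Accept", "*/*"), ("User-Agent", "crawler/1.0")]

print(header_fingerprint(machine_a) == header_fingerprint(machine_b))
```

Capturing the raw requests from two machines and comparing them this way is one option for confirming that the headers really are identical on the wire, including their order.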

If you bring JavaScript into the mix, all kinds of client fingerprinting become possible.

You mention the MAC address, and it's worth pointing out that the MAC address doesn't leave the local network. Your gateway router is the only device along the path that ever receives the MAC address of the client computer.

I'd tend to suspect they're using the User-Agent string and, perhaps, fingerprinting the "Accept:" header.

Evan Anderson
  • I suspected that MAC address wouldn't leave the LAN, but wasn't sure. Javascript is not involved, and the software at least should send the same headers regardless of the machine it runs on. I think I may inspect them to be sure. – Doug Dec 02 '14 at 18:13
  • The TCP port could be the reason. I found out that multiple requests using keep-alive end up sharing the same TCP connection. I disabled keep-alive to see what happens. – Doug Dec 03 '14 at 16:03
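The keep-alive observation in the last comment can be reproduced locally. The following is a rough sketch using only the Python standard library, not the crawler's actual HTTP client: a throwaway server on 127.0.0.1 records the source port of each request, and the hypothetical helper `ports_seen` sends a few requests with or without connection reuse. Whether the remote site actually keys its blocking on the connection is an assumption that this does not prove.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_ports = []  # source port of every request the test server receives


class PortRecorder(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def ports_seen(keep_alive, requests=3):
    """Send `requests` GETs to a local server; return the source ports it saw."""
    seen_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for _ in range(requests):
            headers = {} if keep_alive else {"Connection": "close"}
            conn.request("GET", "/", headers=headers)
            conn.getresponse().read()
            if not keep_alive:
                conn.close()  # the next request opens a fresh connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen_ports)


print("keep-alive:", ports_seen(True))
print("no keep-alive:", ports_seen(False))
```

With keep-alive, every request arrives from the same source port because it travels over one TCP connection. Without it, each request opens a new connection, which normally gets a new ephemeral port.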