10

I'm having an issue with a certain individual who keeps scraping my site aggressively, wasting bandwidth and CPU resources. I've already implemented a system that tails my web server access logs, adds each new IP to a database, tracks the number of requests made from that IP, and blocks the IP via iptables if it exceeds a certain threshold of requests within a certain time period. It may sound elaborate, but as far as I know, there is no pre-made solution designed to limit a given IP to a certain amount of bandwidth/requests.
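For reference, the counting logic described above can be sketched roughly as follows. This is only an illustration, not the actual system: the class name `RequestTracker`, its parameters, and the in-memory storage (instead of a database) are all assumptions made for the example.

```python
from collections import defaultdict, deque


class RequestTracker:
    """Counts requests per IP inside a sliding time window (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # ip -> timestamps (in seconds) of that IP's recent requests
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request; return True if the IP is now over the threshold."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Each parsed log line would call `record(ip, timestamp)`, and a `True` result is the point at which the IP would be handed to iptables.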

This works fine for most crawlers, but an extremely persistent individual is getting a new IP from his/her ISP pool each time they're blocked. I would like to block the ISP entirely, but don't know how to go about it.

Doing a whois on a few sample IPs, I can see that they all share the same "netname", "mnt-by", and "origin/AS". Is there a way I can query the ARIN/RIPE database for all subnets using the same mnt-by/AS/netname? If not, how else could I go about getting every IP belonging to this ISP?

Thanks.

MadHatter
  • 1
    Have you considered that the perpetrator might be using compromised machines rather than getting a new IP address each time? – John Gardeniers Jun 15 '10 at 01:13
  • Does CloudFlare offer options for limiting bandwidth by user/IP? I haven't used them but I thought they did. That would be the easiest way, imo, just use a service to do the whole thing for you. – markspace Apr 04 '17 at 18:23

5 Answers

7

whois [IP address] (or whois -a [IP Address]) will usually give you a CIDR mask or an address range that belongs to the company/provider in question, but parsing the results is left as an exercise for the reader (there are at least 2 common whois output formats).
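As a rough starting point for that exercise, the sketch below handles the two formats I see most often: a RIPE-style `inetnum: start - end` range and an ARIN-style `NetRange:` / `CIDR:` pair. The function name `networks_from_whois` is made up for the example, and other registries may use field names or layouts that it does not cover.

```python
import ipaddress
import re

# RIPE-style "inetnum: a - b" and ARIN-style "NetRange: a - b"
RANGE_RE = re.compile(r"^(?:inetnum|NetRange):[ \t]*(\S+)[ \t]*-[ \t]*(\S+)", re.I | re.M)
# ARIN-style "CIDR: x/y" (may hold several comma-separated blocks)
CIDR_RE = re.compile(r"^CIDR:[ \t]*(.+)$", re.I | re.M)


def networks_from_whois(text):
    """Return the networks named in whois output as a collapsed list."""
    nets = set()
    for line in CIDR_RE.findall(text):
        for part in line.split(","):
            nets.add(ipaddress.ip_network(part.strip(), strict=False))
    for start, end in RANGE_RE.findall(text):
        first = ipaddress.ip_address(start)
        last = ipaddress.ip_address(end)
        # A start-end range is not always a single CIDR block.
        nets.update(ipaddress.summarize_address_range(first, last))
    result = []
    for version in (4, 6):
        # collapse_addresses cannot mix IPv4 and IPv6, so handle each separately.
        same = [n for n in nets if n.version == version]
        result.extend(ipaddress.collapse_addresses(same))
    return result
```

Feeding it the saved output of `whois` for one address gives the block that address sits in, which is only one of the ISP's allocations.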

Note that such wholesale blocking can also potentially knock out legitimate users. Before taking this approach you should contact the abuse desk at the ISP in question (usually listed in the whois information for their netblock or DNS domain, otherwise abuse@ is a good place to start) to see if the situation can be resolved diplomatically rather than technically.


Also note that there are some pre-made solutions to limit requests per second by IP: check out mod_qos or your system's firewall/traffic-shaping capabilities.

voretaq7
  • I know the whois output gives you an address range, but this ISP seems to have ranges all over the place. E.g. (these aren't the actual addresses, by the way) the spider will come from 46.84.*.*, then 88.98.*.*, and so on. There's no obvious pattern other than what was noted in my question (same AS and maintainer in whois). Contacting their abuse department will result in the emails being sent straight to /dev/null. It's a Chinese ISP. As for mod-qos? Limiting requests per second is useless. The spider isn't THAT aggressive. I can't see any obvious way to do what I want through iptables either. –  Jun 14 '10 at 18:51
6

Figured it out on my own. Sort of.

robtex.com lists all announced IP ranges for a given AS at: http://www.robtex.com/as/as123.html#bgp

Still don't know how or where robtex retrieves this info from. If someone else wants to chime in and explain where the data comes from, that would be great.
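One likely source for this kind of listing is routing data: prefixes an AS announces in BGP, plus the `route:` / `route6:` objects registered in the Internet Routing Registries. Whether robtex uses exactly these sources is an assumption on my part. RIPE-style whois servers such as whois.radb.net accept an inverse query along the lines of `whois -h whois.radb.net -- '-i origin AS123'`, and the sketch below (with a made-up function name, `prefixes_from_irr`) pulls the prefixes out of that kind of output once it has been saved as text.

```python
import ipaddress
import re

# Matches "route: 198.51.100.0/24" and "route6: 2001:db8::/32" attributes.
ROUTE_RE = re.compile(r"^route6?:[ \t]*(\S+)", re.I | re.M)


def prefixes_from_irr(text):
    """Extract unique prefixes from route/route6 objects, in order of appearance."""
    seen = set()
    prefixes = []
    for value in ROUTE_RE.findall(text):
        net = ipaddress.ip_network(value, strict=False)
        if net not in seen:
            seen.add(net)
            prefixes.append(net)
    return prefixes
```

Registry data can be stale or incomplete, so the result is best treated as an approximation of what the AS actually announces.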

2

You can use Hurricane Electric's BGP Service.

If you have an IP address and want to know all address blocks registered to the same ASN, do this:

  1. Go to https://bgp.he.net and search for the IP address to get its ASN.
  2. Search for the ASN.
  3. View tables "Prefixes v4" and "Prefixes v6" for all address blocks.
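Once the prefixes from those tables are copied into a list, turning them into firewall rules is mostly mechanical. The sketch below is one possible way to do it, assuming IPv4 only and a plain `iptables -A ... -j DROP` rule per block; the function name `iptables_drop_rules` and the sample prefixes are made up for the example.

```python
import ipaddress


def iptables_drop_rules(prefixes, chain="INPUT"):
    """Build one iptables DROP command per collapsed IPv4 prefix."""
    nets = [ipaddress.ip_network(p, strict=False) for p in prefixes]
    # iptables handles IPv4 only; IPv6 prefixes would need ip6tables instead.
    v4 = [n for n in nets if n.version == 4]
    # collapse_addresses merges adjacent and overlapping blocks to keep the rule list short.
    return [
        f"iptables -A {chain} -s {net} -j DROP"
        for net in ipaddress.collapse_addresses(v4)
    ]


if __name__ == "__main__":
    sample = ["198.51.100.0/25", "198.51.100.128/25", "203.0.113.0/24"]
    for rule in iptables_drop_rules(sample):
        print(rule)
```

The commands are only printed, not executed, so they can be reviewed before anything is blocked. For an AS with hundreds of prefixes, a set-based match is probably a better fit than one rule per block.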
Kontrollfreak
2

Since you have access to iptables, I will assume you have root access on the system anyway. In that case, I would suggest installing Fail2Ban, which will block an IP (for a period of time you decide) if it tries to abuse a service (HTTP, DNS, mail, SSH, etc.) by hitting the service port N times within period X (both values are set by you).

I am using it on my server and getting very good results, especially with those Chinese hackers trying to break into my SSH.

See my home page for more information; I have a blog post all about Fail2Ban.

Aly Badawy
-2

You can try this tool. It is not fast, but it works.

Khaled
Nick
  • 1
  • 1
    Welcome to Server Fault! Please read our [faq] in particular [May I promote products or websites I am affiliated with here?](http://serverfault.com/faq#promotion). – user9517 Jan 23 '13 at 08:16
  • 1
    That site is broken. But just curious: what makes you come to the conclusion that the OP is promoting his product? How do you differentiate between sharing genuinely useful information and promoting a self-made product? – Talespin_Kit Nov 27 '16 at 19:22