
The problem:

I manage a website with lots of dynamically generated pages. Every day, bots from Google, Yahoo and other search engines download 100K+ pages. And sometimes I have problems with "hackers" trying to mass-download the whole website.

I would like to block the IP addresses of the "hackers" while letting the search engine bots keep crawling the pages. What is the best way to do this?

Note:
Right now I am solving the problem as follows. I save the IP of each page request to a file every X seconds, and a crontab script counts the repeated IPs every 30 minutes. For IPs that are repeated too many times, the script checks the hostname; if it doesn't belong to Google/Yahoo/Bing/etc., it is a candidate for banning. But I don't really like my solution and think that auto-banning could be done better, or with some out-of-the-box solution.
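
Roughly, the cron job looks something like this (a minimal sketch of the approach above; the log path, the 5000-request threshold and the crawler hostname suffixes are assumptions to adapt):

#!/bin/sh
# Rough sketch, run from cron every 30 minutes.
# Assumptions: a combined-format access log, iptables for blocking,
# and a request threshold picked to suit your traffic.
LOG=/var/log/apache2/access.log
THRESHOLD=5000

awk '{ print $1 }' "$LOG" | sort | uniq -c | sort -rn |
while read count ip; do
    [ "$count" -lt "$THRESHOLD" ] && break   # list is sorted, nothing further qualifies
    name=$(host "$ip" | awk '/pointer/ { print $NF }')
    case "$name" in
        *.googlebot.com.|*.google.com.|*.crawl.yahoo.net.|*.search.msn.com.)
            ;;                               # known crawler, leave it alone
        *)
            iptables -I INPUT -s "$ip" -j DROP
            ;;
    esac
done

A more robust check would also resolve the returned hostname forward again and confirm it maps back to the same IP, since reverse DNS on its own can be faked.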

Termos

3 Answers


You didn't state your OS, so I will happily tell you the OpenBSD version: in pf.conf, place something like the following in your ruleset (for a maximum of 100 connections per 10 seconds):

table <bad_hosts> persist
block quick from <bad_hosts>
pass in on $ext_if proto tcp to $webserver port www keep state \
                 (max-src-conn-rate 100/10, overload <bad_hosts> flush global)

You could add a whitelist and a cron job that kicks addresses out of <bad_hosts> after a day or two.
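
For example (a sketch that repeats the rules above to show ordering; the <goodbots> table, its /etc/goodbots file and the one-day expiry are assumptions):

table <goodbots> persist file "/etc/goodbots"   # trusted crawler networks
table <bad_hosts> persist

# whitelisted crawlers match first ("quick") and skip the rate limit,
# so the overload rule never pushes them into <bad_hosts>
pass in quick on $ext_if proto tcp from <goodbots> to $webserver port www keep state

block quick from <bad_hosts>
pass in on $ext_if proto tcp to $webserver port www keep state \
                 (max-src-conn-rate 100/10, overload <bad_hosts> flush global)

# crontab entry: remove addresses that have been in <bad_hosts> for over a day
0 * * * *    pfctl -t bad_hosts -T expire 86400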

knitti

I would have thought fail2ban is the answer.

You can use whitelists to stop the search engines from getting blocked.
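
A rough jail.local sketch, assuming an Apache access log and the stock iptables-multiport action; the http-get-dos filter name, its regex, the thresholds and the crawler range in ignoreip are placeholders to adapt:

# /etc/fail2ban/jail.local (sketch)
[DEFAULT]
# never ban these: localhost plus verified crawler ranges
ignoreip = 127.0.0.1/8 66.249.64.0/19

[http-get-dos]
enabled  = true
filter   = http-get-dos
logpath  = /var/log/apache2/access.log
findtime = 300
maxretry = 1000
bantime  = 7200
action   = iptables-multiport[name=http-get-dos, port="http,https", protocol=tcp]

# /etc/fail2ban/filter.d/http-get-dos.conf (sketch)
[Definition]
failregex = ^<HOST> -.*"(GET|POST).*
ignoreregex =

Anything listed in ignoreip is never banned no matter how fast it crawls, while any other client making more than 1000 requests in 5 minutes is blocked for two hours.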

Richard Holloway

Have a look at Simple Event Correlator (SEC). It can automatically run commands (e.g. add an iptables block) after a certain number of lines matching a regular expression have been seen within a window of time. It can also define a "context" that expires; when the context expires you can unblock the IP in question (i.e. remove the iptables rule).
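
A rough sketch of such a rule, assuming the web server's access log is fed to SEC and iptables does the blocking; the regex, the 500-in-60-seconds threshold, the two-hour lifetime and the context name are all placeholders:

# sec.conf (sketch): 500+ requests from one IP within 60 seconds get it
# blocked; the BLOCKED_<ip> context removes the block two hours later
type=SingleWithThreshold
ptype=RegExp
pattern=^(\d+\.\d+\.\d+\.\d+) .* "GET
desc=request flood from $1
action=shellcmd /sbin/iptables -I INPUT -s $1 -j DROP; \
       create BLOCKED_$1 7200 ( shellcmd /sbin/iptables -D INPUT -s $1 -j DROP )
thresh=500
window=60

To keep the search engines out of this, you could put a Suppress rule matching their address ranges in front of this one.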

Ztyx