The problem:
I manage a website with lots of dynamically generated pages. Every day, bots from Google, Yahoo and other search engines download 100K+ pages. And sometimes I have problems with "hackers" trying to massively download the whole website.
I would like to block the IP addresses of these "hackers" while keeping the search engine bots crawling the pages. What is the best way to do this?
Note:
Right now I am solving the problem as follows. I log the IP of each page request to a file, flushing every X seconds, and a crontab script counts repeated IPs every 30 minutes. For IPs that appear too many times, the script checks the hostname: if it doesn't belong to Google/Yahoo/Bing/etc., we have a candidate for banning. Roughly, it works like the sketch below.
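Here is a simplified sketch of the hostname check in Python (the threshold, log format, and domain list are placeholders, not my exact values). It uses forward-confirmed reverse DNS, i.e. it resolves the hostname back to the IP so a spoofed PTR record can't fake a Googlebot:

    import socket
    from collections import Counter

    # Reverse-DNS suffixes of crawlers we never want to ban (placeholder list).
    SEARCH_ENGINE_SUFFIXES = ('.googlebot.com', '.google.com',
                              '.search.msn.com', '.crawl.yahoo.net')

    BAN_THRESHOLD = 1000  # requests per 30-minute window (placeholder)

    def is_search_engine(ip):
        """Forward-confirmed reverse DNS: IP -> hostname -> back to IP."""
        try:
            hostname = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        if not hostname.endswith(SEARCH_ENGINE_SUFFIXES):
            return False
        # Forward confirmation guards against spoofed PTR records.
        try:
            return ip in socket.gethostbyname_ex(hostname)[2]
        except socket.gaierror:
            return False

    def candidates_for_ban(logfile):
        """Count one IP per line in the log and flag heavy non-crawler IPs."""
        with open(logfile) as f:
            counts = Counter(line.strip() for line in f)
        return [ip for ip, n in counts.items()
                if n > BAN_THRESHOLD and not is_search_engine(ip)]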
But I don't really like this solution and think that auto-banning could be done better, ideally with some out-of-the-box tool.