2

I run a media site and have a problem with users scraping all of the content. I placed an invisible URL on the page to catch spiders, which immediately blocks the IP, but some people have figured out the URL scheme and are writing their own scripts.

All the fail2ban filters I have seen so far deal with failed login attempts, but I want something more advanced that will detect, then rate-limit and/or block abusers. The URLs the scrapers hit are all valid, so if they go slowly enough I won't be able to tell them apart from legitimate users, but I expect I can at least keep the amateurs out with fail2ban.

How can I implement this kind of filter in fail2ban properly while minimizing false positives on legitimate users?
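
To illustrate what I'm after, here is a rough sketch of a jail that would simply count every request per IP in the access log and ban anyone who goes over a rate threshold. The filter name, log path and limits below are placeholders I made up, not something I have running:

    # /etc/fail2ban/filter.d/http-scraper.conf  (filter name is a placeholder)
    # Matches every GET/POST line in an Apache access log and captures the client IP.
    [Definition]
    failregex = ^<HOST> -.*"(GET|POST) .* HTTP
    ignoreregex =

    # /etc/fail2ban/jail.local  (thresholds are guesses and would need tuning)
    [http-scraper]
    enabled  = true
    port     = http,https
    filter   = http-scraper
    logpath  = /var/log/apache2/access.log
    # ban an IP that makes more than 300 requests within 5 minutes, for 10 minutes
    findtime = 300
    maxretry = 300
    bantime  = 600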

coneybeare
  • 611
  • 1
  • 7
  • 14

1 Answer

1

I'm not really sure fail2ban is the right tool here; you might want to look at something like mod_security (http://www.modsecurity.org/). It lets you track requests per session or per IP, define rules that describe suspect traffic, and then deny or slow it accordingly.
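
For example, here is an untested sketch of the kind of rule set I mean (mod_security 2.x syntax; the /media/ prefix, rule ids and thresholds are made up and would need tuning, and persistent collections require SecDataDir to be configured):

    # Initialise a per-IP collection so counters persist across requests.
    SecAction "id:100010,phase:1,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

    # Count every hit on the content URLs; the counter decays after 60 seconds.
    SecRule REQUEST_URI "@beginsWith /media/" \
        "id:100011,phase:1,nolog,pass,setvar:ip.req_count=+1,expirevar:ip.req_count=60"

    # More than 120 content requests per minute from one IP gets denied.
    SecRule IP:req_count "@gt 120" \
        "id:100012,phase:1,deny,status:403,log,msg:'Possible scraper: rate limit exceeded'"

While you tune the thresholds you could swap deny for a log-only or pause action so legitimate users aren't cut off by a bad guess.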

EDIT: You didn't specify, so I'm just assuming that you're using Apache.

MrTuttle
  • 1,166
  • 5
  • 5
  • The question is tagged apache, so that's a safe bet – Mark Henderson Jun 13 '11 at 06:46
  • It seems like mod_security is great for catching random scraping attacks, but these media scrapers are all hitting valid URLs that cannot be blocked. Am I missing something? – coneybeare Jun 13 '11 at 14:02
  • @Mark Henderson Good point; @coneybeare how do you know for certain that these are scrapers? User-agent string? Traffic pattern? You just need to write a rule that defines that, e.g. the same IP hits n links within x amount of time, or hits your content without a referring URL (sketched below) – MrTuttle Jun 13 '11 at 14:40
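
A rough sketch of that last idea (untested mod_security 2.x syntax, reusing the per-IP collection initialised in the rules above; the /media/ prefix, ids and limits are placeholders): count content requests that arrive with no Referer header at all, and block an IP that accumulates too many of them.

    # Count content requests that carry no Referer header at all.
    SecRule REQUEST_URI "@beginsWith /media/" \
        "id:100020,phase:1,nolog,pass,chain"
        SecRule &REQUEST_HEADERS:Referer "@eq 0" \
            "setvar:ip.noref_count=+1,expirevar:ip.noref_count=300"

    # Block the IP once it has made too many referer-less content requests.
    SecRule IP:noref_count "@gt 30" \
        "id:100021,phase:1,deny,status:403,log,msg:'Likely scraper: repeated content hits without a Referer'"

Legitimate users occasionally send no Referer as well (bookmarks, privacy extensions), so this is better treated as one signal among several rather than an instant ban.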