If the scrapers are BOTS and not humans, you could try creating a honeypot directory that they would crawl into and be blocked (by IP address) automatically via a "default page" script in that directory. Humans could easily unblock themselves, but it would thwart bots, as they would get a 403 "Forbidden" error on any further access. I use a technique like this to block bad robots that disobey robots.txt without permanently blocking humans who either share the same IP or "accidentally" navigate to the blocking script. That way, if a shared IP gets blocked, the block isn't permanent. Here's how:
I set up a default (scripted) page in one or more subdirectories (folders) that are disallowed in robots.txt. That page, if loaded by a misbehaving robot -- or a snooping human -- adds the visitor's IP address to a blocked list. But I also have a 403 ("Forbidden") error handler that redirects these blocked IPs to a page explaining what's going on and containing a captcha that a human can use to unblock the IP. That way, if an IP is blocked because one person used it once for a bad purpose, the next person to get that IP won't be permanently blocked -- just inconvenienced a little. Of course, if a particular IP keeps getting RE-blocked a lot, I can take further steps manually to address that.
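For example, the honeypot entry in robots.txt might look like this (the folder name /private-stuff/ is just a placeholder -- use anything that isn't linked anywhere a legitimate visitor would go):

```
User-agent: *
Disallow: /private-stuff/
```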
Here is the logic (a rough code sketch follows the list):
- If IP not blocked, allow access normally.
- If visitor navigates to forbidden area, block their IP.
- If IP is blocked, redirect all access to the "unblock" form containing the captcha.
- If user manually enters proper captcha, remove the IP from the blocked list (and log that fact).
- Rinse, lather, REPEAT the above steps for further accesses.
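Here's a minimal sketch of that logic, written in Python/Flask purely for illustration (my real setup is just a server-side script plus htaccess). The blocked_ips.txt file, the /private-stuff/ honeypot path, the /unblock URL, and the trivial arithmetic question standing in for a real captcha are all placeholders:

```
from pathlib import Path

from flask import Flask, redirect, request, url_for

app = Flask(__name__)
BLOCKLIST = Path("blocked_ips.txt")   # flat-file block list (assumed)
HONEYPOT_PATH = "/private-stuff/"     # same folder you disallow in robots.txt


def blocked_ips() -> set:
    return set(BLOCKLIST.read_text().split()) if BLOCKLIST.exists() else set()


def save_ips(ips) -> None:
    BLOCKLIST.write_text("\n".join(sorted(ips)))


@app.before_request
def enforce_block_list():
    # Blocked IPs can reach only the unblock form; everything else redirects there.
    if request.remote_addr in blocked_ips() and request.path != "/unblock":
        return redirect(url_for("unblock_form"))


@app.route(HONEYPOT_PATH)
def honeypot():
    # Anything that ignores robots.txt and lands here gets its IP blocked.
    ips = blocked_ips()
    ips.add(request.remote_addr)
    save_ips(ips)
    return "Forbidden", 403


@app.route("/unblock", methods=["GET", "POST"])
def unblock_form():
    # Placeholder challenge; a real deployment would use an actual captcha.
    if request.method == "POST" and request.form.get("answer", "").strip() == "7":
        ips = blocked_ips()
        ips.discard(request.remote_addr)
        save_ips(ips)
        app.logger.info("Unblocked %s via captcha", request.remote_addr)  # log the unblock
        return "Your IP has been unblocked."
    return (
        "<p>Your IP address was blocked after a request to a restricted area.</p>"
        '<form method="post">What is 3 + 4? <input name="answer">'
        "<button>Unblock me</button></form>"
    )


if __name__ == "__main__":
    app.run()
```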
That's it! One script file to handle the block notice and unblock captcha submission. One entry (minimum) in the robots.txt file. One 403 redirection in the htaccess file.
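For the htaccess side, something like this (Apache assumed; the script name, the example IP, and keeping the deny list directly in .htaccess are just one possible arrangement):

```
# Serve the unblock/captcha script to anyone who triggers a 403
ErrorDocument 403 /unblock.php

# One way to hold the block list: the honeypot script appends a
# "Require not ip ..." line here for each blocked address (Apache 2.4 syntax)
<RequireAll>
    Require all granted
    Require not ip 203.0.113.7
</RequireAll>

# Make sure blocked visitors can still load the unblock script itself
<Files "unblock.php">
    Require all granted
</Files>
```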