I've recently noticed that a few companies have begun to offer bot and scraping protection services based on browser fingerprinting: they use the fingerprint to detect bots and then block that specific fingerprint from accessing the site (rather than blocking the IP).
Here are a few examples:
- http://www.distilnetworks.com/
- http://www.fireblade.com/
- http://www.shieldsquare.com/how-it-works.php
There are differences between them, but apparently all of these companies use JavaScript to collect detailed browser-specific fields such as plugins, fonts, screen size and resolution, combine them with whatever can be obtained from the HTTP headers, and use this data to classify the client as a bot or a human.
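As far as I can tell (the vendors don't publish details), the client-side collection probably looks roughly like the sketch below. The field names are just my own illustration, not taken from any of these products:

```javascript
// Rough sketch of the kind of data such a fingerprinting script might collect.
// This is an assumption for illustration, not code from any of the vendors above.
function collectFingerprint() {
  return {
    userAgent: navigator.userAgent,
    language: navigator.language,
    platform: navigator.platform,
    // Installed plugin names (historically a strong identifier)
    plugins: Array.prototype.map.call(navigator.plugins, function (p) {
      return p.name;
    }),
    screen: {
      width: screen.width,
      height: screen.height,
      colorDepth: screen.colorDepth
    },
    timezoneOffset: new Date().getTimezoneOffset()
    // (font detection usually requires measuring rendered text widths; omitted here)
  };
}

// Presumably the result is serialized, sent to the server, and combined
// with the HTTP headers to classify the client as bot or human.
var fp = collectFingerprint();
```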
My question, then, is: is this approach robust enough? How hard would it be for an attacker to spoof all of the data fields that the JavaScript client sniffs (plugins, fonts, OS, etc.)? How much protection does this approach provide: does it only stop unsophisticated bots, or is it really that hard to overcome?
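To make the spoofing part concrete, my assumption is that a bot driving a real browser engine could simply shadow the properties the fingerprinting script reads before that script runs, along these lines (the spoofed values are arbitrary examples):

```javascript
// Hypothetical spoofing sketch: define own-property getters that shadow the
// built-in accessors, so the fingerprinting script sees whatever values the
// bot wants to present. The values below are arbitrary examples.
Object.defineProperty(navigator, 'userAgent', {
  get: function () {
    return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
           '(KHTML, like Gecko) Chrome/99.0 Safari/537.36';
  }
});
Object.defineProperty(navigator, 'platform', {
  get: function () { return 'Win32'; }
});
Object.defineProperty(screen, 'width',  { get: function () { return 1920; } });
Object.defineProperty(screen, 'height', { get: function () { return 1080; } });
```

(I'm not claiming these services can be bypassed this easily; the snippet is only meant to illustrate what I mean by spoofing the sniffed fields.)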