
I'm working on a web app at the moment where we're trying to remove user friction and excess steps from our user-creation page. To stop bots from spamming the app we need some sort of CAPTCHA, but this would take away from the main idea behind the app, which is simplicity.

Looking through the alternatives suggested here on Stack Exchange, they all still seem to take away from the UX and leave room for bots to work around them. Suggestions here:

This got me thinking: why can't we detect whether a user is real or a bot using a series of anti-bot checks behind the scenes, such as honeypots, time-stamps, rogue POST and GET requests, whether JavaScript is enabled, HTTP header checks and others?
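
For illustration, here is a rough sketch of what two of those checks (a honeypot field and a time-stamp) might look like server-side. Flask and the field names are just placeholders I picked for the example:

    # Rough sketch of two of these checks (honeypot + time-stamp); Flask and
    # the field names are just placeholders for the example.
    import time
    from flask import Flask, request, render_template_string, abort

    app = Flask(__name__)

    FORM = """
    <form method="post" action="/signup">
      <input name="email" type="email">
      <!-- Honeypot: hidden from humans via CSS, but naive bots fill it in -->
      <input name="website" style="display:none" tabindex="-1" autocomplete="off">
      <!-- Time-stamp: lets us reject forms submitted implausibly fast
           (in practice this value should be signed or kept server-side) -->
      <input name="form_rendered_at" type="hidden" value="{{ ts }}">
      <button type="submit">Sign up</button>
    </form>
    """

    @app.route("/signup", methods=["GET"])
    def show_form():
        return render_template_string(FORM, ts=int(time.time()))

    @app.route("/signup", methods=["POST"])
    def create_account():
        if request.form.get("website"):              # honeypot was filled in
            abort(400)
        rendered_at = int(request.form.get("form_rendered_at", 0))
        if time.time() - rendered_at < 3:            # humans need a few seconds
            abort(400)
        # ... proceed with the normal, friction-free account creation ...
        return "ok"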

Is there a technical reason why this hasn't been done? Thinking about writing up a library myself otherwise :).

LogiKal
  • The technical reason could be that when I know what technology you use to detect my bot, I can reprogram my bot so that it doesn't trigger your detection methods. – Philipp Aug 27 '14 at 21:26
  • @Philipp Good point, however it could be possible for some tests to have changing variables. For example, labels and field names in the honeypots could randomly change on initialization. I agree that there will be some tests which spammers will inevitably adapt to but I would like to think that with the combination of many dynamic tests the system could hold up. – LogiKal Aug 27 '14 at 22:04
  • Someone else on here suggested making a question and a field the exact same color as the background. A bot, which doesn't rely on visual input to detect a field, will answer the question, but a user won't. It's easy to reprogram a bot to avoid this trick, but it doesn't add any steps for the user. – KnightOfNi Aug 27 '14 at 22:58
  • You need a mix of testing the human and background detection, and you should be able to ramp up or react based on the likelihood of it not being human. I went to a live talk where these guys had developed a really good system; let me see if they posted it online somewhere and I'll get back to you. – Eric G Aug 28 '14 at 01:58

2 Answers


CAPTCHAs are not necessarily needed, nor are they the only solution. Defense in depth means using multiple tactics, layers, and techniques. With the information you have, you can try to determine the likelihood that the client talking to your web app is (A) a human and (B) a human you want accessing your site.

There are reputation services out there that can provide blacklists based on reports from peer customers; your upstream network providers or CDN providers may offer such services as well. You may choose to block access from certain known bad ranges, bad or unlikely countries of origin, etc.
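
As a rough illustration, querying a DNS-based reputation list usually looks something like this (the zone name is a placeholder, not an endorsement of any particular provider):

    # Sketch of checking a client IP against a DNS-based reputation list.
    # DNSBLs are queried by reversing the IPv4 octets and appending the zone,
    # e.g. 203.0.113.7 -> 7.113.0.203.dnsbl.example.com
    import socket

    def ip_is_listed(ip, dnsbl_zone="dnsbl.example.com"):   # placeholder zone
        query = ".".join(reversed(ip.split("."))) + "." + dnsbl_zone
        try:
            socket.gethostbyname(query)   # any A record means "listed"
            return True
        except socket.gaierror:           # NXDOMAIN means "not listed"
            return False

    if ip_is_listed("203.0.113.7"):
        print("treat this sign-up as higher risk")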

You can tune your response based on the risk, cost, etc. You can also manually adjust your metrics over time based on the expected number of customers versus visitors who do not sign up, repeat IP addresses, tracking of email invite codes, etc.
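
As a sketch of what that tuning can look like, each signal can contribute to a score and the score picks the response; the signals, weights and thresholds below are invented and would be adjusted against your own data over time:

    # Sketch of a risk-tuned response: invented signals, weights and thresholds.
    WEIGHTS = {
        "ip_on_blocklist":    50,
        "country_unexpected": 20,
        "repeat_ip":          20,
        "no_javascript":      15,
        "filled_in_under_3s": 30,
    }

    def risk_score(signals):
        # signals maps signal name -> True/False for one sign-up attempt
        return sum(w for name, w in WEIGHTS.items() if signals.get(name))

    def response_for(signals):
        score = risk_score(signals)
        if score >= 70:
            return "block"      # almost certainly a bot
        if score >= 30:
            return "captcha"    # doubtful, so add friction only now
        return "allow"          # looks human, keep the sign-up frictionless

    # A repeat IP that never ran JavaScript gets a CAPTCHA, nothing worse.
    print(response_for({"repeat_ip": True, "no_javascript": True}))  # captcha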

A while back I saw a good presentation on "Repsheet" where the presenter talked about all of the behavioral analysis and data points his company uses to identify likely fraudsters, and how they used multiple techniques to address them. The slides are more of an accompaniment than a standalone resource, but you should still be able to follow the techniques. Some components of the system are available in the Repsheet GitHub repo.

Eric G
  • Repsheet seems to fit the bill with their approach of combining different tests/techniques. There's a lot of info I can work on top of. – LogiKal Aug 28 '14 at 21:07

The overall answer is: how well you can distinguish humans from spambots depends on how willing the spambot developers are to attack you specifically. There are essentially two categories of defences: distinguishing bots from humans, and methods that increase the cost of carrying out attacks.

Distinguishing bots from humans

A lot of alternative techniques to CAPTCHAs in the first family have been proposed on Stack Exchange already: Is there a true alternative to using CAPTCHA images?

I won't discuss CUPTCHUs and CIPTCHIs and other "usable" CAPTCHAs. They all require that humans perform a task meant to discriminate them from bots. Most of them can probably be broken if an adversary actually targets them specifically, and they all still require wasting your users' time; at least one of them cares about UX and is tolerable in contexts where playfulness is a good value for the experience you're crafting.

Increasing the cost of attacks

My personal favourite is simply to use federated identity schemes, so that you rely on other identity providers to confirm that a specific user has an account with them (OAuth does that) and that this account has some significant amount of value, e.g. for an email account, that it has received substantial amounts of mail from other accounts assumed to be real* (nobody does that yet to my knowledge).
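
The mechanics of the first part are a standard OAuth 2.0 authorization-code exchange. A minimal sketch with placeholder provider URLs and credentials (the "account value" check is, as said, not something providers expose today):

    # Sketch of delegating identity to a provider via OAuth 2.0. All URLs and
    # credentials are placeholders; swap in the provider you actually trust.
    import requests

    TOKEN_URL   = "https://idp.example.com/oauth/token"    # placeholder
    PROFILE_URL = "https://idp.example.com/api/me"          # placeholder

    def confirm_remote_account(auth_code, redirect_uri):
        # Exchange the code the user brought back from the provider for a token.
        token = requests.post(TOKEN_URL, data={
            "grant_type": "authorization_code",
            "code": auth_code,
            "redirect_uri": redirect_uri,
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
        }).json()["access_token"]

        # Fetching the profile proves the account exists at the provider, which
        # already raises the attacker's cost per fake sign-up on your site.
        return requests.get(
            PROFILE_URL, headers={"Authorization": "Bearer " + token}
        ).json()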

Note that such an approach provides no protection against infected devices (as opposed to spambots), which are a growing concern for sites like Facebook, where some real, trusted accounts are abused by malicious browser extensions and start serving spam (no URL, that's academic hearsay).

Other methods may be worth adopting if you have empirical data showing that they work against whoever attacks you. You can limit the number of accounts created per IP per week or month before serving a CAPTCHA, if you are facing spambots that reuse the same botnet IPs to create accounts instead of systematically changing them. You can do machine learning on existing spambots' details (form of nickname, fields filled in) to identify recurrent offenders and use that as an extra filter when deciding whether to serve a CAPTCHA. Of course that requires a lot of maintenance and only works against unmotivated adversaries, so if you're a million-user platform you're probably out of luck: active adversaries trivially defeat such instances of machine learning (see the AISec conferences).
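
A minimal sketch of the per-IP limit, with a plain in-memory dict standing in for whatever store you actually use (Redis, a database table, ...):

    # Sketch of "N accounts per IP per week before a CAPTCHA"; the threshold
    # is made up and the in-memory store is only for illustration.
    import time
    from collections import defaultdict

    WINDOW = 7 * 24 * 3600            # one week, in seconds
    MAX_ACCOUNTS_PER_WINDOW = 3       # tune against your own sign-up data

    signup_times = defaultdict(list)  # ip -> timestamps of recent sign-ups

    def needs_captcha(ip):
        now = time.time()
        signup_times[ip] = [t for t in signup_times[ip] if now - t < WINDOW]
        return len(signup_times[ip]) >= MAX_ACCOUNTS_PER_WINDOW

    def record_signup(ip):
        signup_times[ip].append(time.time())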

CAPTCHAs are a no-no, in any case

If you believe CAPTCHAs to be absolutely necessary for some cases, you should still implement other methods to detect offenders and only serve CAPTCHAs when you have doubts about an account. It's much likelier that a real user fails a CAPTCHA than that a spambot succeeds at 4 or 5 different checks.

* Relying on chains of trust opens you up to Sybil attacks, but these seem to me much easier to defend against (SybilGuard and whatever else has been published since) than e.g. direct automated comment/review detection, and much nicer on the user than CAPTCHAs (which have failure rates of up to 40% according to usability researcher Angela Sasse).

Steve Dodier-Lazaro
  • I like the idea of two distinct categories of defenses. There will always be a way for spambots to work around defenses, but we can employ methods as you mentioned to increase the cost and make the attack not worthwhile. I agree that the success of a spambot's attack will largely depend on its developer's willingness, but do you think having a series of adapting tests could hold up well? – LogiKal Aug 27 '14 at 23:30
  • I've never faced a motivated adversary, but then I've only rarely done Web management. On my personal site we used to get spam in blog comments, and since we rolled out a new version of the site it has stopped; the only difference is that we use custom forms in an iframe (I did *not* code that, I promise), and that seems to be confusing our bunch of spammers because we haven't had a single spam comment. We previously used Disqus. Which leads to... – Steve Dodier-Lazaro Aug 27 '14 at 23:56
  • My answer is based on discussion with Web security researchers, and on what I witness in my (somewhat privileged) position as a student in a security usability research group: we have students doing projects on CAPTCHA replacement, RFCs from industry... the rest is just discussing ideas. – Steve Dodier-Lazaro Aug 27 '14 at 23:58