84

Google has released a new kind of CAPTCHA for identifying bots that asks the user to click a single checkbox. It falls back to image-based verification only if necessary.

Could someone please explain how such a program differentiates a human from a bot?

There is a program here that can perform mouse clicks on your computer. It cannot be detected by a web-based program that has no access to your program files. It should be possible to write an undetectable Windows executable that can tick the checkbox. One could also randomize the response time of the program.

After a few (successful) attempts, the captcha will ask for image verification. Maybe that can be solved by an AI that searches the images using Google Image Search (by image), and makes guesses based on the filenames of 'visually similar' images. If the images used are not from the net, then they would be limited in number, and one could create a database of them.

Could someone clarify whether these approaches could actually work?

ghosts_in_the_code

5 Answers

75

This isn't really a great question for Stack Exchange, because Google keeps its algorithms secret and all we can really do is guess at how it works. My understanding is that the new system analyzes your activity across all of Google's services (and possibly other sites that Google has some control over, such as websites that show Google ads).

Thus, it is likely that the checks are not limited to just the page that has the checkbox on it. For example, if Google detects that the computer or IP address you are using was also used in the past to do things a normal human would do (checking Gmail, searching on Google, uploading files to Drive, sharing photos, browsing the web, etc.), then it can probably be reasonably sure that you are a human and let you skip the image verification. On the other hand, if it can't associate your computer with any previous human-like activity, it will be more suspicious and give you the image verification. Though the mouse behavior as you click the checkbox may be one factor it analyzes, there is almost certainly a lot more to it.

Again, we don't know for sure how it works. This is just my best guess based on what little Google has said:

While the new reCAPTCHA API may sound simple, there is a high degree of sophistication behind that modest checkbox. CAPTCHAs have long relied on the inability of robots to solve distorted text. However, our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy. Thus distorted text, on its own, is no longer a dependable test.

To counter this, last year we developed an Advanced Risk Analysis backend for reCAPTCHA that actively considers a user’s entire engagement with the CAPTCHA—before, during, and after—to determine whether that user is a human. This enables us to rely less on typing distorted text and, in turn, offer a better experience for users. We talked about this in our Valentine’s Day post earlier this year.

To me the point about "before, during, and after use" is a strong hint that they analyze previous browsing behavior, but my interpretation could be wrong.

Here's a quote from WIRED:

Instead of depending upon the traditional distorted word test, Google’s “reCaptcha” examines cues every user unwittingly provides: IP addresses and cookies provide evidence that the user is the same friendly human Google remembers from elsewhere on the Web. And Shet says even the tiny movements a user’s mouse makes as it hovers and approaches a checkbox can help reveal an automated bot.

There is another thread on Stack Overflow discussing this as well: https://stackoverflow.com/questions/27286232/how-does-new-google-recaptcha-work

As for image verification, you're not going to be able to find those images with reverse image search, or compile a database of them. They are usually random street signs or house numbers captured by Google's Street View cars, or words from books that were scanned for the Google Books project. There is a good purpose behind this - Google actually makes use of what people type into reCaptcha to improve their own databases and train OCR algorithms. reCaptcha gives the same image to a number of users, and if they all agree on what it says, then the picture becomes training data for Google's AI.

From Wikipedia:

The reCAPTCHA service supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.

reCAPTCHA has worked on digitizing the archives of The New York Times and books from Google Books.[3] As of 2012, thirty years of The New York Times had been digitized and the project planned to have completed the remaining years by the end of 2013. The now completed archive of The New York Times can be searched from the New York Times Article Archive, where more than 13 million articles in total have been archived, dating from 1851 to the present day.
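The "same image to a number of users, keep it if they agree" idea can be sketched in a few lines. This is only my illustration of the principle, not Google's actual pipeline: the function name `consensus_label` and the thresholds `min_votes` and `min_agreement` are made up for the example.

```python
from collections import Counter


def consensus_label(answers, min_votes=3, min_agreement=1.0):
    """Return the transcription users agree on, or None if they disagree.

    answers: what different users typed for the same image.
    min_votes / min_agreement: illustrative thresholds (assumptions).
    """
    # Normalize so that case and stray whitespace do not count as disagreement.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent answers yet

    label, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return label  # agreement reached: usable as a training label
    return None
```

With the default `min_agreement=1.0` every user has to type the same thing; a real system would presumably tolerate some disagreement and weight users by how trustworthy they have been on images with known answers.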

tlng05
  • 2
    Can you provide any sources for your answer? – RoraΖ Jan 09 '15 at 18:15
  • You may be right. I wondered about a possible conflict with their [Privacy Policy](https://www.google.com/intl/en/policies/privacy/) but reading the broad way it is formulated, and especially their _[How we use information we collect](https://www.google.com/intl/en/policies/privacy/#infouse)_, it seems compatible: «We use the information we collect from all of our services to provide, maintain, protect and improve them, to develop new ones, and to protect Google and our users. We also use this information to offer you tailored content». – Ángel Jan 09 '15 at 21:55
  • However, it never blocks you if you clear the image test. (irrespective of previous history) – ghosts_in_the_code May 04 '15 at 06:27
  • Hi! I found this answer really interesting. But if Google is already pretty sure you're a human, why does it bother to display a CAPTCHA at all? – Eli Rose Jan 01 '17 at 19:47
  • 1
    @EliRose A significant part of the reCaptcha implementation is [a server-side check of the widget's security token](https://developers.google.com/recaptcha/docs/verify). The website needs to verify that it's not being spoofed. This happens upon user interaction with the widget. – isherwood Feb 03 '17 at 16:58
  • Yes, it works when I click a single checkbox but when I do the same with incognito mode of my browser, then it doesn't. – Bhushan Jan 27 '19 at 08:49
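For anyone wondering what the server-side check mentioned in the comments looks like, here is a minimal sketch based on the verification docs linked above. The site's backend POSTs its secret key and the token from the submitted form (the `g-recaptcha-response` field) to the `siteverify` endpoint and reads `success` from the JSON reply. The helper names `build_verify_request` and `is_human` are my own, and a real implementation would also want error handling.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request the website's server sends to Google."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the end user's IP address
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(secret, token, remote_ip=None, timeout=5.0):
    """Return True if Google reports that the widget token is valid."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        result = json.load(reply)
    # The reply also carries fields such as "error-codes"; only the
    # boolean "success" flag is used in this sketch.
    return bool(result.get("success"))
```

The point is that the checkbox result is never trusted from the browser alone: the form is only accepted once the server has confirmed the token with Google.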
22

I also used to be amazed by this. So I opened Chrome in incognito mode, browsed to a site that has the new Google CAPTCHA, and ticked the box. It didn't let me through; instead it showed a series of images and asked me to select the ones related to a given image.

This shows that Google is constantly tracking our behavior to determine if we are human or not.

Incognito mode

fdiengdoh
  • 2
    Could you explain how this answers the question? Maybe I'm missing something, but I don't see how this addresses the possible attacks that the OP mentions. – S.L. Barth Oct 05 '15 at 10:56
  • 3
    @S.L.Barth: It appears to provide support (using formatting that wouldn't have fit into a comment) for the explanation given by tlng05's answer. – Ben Voigt Oct 05 '15 at 21:28
  • 3
    @BenVoigt yes I was just trying to behave like a machine and see how Google reacts. Deleting cookies, history and cache also triggers the same thing. – fdiengdoh Oct 17 '15 at 18:13
  • 4
    I'm guessing you are in the UK. "Commercial lorry" means nothing to us here in the USA. So even more interesting that google is making it geographically contextual. – richard Mar 29 '19 at 20:33
  • 1
    And a note, _Chrome_ is _also_ a product of Google. – Константин Ван Aug 17 '19 at 15:22
9

When you click "I'm not a robot", the browser sends an HTTP request to Google with a whole bunch of useful information, such as:

  • Your IP Address
  • Your country
  • Timestamp

It also sends information from your browser, such as the way you move your cursor just before clicking the checkbox, how you scroll the page before the click, the time interval between different browser events, and many other variables that Google keeps secret.

All these criteria are then processed by a machine-learning risk analysis at Google. Most of the time this information is enough to tell a human from a bot, but if the risk analysis engine is still unsure, the user (a small percentage of all users) has to complete an additional challenge.

That's where the image recognition CAPTCHA comes in. If you prove that you are human this way, chances are Google's engine will remember, and the next time you click that checkbox you will pass right through.
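To make the idea concrete, here is a toy sketch of how several signals could be combined into a single risk score that decides whether the image challenge is shown. Everything in it is an assumption for illustration: the `Signals` fields, the weights and the threshold are invented, and Google's real model is secret and certainly far more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical inputs; the real set of signals is not public."""
    has_known_cookie: bool      # browser carries a cookie seen before
    ip_seen_before: bool        # IP address has a history of normal use
    mouse_path_points: int      # cursor positions recorded before the click
    ms_from_load_to_click: int  # time between page load and the click


def risk_score(s):
    """Add up invented penalty weights; higher means more bot-like."""
    score = 0.0
    if not s.has_known_cookie:
        score += 0.5
    if not s.ip_seen_before:
        score += 0.2
    if s.mouse_path_points < 5:
        score += 0.3  # cursor "teleported" onto the checkbox
    if s.ms_from_load_to_click < 300:
        score += 0.1  # clicked faster than a person plausibly could
    return score


def needs_image_challenge(s, threshold=0.5):
    """Show the image challenge when the score reaches the threshold."""
    return risk_score(s) >= threshold
```

In this toy model a fresh incognito window (no cookie) already reaches the threshold on its own, while a browser with history and natural mouse movement passes straight through.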

defalt
2

As far as I've seen, the logic is like this:

  • If the user is not logged in to a Google Account (in the browser), then they get a visible captcha.
  • If the user is logged in, then depending on their previous activity history (probably across Google, either on that page or before navigating there), there are two possible scenarios:
    1. You will not get any captcha
    2. You will get an easier captcha (i.e. 1 maze instead of 4 mazes)

What I don't quite understand is what use the checkbox captcha is when the algorithm has already detected that you are a human.

schroeder
T.Todua
0

It does several things. It checks your IP address and cookies. It looks at how you click and how your mouse moves before you click. Using an auto-click tool usually makes Google give you an image challenge.

TheJulyPlot
skyler