
Captchas are everywhere. Suppose you're a big company offering Captchas on many sites: you could easily track a user across the web simply by setting a cookie every time some site loads your captcha engine.

But apart from that, I am wondering if it might even be possible to identify a person by analyzing the way they solve a Captcha.

I am pretty sure that Google's image-based reCaptcha actually analyzes the tiny movements of the cursor while the user is clicking the tiles, how long they take to click adjacent tiles, how the mouse then moves to the "confirm" button, etc. I have repeatedly found that solving a Captcha too quickly and efficiently results in being prompted with more and more Captchas. However, if I play dumb, linger over the tiles and move my mouse a bit randomly like my grandma would, the Captcha is accepted on the first try. (Which is extremely annoying, by the way.)

So, going a step further, it might even be plausible that each person has unique characteristics in the way they move their mouse, look at pictures, and click fast or slow. Let's call it a cursor fingerprint. By feeding this into a neural network, could it be possible to identify a person?
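To make the "cursor fingerprint" idea concrete, here is a minimal Python sketch of the kind of feature extraction that mouse-dynamics work typically starts from: turning a raw cursor trace into a handful of numbers. The event format (`CursorEvent`) and the four features are illustrative assumptions, not what reCaptcha or any particular paper actually computes; a real system would likely use many more features and feed the resulting vectors to a classifier such as a neural network.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CursorEvent:
    t: float     # timestamp in seconds
    x: float     # cursor x position in pixels
    y: float     # cursor y position in pixels
    click: bool  # True if the user clicked at this event


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def cursor_features(events: List[CursorEvent]) -> Dict[str, float]:
    """Summarize one cursor trace into a small, hypothetical feature vector."""
    if not events:
        raise ValueError("need at least one cursor event")

    speeds: List[float] = []
    path_length = 0.0
    for prev, cur in zip(events, events[1:]):
        step = math.hypot(cur.x - prev.x, cur.y - prev.y)
        path_length += step
        dt = cur.t - prev.t
        if dt > 0:
            speeds.append(step / dt)  # pixels per second

    # Time between consecutive clicks, e.g. between adjacent tiles
    click_times = [e.t for e in events if e.click]
    click_gaps = [b - a for a, b in zip(click_times, click_times[1:])]

    # Distance travelled divided by straight-line distance:
    # 1.0 is a perfectly direct path, larger values mean more wandering
    start, end = events[0], events[-1]
    direct = math.hypot(end.x - start.x, end.y - start.y)

    return {
        "mean_speed": _mean(speeds),
        "max_speed": max(speeds, default=0.0),
        "mean_click_gap": _mean(click_gaps),
        "path_ratio": path_length / direct if direct > 0 else 0.0,
    }
```

For example, a trace that moves straight from (0, 0) to (30, 40) in half a second, clicks, waits half a second and clicks again gives `{'mean_speed': 50.0, 'max_speed': 100.0, 'mean_click_gap': 0.5, 'path_ratio': 1.0}`. The "play dumb" strategy described above would show up mainly as a lower speed, longer click gaps and a larger `path_ratio`.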

dervonnebenaan
  • There are a variety of different captchas which differ a lot in the amount of user interaction needed to solve the captcha. And even reCaptcha has changed a lot in the past and will probably change again in the future. Therefore I consider the question in the current form too broad. – Steffen Ullrich Jan 28 '18 at 05:21
  • But even if you consider only the current version of reCaptcha, the ability to identify somebody depends a lot on how much data you could collect for each user and how many users need to be distinguished. Based on this I don't believe that you cannot identify a user solely by how he solves the captcha. But even if you cannot identify a single user in all cases, you might be able to narrow it down to a subset of all known users. This subset might be further narrowed by combining the captcha-solving pattern with other information. – Steffen Ullrich Jan 28 '18 at 05:24
  • In theory, this might give a few bits of deanonymization when combined with other techniques, but in practice, I think it's just way too much work for the amount of bits acquired. – Lie Ryan Jan 28 '18 at 05:29
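The "few bits" framing and the idea of narrowing users down to a subset can be put into numbers. Singling out one person among N requires about log2(N) bits of identifying information, and each independent signal that contributes b bits divides the set of remaining candidates by 2^b. The sketch below assumes the signals are statistically independent, which is optimistic; the per-source bit counts in the usage note are hypothetical, not measurements of any captcha.

```python
import math
from typing import Dict


def bits_needed(population: int) -> float:
    """Bits of identifying information needed to single out one member of a population."""
    return math.log2(population)


def remaining_candidates(population: int, bits_by_source: Dict[str, float]) -> float:
    """Expected size of the anonymity set after combining several signals.

    Assumes the signals are statistically independent, so their bits add up.
    Correlated signals would shrink the set less than this estimate.
    """
    total_bits = sum(bits_by_source.values())
    return population / 2 ** total_bits
```

With a population of 2**30 (about a billion users), `bits_needed` returns 30.0. If, hypothetically, captcha-solving style contributed 4 bits and other signals 10 bits, `remaining_candidates(2**30, {"captcha_style": 4, "other_signals": 10})` would leave 65536.0 candidates: a much smaller subset, but still far from a single person.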

1 Answer


This is absolutely possible. Whether or not reCaptcha itself or any other given captcha service does this, I don't know, but biometrics based on mouse movements are absolutely able to uniquely identify people. The same is true for many other ways we interact with our computers (e.g. our keyboards).

There are a large number of research papers on various forms of biometrics. A few examples:

  • An Efficient User Verification System via Mouse Movements

    Our technique is robust across different operating platforms, and no specialized hardware is required. The efficacy of our approach is validated through a series of experiments. Our experimental results show that the proposed system can verify a user in an accurate and timely manner, and induced system overhead is minor.

  • On Using Mouse Movements as a Biometric

    Two authentication schemes are proposed, one for initial login of users and another for passively monitoring a computer for suspicious usage patterns. Error rates for both schemes were calculated and compared to prior work.

  • User Identity Verification via Mouse Dynamics

    The proposed algorithm outperforms current state-of-the-art methods by achieving higher verification accuracy while reducing the response time of the system.

You'll note that these papers are all about using biometrics for user authentication rather than for tracking. Authentication and tracking are similar: both involve trying to determine which identity (if any) a given user is among a set of known identities. Whether this scales to the size of Google's user base, and whether it is accurate enough to track previously unknown users, I do not know. The point of this answer is to emphasize the capabilities of biometrics as a technology.
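"Which identity (if any) a given user is among a set of known identities" covers two slightly different operations: verification (1:1, checking one claimed identity) and identification (1:N, searching all enrolled identities). A minimal sketch of both, assuming each user is represented by a hypothetical fixed-length feature vector (a "template") compared by Euclidean distance against a hypothetical threshold; real systems typically use trained models rather than raw distances.

```python
import math
from typing import Dict, List, Optional

Template = List[float]  # hypothetical fixed-length feature vector per user


def verify(sample: Template, claimed: Template, threshold: float) -> bool:
    """1:1 verification: is the sample close enough to the claimed user's template?"""
    return math.dist(sample, claimed) <= threshold


def identify(sample: Template, enrolled: Dict[str, Template],
             threshold: float) -> Optional[str]:
    """1:N identification: which enrolled user, if any, does the sample match best?"""
    best_user: Optional[str] = None
    best_dist = math.inf
    for user, template in enrolled.items():
        d = math.dist(sample, template)
        if d < best_dist:
            best_user, best_dist = user, d
    # Reject the best guess if even the closest template is too far away
    return best_user if best_dist <= threshold else None
```

The two functions share the same comparison, which is the sense in which the problems are similar. The difference is that `identify` makes one comparison per enrolled user, so its cost and its chances of latching onto the wrong template both grow with the size of the user base.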

forest
  • Interesting reading! I think there is a subtle difference between the question and your answer here (or at least the first reference; I didn't get through the rest). In the first reference, they have tried to answer a yes-or-no question: is this the mouse movement of user X? What Google would need to do to identify users by their mouse movement is much, much harder. They would need to answer "Out of our billions of users, who is this?" – Anders Jan 29 '18 at 08:41
  • Still, it's nice to have an answer linking to some research, and as I said, I didn't read it all, so I might have missed something here. – Anders Jan 29 '18 at 08:42
  • The problems are analogous. Identifying one out of a billion users for tracking purposes is similar to identifying whether or not a user trying to log in is the correct one out of a billion possibilities for authentication purposes. The papers give the numbers. – forest Jan 29 '18 at 23:39
  • Inferring user identity and verifying a user's identity claim are definitely not the same problem. Examples of widespread verification methods that can't be used for identity inference are (1) passwords and (2) confirmation links sent via email or text message. – Ben Voigt Feb 24 '18 at 22:43
  • @BenVoigt Passwords are chosen by a user and have very little information. You can't choose your own biometrics. – forest Feb 25 '18 at 01:12
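The disagreement in these comments turns on how error rates scale with population size. If a matcher falsely accepts a non-matching user with some per-comparison false accept rate (FAR), then a 1:N search over many other users gives that error many chances to occur. The sketch below computes 1 - (1 - FAR)^N under the simplifying assumption that all comparisons are independent with the same FAR; the FAR values in the usage note are hypothetical, not figures from the cited papers.

```python
import math


def prob_any_false_match(far: float, n_others: int) -> float:
    """Probability that at least one of n_others non-matching users is accepted.

    Assumes every comparison is independent and has the same per-comparison
    false accept rate `far`, i.e. computes 1 - (1 - far) ** n_others.
    """
    # log1p/expm1 keep precision when far is tiny and n_others is huge
    return -math.expm1(n_others * math.log1p(-far))
```

For a single comparison the result is just the FAR itself. But a hypothetical FAR of one in a million, which would be excellent for a login check, gives `prob_any_false_match(1e-6, 10**9)` of essentially 1.0: against a billion other users, roughly a thousand false matches would be expected, so the search result alone could not pin down one person.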