Why bother validating the hostname for a Google Recaptcha response?

Question

Google's Recaptcha has hostname validation "baked in". When a user submits a Recaptcha response, the domain on which the response was obtained is validated against the whitelist of domains you provided when you set up the Recaptcha.

However, if you're using Recaptcha with multiple domains, you have the option of disabling Google's default hostname validation and handling it yourself (https://developers.google.com/recaptcha/docs/domain_validation).
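With Google's own domain check switched off, the `hostname` field of the siteverify result is the only origin signal left, so the server has to compare it itself. The following is a minimal sketch that assumes the verification result has already been decoded into a dict with the documented `success` and `hostname` fields. `ALLOWED_HOSTNAMES` and `check_hostname` are hypothetical names.

```python
from typing import Any, FrozenSet, Mapping

# Hypothetical allowlist: the domains you would otherwise have registered
# with the site key in the Recaptcha admin console.
ALLOWED_HOSTNAMES: FrozenSet[str] = frozenset({"example.com", "www.example.com"})


def check_hostname(result: Mapping[str, Any],
                   allowed: FrozenSet[str] = ALLOWED_HOSTNAMES) -> bool:
    """Return True only if Google accepted the token AND it was solved
    on one of our own hostnames.

    `result` is the decoded JSON body returned by the siteverify endpoint.
    """
    if not result.get("success", False):
        return False
    # Hostnames are case-insensitive; normalise before comparing.
    hostname = str(result.get("hostname", "")).lower().rstrip(".")
    return hostname in allowed


if __name__ == "__main__":
    ok = {"success": True, "hostname": "www.example.com"}
    foreign = {"success": True, "hostname": "spoofedhostname.com"}
    print(check_hostname(ok))       # True
    print(check_hostname(foreign))  # False
```

This check only constrains the hostname that the solving browser reported. It says nothing about whether that browser's name resolution had been tampered with.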

Google accompanies this with a prominent warning that not validating the hostname for any given response opens you up to a security vulnerability. But considering how easy it is to spoof the hostname, I don't see how this check ever provided any real security benefit.

A simple test showed me just how easy it is to spoof the hostname value that Google uses to validate the origin of the Recaptcha response. I pointed a made-up domain at my own machine in /etc/hosts:

$ sudo nano /etc/hosts
127.0.0.1    spoofedhostname.com

Then, when I submitted a test Recaptcha response, this is the result I got back:

{
    "success": true,
    "challenge_ts": "2016-12-24T14:15:22Z",
    "hostname": "spoofedhostname.com"
}

So Why Bother With Hostname Validation At All?

Hostname validation looks largely useless, given how easy the hostname is to spoof.
This seems to have something to do with preventing an attacker from stealing your Recaptcha public key and generating a bunch of valid Recaptcha responses, which they could store and later use when automating an attack on sensitive endpoints (/login, /reset-password). In theory this could feed some sort of brute-force attack, but that doesn't really make sense, since the response tokens expire after 1 minute. The attacker would also still need to solve every Recaptcha manually, which they could simply do on the actual domain. And, again, they could easily spoof your domain even if you are doing hostname validation.
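The expiry argument can also be enforced on your own server. The siteverify result includes `challenge_ts`, which is the time the challenge was solved. Google's verify documentation states that a token is valid for two minutes and can be verified only once, and Google enforces that itself. A server may still want a tighter window. The following is a minimal sketch: the one-minute `MAX_AGE` is an assumption taken from the question's figure, not a Google constant, and `parse_challenge_ts` and `is_fresh` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional

# Assumed freshness window (hypothetical), tighter than Google's own limit.
MAX_AGE = timedelta(minutes=1)


def parse_challenge_ts(value: str) -> datetime:
    """Parse a timestamp like '2016-12-24T14:15:22Z' into an aware datetime."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 onwards,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assume UTC if the string carried no offset at all.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_fresh(result: Mapping[str, Any],
             now: Optional[datetime] = None,
             max_age: timedelta = MAX_AGE) -> bool:
    """True if the token was solved no more than `max_age` before `now`."""
    ts = result.get("challenge_ts")
    if not ts:
        return False
    age = (now or datetime.now(timezone.utc)) - parse_challenge_ts(ts)
    # Reject timestamps in the future as well as stale ones.
    return timedelta(0) <= age <= max_age
```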

It just doesn't make any sense to me, but considering it's a Google product, I have to think that their security engineers know something that I don't.

What am I missing?

Is there a public key? Google says it's a shared key between the domain and Recaptcha. — Rahul Choudhary, Feb 08 '17 at 07:11
I would argue that it's another check they built in early, perhaps from a shared library, and refactoring to remove it is unnecessary (what's the risk or harm in leaving it?). It could also be used as a way to disable the reCaptcha process at a different point, not so much as a security control but for, say, cache control. — Purefan, Feb 09 '17 at 15:58
@purefan Google does cite it as a major security hole if you don't validate the hostname. I'm not 100% sure why that is, though. I have some theories, but nothing particularly sound. — d0nut, Feb 09 '17 at 18:18

André Borie · Accepted Answer · 2017-04-16T02:45:12.043

It may have something to do with people embedding your captchas on a site they set up, and using the solved captchas to spam your site.

For example, set up a site that gives something away for free (pirated movies/software, porn, etc.) but asks for the captcha first. Internally this is actually your captcha, and every solved captcha is passed on to a spambot targeting your site. This gives the attacker cheaper access to human captcha solving than conventional captcha farms.

The hostname validation would prevent the captcha's JS from loading on an unauthorized site.

Update: I've recently implemented a demo that bypasses this by rendering the captcha on the original site in a headless browser and then using WebSocket magic to stream it to my "bait" site (in this case a simple URL shortener that asks for the captcha before redirecting to the target site). This required considerable amounts of RAM (each Firefox instance used about 500 MB) compared to rendering the captcha directly on the bait site, so this hostname verification feature is definitely a major pain for spammers.

Aces. Thanks on behalf of everyone for chasing this down, @André Borie. Fascinating where it led. — AJB, Jul 12 '17 at 09:54
Are you talking about v2 or v1? v2 sends requests to Google when you click the checkbox and when you select images in the puzzle and click Done. These requests are made by JavaScript, so they execute on the bait site. Even if you manage to display the captcha there, the JavaScript can see the bait hostname and check that it isn't in the list of allowed domains for that site key. — Marco Marsala, Nov 08 '19 at 14:18

score 2 · Answer 2 · answered Feb 09 '17 at 21:25

There are two keys. The Site Key and the Secret Key. Both of these are given to the web admin when setting up reCAPTCHA.

For client-side integration, they give you the api.js script, a snippet, and the site key to insert into the website.

"When your users submit the form where you integrated reCAPTCHA, you'll get as part of the payload a string with the name "g-recaptcha-response". In order to check whether Google has verified that user, send a GET request with these parameters:" So if I'm spoofing the URL, your server never gets the g-recaptcha-response. The secret is never sent, the value of 'g-recaptcha-response' is never sent, and the remote IP is never sent.

I'm missing how the hacker would obtain the Secret Key from the web server. It is only ever sent directly to Google.
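Google's current verification documentation describes this server-to-server call as a POST to `https://www.google.com/recaptcha/api/siteverify` with the parameters `secret` and `response`, plus an optional `remoteip`. The following is a minimal standard-library sketch. `build_verify_request` and `verify_token` are hypothetical helper names. It shows that the secret is only placed in a request that your server sends to Google, not in anything the browser receives.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the server-to-server verification request."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip
    body = urllib.parse.urlencode(params).encode("ascii")
    # Supplying `data` makes urllib issue a POST with a form-encoded body.
    return urllib.request.Request(
        VERIFY_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def verify_token(secret: str, token: str,
                 remote_ip: Optional[str] = None,
                 timeout: float = 5.0) -> Dict[str, Any]:
    """Send the request and return Google's decoded JSON verdict."""
    req = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The returned dict has the same shape as the JSON shown in the question (`success`, `challenge_ts`, `hostname`). Any hostname check runs on that dict after this call.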

------ Extra explanation of what I'm seeing.

The Site Key is easily seen in your site's HTML. However, the Secret Key stored on your server cannot be accessed or spoofed. I don't know how the hacker would gain access to the secret key to complete the two-way authentication with Google.

I think this is what you are missing (the two-way authentication).

References: https://www.youtube.com/watch?v=Fvt1S0nBmwQ (video on setting up Google reCAPTCHA)

https://developers.google.com/recaptcha/docs/verify

Also, on Google reCAPTCHA exploits: there are some very interesting weaknesses involving embedding someone else's reCAPTCHA in a website so that it authenticates automatically when visitors click anywhere on the page.

(My reputation is too low to post more than 2 links.)

Worwin, there's no need for anyone to gain access to the secret key. This is about intentionally disabling `hostname` validation of the Recaptcha `siteKey` that you've created (via the Google Recaptcha Admin Console) in order to use the same `siteKey` for any number of sites. The use case is when you're building a hosted CMSaaS and you're handling Recaptcha validation via your API layer for `n` domains on your system. — AJB, Feb 10 '17 at 08:08
I'm not entirely sure why this is getting up-votes, it doesn't answer the question asked. — AJB, Feb 15 '17 at 03:37
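In the CMSaaS case AJB describes, a single site key serves many customer domains. The hostname check then becomes a lookup: the hostname Google reports should belong to the same tenant whose endpoint received the form. The following is a minimal sketch. The `TENANT_DOMAINS` registry and both functions are hypothetical, and a real system would load the mapping from its own tenant database.

```python
from typing import Any, Dict, FrozenSet, Mapping, Optional

# Hypothetical tenant registry for a multi-site CMS sharing one site key:
# tenant id -> the domains that tenant's sites are served from.
TENANT_DOMAINS: Dict[str, FrozenSet[str]] = {
    "tenant-a": frozenset({"shop-a.example", "www.shop-a.example"}),
    "tenant-b": frozenset({"blog-b.example"}),
}

# Reverse index: domain -> owning tenant, built once at start-up.
DOMAIN_TO_TENANT: Dict[str, str] = {
    domain: tenant
    for tenant, domains in TENANT_DOMAINS.items()
    for domain in domains
}


def tenant_for_hostname(hostname: str) -> Optional[str]:
    """Return the tenant that owns `hostname`, or None if nobody does."""
    return DOMAIN_TO_TENANT.get(hostname.lower().rstrip("."))


def token_matches_tenant(result: Mapping[str, Any], tenant_id: str) -> bool:
    """Accept a siteverify result only if Google marked it successful and it
    was solved on a domain owned by the tenant that received the form.
    Unknown hostnames fail closed."""
    if not result.get("success", False):
        return False
    owner = tenant_for_hostname(str(result.get("hostname", "")))
    return owner is not None and owner == tenant_id
```

With this check, a token solved on tenant B's site cannot be replayed against tenant A's form, even though both sites share the same site key.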

score -1 · Answer 3 · edited Aug 24 '18 at 11:11

Hostname validation is used as an anti-bot measure. If you run your CAPTCHA harvester on localhost under the URL the CAPTCHA needs to be harvested on, you can no longer send requests directly to the real site from your computer, because they will be redirected to localhost. However, there are ways around this.

answered Aug 24 '18 at 10:48 by Tron · edited by S.L. Barth
