
We're currently being crawled at a greater rate than we can handle.

I can't seem to get nginx to block Googlebot:

server {
    location /ajax/sse.php {
        if ($http_user_agent ~* "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ) {
            return 403;
        }
    }
}

We've had to resort to blocking it in the PHP script instead:

if ($_SERVER['HTTP_USER_AGENT'] == 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') {
  header('HTTP/1.0 403 Forbidden');
  exit();
}
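
A looser check along these lines would also catch UA variants, in case the exact string comparison proves brittle (a sketch, not what we're actually running):

// Sketch: block any user agent containing "googlebot", case-insensitively
if (isset($_SERVER['HTTP_USER_AGENT']) && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false) {
  header('HTTP/1.0 403 Forbidden');
  exit();
}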

What's wrong with my nginx config?

Aidan Ewen

1 Answer


Why not just use robots.txt? See https://support.google.com/webmasters/answer/6062596
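
A minimal robots.txt for this case might look like this (a sketch; the path is taken from your question):

User-agent: Googlebot
Disallow: /ajax/sse.php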

In my nginx logs, the Googlebot user agent shows up as just `googlebot/2.1` or `Googlebot/2.1 (+http://www.googlebot.com/bot.html)`.

Try this:

if ($http_user_agent ~* "googlebot") {
    return 403;
}

or

if ($http_user_agent ~* "google") {
    return 403;
}
Skamasle
    Thanks @Skamasle. We can't use robots.txt because it's happening now, and robots.txt is cached by the bot for up to a day. I've tried the nginx config with `googlebot`, and I can't use the config with `google` because we need the googleadsbot to have access. – Aidan Ewen Mar 23 '17 at 20:24