Questions tagged [googlebot]

30 questions
8
votes
3 answers

AH01797: client denied by server configuration: /usr/share/doc

For quite a while (over a month now) I have been seeing lines like the following in the Apache logs: 180.76.15.138 - - [24/Jun/2015:16:13:34 -0400] "GET /manual/de/mod/module-dict.html HTTP/1.1" 403 396 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0;…
matpen
3
votes
3 answers

fail2ban ignoreip DNS host example?

I would like to add ".googlebot.com" to the ignoreip list for fail2ban, since the ignoreip explanation mentions a DNS host as an accepted input. Is this a proper format? # "ignoreip" can be an IP address, a CIDR mask or a DNS host. Fail2ban will not #…
giorgio79
3
votes
1 answer

Why is googlebot requesting robots.txt from my SSH server?

I run ossec on my server and periodically I receive a warning like this: Received From: myserver->/var/log/auth.log Rule: 5701 fired (level 8) -> "Possible attack on the ssh server (or version gathering)." Portion of the log(s): Nov 19 14:26:33…
Brian
2
votes
1 answer

What's with random-character queries coming from googlebot, e.g., vvytnoxvontwusz.html?

One of my sites has been getting queries from googlebot, on the order of: example-log:66.249.79.216 - - [06/Apr/2016:15:36:56 -0700] "GET /vvytnoxvontwusz.html HTTP/1.1" 404 15136 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;…
Jim Miller
2
votes
1 answer

Google-Bot fell in love with my 404-page

Every day my access log looks something like this: 66.249.78.140 - - [21/Oct/2013:14:37:00 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.78.140 - - [21/Oct/2013:14:37:01…
32bitfloat
2
votes
1 answer

Nginx Googlebot rewrite rules failing with 404

Our site is based on Angular, which makes it almost completely JavaScript-based, so we need to serve static HTML snapshots to Googlebot in order for it to crawl us. At the moment, we have this implementation in place: location / { #…
Tyler Alex
2
votes
1 answer

Apache : with Googlebot connections, a single process takes all server memory

This is a follow-up to https://serverfault.com/questions/418735/unbelievable-issue-a-single-apache-process-takes-4-gb-of-memory which I am posting as a new question because I was able to establish that it happens when the connecting client is Googlebot. By "it", I…
db_ch
2
votes
1 answer

How can a nameserver block Google bot?

Background: Our domain page.et is not accessible by Google's mobile-friendly checking tool and search console. The same seems to be true for all other .et domains I tested. The reason is not the robots.txt. Google bot does not even try to make a…
Alex
1
vote
1 answer

Block googlebot on a specific page using nginx

We're currently being crawled at a greater rate than we can handle. I can't seem to get nginx to block Googlebot: server { location /ajax/sse.php { if ($http_user_agent ~* "Mozilla/5.0 (compatible; Googlebot/2.1;…
Aidan Ewen
1
vote
0 answers

Enabling TLS/SSL with SNI on a subset of websites, without losing SEO ranking on the non-TLS sites

We run a number of LAMP servers on AWS with a few dozen websites on them that customers pay us to design, build and host. They're Ubuntu 14.04 servers with Varnish, Apache and PHP. Currently, if a customer wanted to have SSL/TLS for their website,…
Martijn Heemels
1
vote
1 answer

How to prevent Google Favicon bot to call to my site?

I have a backend URL that I use only for myself, in Google Chrome. It is not open to the public. However, for some reason a bot called "Google Favicon", with an IP address located at Google, calls this URL, which I do not want. My guess is that Google gets this URL from my Google…
1
vote
1 answer

Allow Google To Bypass Firewall Nginx

So I am looking for a system which essentially returns a 401 for every visitor that doesn't have a certain cookie. I would like to make it so if the visitor/requester is google then it does not return the 401. So here is the following code that I…
1
vote
2 answers

block fake google bots

How could I block DDoS attacks from fake Google bots? I found two solutions on the net, but both seem to also block legitimate Google bots. # Block fake google when it's not coming from their IP range's (A fake googlebot) [F] => Failure RewriteCond…
1
vote
2 answers

Googlebot requesting pages of 1 site on another site

Problem: Using Prerender.io to index/store pages of one site, I keep getting requests for paths that only exist on my old site. Example: on Prerender I'll see that Googlebot requested http://www.new-site.com/old/site/path I have an old website…
Maruf
1
vote
1 answer

Trouble filtering googlebot from apache access log

Though it seems like it should be pretty straightforward, I have been unable to configure Apache so that Googlebot's requests are not stored in the access log. I've tried the following lines: SetEnvIfNoCase User-Agent googlebot…