Questions tagged [googlebot]
30 questions
8
votes
3 answers
AH01797: client denied by server configuration: /usr/share/doc
For quite a while (over a month now) I have been seeing lines like the following in the Apache logs:
180.76.15.138 - - [24/Jun/2015:16:13:34 -0400] "GET /manual/de/mod/module-dict.html HTTP/1.1" 403 396 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0;…
matpen
- 387
- 2
- 4
- 10
3
votes
3 answers
fail2ban ignoreip DNS host example?
I would like to add ".googlebot.com" to the ignoreip list for fail2ban, since the ignoreip explanation mentions a DNS host as an accepted input. Is this a proper format?
# "ignoreip" can be an IP address, a CIDR mask or a DNS host. Fail2ban will not
#…
giorgio79
- 1,747
- 9
- 25
- 36
3
votes
1 answer
Why is googlebot requesting robots.txt from my SSH server?
I run ossec on my server and periodically I receive a warning like this:
Received From: myserver->/var/log/auth.log
Rule: 5701 fired (level 8) -> "Possible attack on the ssh server (or version gathering)."
Portion of the log(s):
Nov 19 14:26:33…
Brian
- 766
- 1
- 6
- 14
2
votes
1 answer
What's with random-character queries coming from googlebot, e.g., vvytnoxvontwusz.html?
One of my sites has been getting queries from googlebot, on the order of:
example-log:66.249.79.216 - - [06/Apr/2016:15:36:56 -0700] "GET /vvytnoxvontwusz.html HTTP/1.1" 404 15136 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;…
Jim Miller
- 713
- 2
- 11
- 23
2
votes
1 answer
Google-Bot fell in love with my 404-page
Every day my access log looks something like this:
66.249.78.140 - - [21/Oct/2013:14:37:00 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.78.140 - - [21/Oct/2013:14:37:01…
32bitfloat
- 253
- 2
- 3
- 9
2
votes
1 answer
Nginx Googlebot rewrite rules failing with 404
Our site is based on Angular, which makes it almost completely JavaScript-based, so we need to serve static HTML snapshots to Googlebot in order for it to crawl us. At the moment, we have this implementation in place:
location / {
#…
Tyler Alex
- 21
- 1
2
votes
1 answer
Apache : with Googlebot connections, a single process takes all server memory
Following up on https://serverfault.com/questions/418735/unbelievable-issue-a-single-apache-process-takes-4-gb-of-memory, I am posting this as a new question because I was able to establish that it happens when the connecting client is Googlebot.
By "it", I…
db_ch
- 638
- 5
- 14
- 20
2
votes
1 answer
How can a nameserver block Google bot?
Background: Our domain page.et is not accessible to Google's mobile-friendly checking tool or Search Console. The same seems to be true for all other .et domains I tested.
The reason is not the robots.txt. Google bot does not even try to make a…
Alex
- 476
- 13
- 35
1
vote
1 answer
Block googlebot on a specific page using nginx
We're currently being crawled at a greater rate than we can handle.
I can't seem to get nginx to block Googlebot:
server {
location /ajax/sse.php {
if ($http_user_agent ~* "Mozilla/5.0 (compatible; Googlebot/2.1;…
Aidan Ewen
- 271
- 1
- 4
- 11
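For the nginx question above, one possible cause (an assumption, since the excerpt is truncated) is that the full User-Agent string is being used as a regular expression. nginx's `~*` operator performs case-insensitive regex matching, so unescaped parentheses form a group instead of matching literal characters. The sketch below reproduces that effect with Python's `re` module as a stand-in for nginx's PCRE engine; the complete User-Agent string is taken from the access-log excerpt earlier in this listing, and the pattern names are hypothetical.

```python
import re

# Full Googlebot User-Agent string as it appears in the access-log excerpt above.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Assumption: the literal User-Agent is pasted in as the regex, unescaped.
# The parentheses become a capture group, so the literal "(" is never matched.
unescaped_pattern = re.compile(USER_AGENT, re.IGNORECASE)

# Escaping the metacharacters turns the string back into a literal match.
escaped_pattern = re.compile(re.escape(USER_AGENT), re.IGNORECASE)

# A short token is usually enough and avoids the escaping problem entirely.
token_pattern = re.compile(r"googlebot", re.IGNORECASE)

for name, pattern in [
    ("unescaped", unescaped_pattern),
    ("escaped", escaped_pattern),
    ("token", token_pattern),
]:
    print(name, bool(pattern.search(USER_AGENT)))
```

If this is indeed the cause, matching on a short token such as `googlebot` is typically the simpler option, at the cost of also matching any client that merely claims to be Googlebot.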
1
vote
0 answers
Enabling TLS/SSL with SNI on a subset of websites, without losing SEO ranking on the non-TLS sites
We run a number of LAMP servers on AWS with a few dozen websites on them, that customers pay us to design, build and host. They're Ubuntu 14.04 servers with Varnish, Apache and PHP.
Currently, if a customer wanted to have SSL/TLS for their website,…
Martijn Heemels
- 7,438
- 6
- 39
- 62
1
vote
1 answer
How to prevent Google Favicon bot to call to my site?
I have a backend URL that I use only for myself, in Google Chrome. It's not open to the public. However, for some reason a bot called "Google Favicon", with an IP located at Google, calls this URL, which I do not want. My guess is Google gets this URL from my Google…
Paiboon Panusbordee
- 167
- 1
- 9
1
vote
1 answer
Allow Google To Bypass Firewall Nginx
I am looking for a setup that returns a 401 for every visitor that doesn't have a certain cookie. If the visitor/requester is Google, it should not return the 401.
So here is the following code that I…
Eddie Chrisman
- 11
- 2
1
vote
2 answers
block fake google bots
How can I block DDoS attacks from fake Google bots?
I found two solutions on the net, but both also seem to block genuine Google bots.
# Block fake google when it's not coming from their IP ranges (a fake googlebot) [F] => Forbidden
RewriteCond…
Matthias Jaekle
- 111
- 2
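For the fake-Googlebot question above, a common alternative to hard-coded IP ranges is a forward-confirmed reverse DNS check: look up the PTR record for the client IP, check that the hostname ends in a Google crawler domain, then resolve that hostname and confirm it maps back to the same IP. The following is a minimal sketch, not the approach from the question; the function names are hypothetical, the suffix list is an assumption based on Google's published crawler-verification guidance, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Assumed suffixes for Google crawler hostnames; adjust to current guidance.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def has_google_crawler_suffix(hostname):
    """Return True if the hostname ends with one of the expected suffixes."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES)


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client IP (IPv4 only).

    The lookup functions are injectable so the logic can be exercised
    without network access.
    """
    try:
        hostname = reverse_lookup(ip)[0]       # PTR lookup
    except OSError:
        return False                           # no PTR record or resolver error
    if not has_google_crawler_suffix(hostname):
        return False
    try:
        addresses = forward_lookup(hostname)[2]  # A-record lookup
    except OSError:
        return False
    # The forward lookup must map back to the original IP.
    return ip in addresses
```

Because DNS lookups are slow, a check like this is usually run offline against logged IPs or cached, rather than on every request.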
1
vote
2 answers
Googlebot requesting pages of 1 site on another site
Problem: Using Prerender.io to index/store pages of one site, I keep getting requests for paths that only exist on my old site.
Example: on Prerender I'll see that Googlebot requested http://www.new-site.com/old/site/path
I have an old website…
Maruf
- 159
- 9
1
vote
1 answer
Trouble filtering googlebot from apache access log
Though it seems like it should be pretty straightforward, I have been unable to configure apache so that googlebot's requests are not stored in the access log. I've tried the following lines:
SetEnvIfNoCase User-Agent googlebot…
Jonathan Basile
- 123
- 5
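For the log-filtering question above, a different technique from the Apache configuration being attempted is to post-process an existing access log and drop Googlebot lines afterwards. The sketch below assumes Apache's "combined" log format, where the User-Agent is the last double-quoted field; the function names are hypothetical, the first sample line comes from an excerpt earlier in this listing, and the second sample line is made up for illustration.

```python
import re

# Matches the last double-quoted field on a line, which is the User-Agent
# in Apache's "combined" log format (an assumption about the log layout).
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_googlebot_line(line):
    """Return True if the line's User-Agent field mentions Googlebot."""
    match = LAST_QUOTED_FIELD.search(line)
    return bool(match) and "googlebot" in match.group(1).lower()


def drop_googlebot(lines):
    """Return the log lines whose User-Agent does not mention Googlebot."""
    return [line for line in lines if not is_googlebot_line(line)]


SAMPLE_LINES = [
    # Taken from the access-log excerpt earlier in this listing.
    '66.249.78.140 - - [21/Oct/2013:14:37:00 +0200] "GET /robots.txt HTTP/1.1" '
    '200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    # Made-up line for illustration only.
    '203.0.113.7 - - [21/Oct/2013:14:38:10 +0200] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for kept in drop_googlebot(SAMPLE_LINES):
    print(kept)
```

This matches on the User-Agent string only, so it also drops requests from clients that merely claim to be Googlebot.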