
Google's indexing bot (edit: yes, it's Google, IP resolves) seems to be adding arbitrary query strings to our home page.

xx.xxx.xx.xxx - - [30/Jun/2009:10:14:37 -0400] "GET /?key=61680 HTTP/1.1" 200 3334 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
xx.xxx.xx.xxx - - [30/Jun/2009:10:16:58 -0400] "GET /?term=byron HTTP/1.1" 200 3184 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

Any idea what these are meant for?
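
To see how often this happens across a whole log, the entries above can be filtered programmatically. The following is only a rough sketch: it assumes the log is in the Apache "combined" format shown above, and the function name `googlebot_query_hits` is made up for illustration. It matches on the user-agent string alone, so it does not prove that a request really came from Google.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Pattern for the Apache "combined" log format (assumed from the sample lines).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def googlebot_query_hits(lines):
    """Return (target, params) for each Googlebot request that carries a query string."""
    hits = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        # Skip lines that do not parse or whose user agent is not Googlebot.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        target = match.group("target")
        query = urlsplit(target).query
        if query:
            hits.append((target, parse_qsl(query, keep_blank_values=True)))
    return hits
```

Feeding it the lines of an access log (for example `googlebot_query_hits(open(path))`, where `path` is wherever your log lives) gives a list of the requested targets and their parameters, which makes it easier to spot whether the keys repeat or look random.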

ceejayoz
  • If you "whois" the IP, you can check whether the requests come from Google or not... – rkthkr Jun 30 '09 at 14:47
  • I've already confirmed that the IP resolves to Google. – ceejayoz Jun 30 '09 at 15:38
  • Use "whois". The owner of the IP space can make it resolve to google.com. – hayalci Jul 01 '09 at 00:10
  • OK, I'll try wording this a different way: I have confirmed that Google owns the IP in question. Move on, please. – ceejayoz Jul 02 '09 at 13:45
  • Why is the IP address of "GoogleBot" a secret? I think if you posted it here we'd be able to give you a much more accurate answer. – KPWINC Jul 03 '09 at 05:23
  • The IPs are irrelevant to the question, as I've confirmed that they are indeed Google's IPs. I'd have posted them uncensored if I'd thought about it at the time (I generally anonymise IPs before posting online), but I'm not about to go back into the logs to find the specific IPs for no good reason. Again, **please move on**. The IPs are Google's. – ceejayoz Jul 05 '09 at 00:22
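
For anyone else landing here with the same doubt raised in the comments (a reverse lookup on its own can be set to anything by whoever controls the IP space), a common way to check is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and make sure it returns the same IP. The sketch below is a minimal illustration, not an official tool. The function name, the injectable lookup parameters, and the suffix list (`.googlebot.com`, `.google.com`) are assumptions made for the example, and the default forward lookup handles IPv4 only.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; adjust if Google documents others.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    # The lookups are injectable so the logic can be exercised without network access.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # hostname is not under an expected Google domain
    try:
        # The PTR record alone can be faked; the forward lookup must agree.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot(ip)` with an address from the access log performs the real lookups. A PTR record that merely claims a Google hostname fails the second step, because the forward lookup for that hostname is controlled by Google, not by the owner of the IP space.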

3 Answers


It looks like Googlebot may be lightly probing your site for possible content-duplication issues, or checking whether your site correctly handles non-existent files (by returning a 404 status code) and bogus query strings.

It may also be testing whether you are some kind of link farm, which it might suspect if bogus query strings return differing results.

It's also possible that someone out there has linked to your site using those query string parameters, and Googlebot is just coming back to you to see what it's all about. If that's the case, try to find out who is linking to you that way and see if you can get them to correct their links.
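
If duplicate content is the worry, one way to guard against it is to compute a canonical URL that drops any query parameter your site does not actually use, and then redirect to it or advertise it with a rel="canonical" link. This is a rough sketch only: `KNOWN_PARAMS` and `canonical_url` are hypothetical names, and the allowed set would have to match whatever your own application really reads.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical whitelist: the query parameters the site actually understands.
KNOWN_PARAMS = frozenset({"page"})

def canonical_url(url, known_params=KNOWN_PARAMS):
    """Return the URL with unknown query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in known_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this whitelist, `http://example.com/?key=61680` and `http://example.com/` both map to the same canonical URL, so a bogus query string no longer looks like a separate page.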

random

Are they found alongside other Googlebot entries? If not, it could be that Googlebot is following links from another website to yours in order to verify the connection for its algorithms. That would mean another website links to your site with those URLs. I don't know whether spam or link domains can do anything with those URLs.

As I don't necessarily understand everything Googlebot does, I could be wrong, of course.

Joshua Nurczyk
  • I can't find any external links to these pages via Google's link: syntax. Seems to be part of Google's normal crawl of the site. – ceejayoz Jun 30 '09 at 20:04

For the past few days Googlebot has been doing the same thing to one of our sites. It appears to be inserting a query string value under a key we actually use, but the key expects an integer and Googlebot is supplying a string (e.g. the parameter should be something like gb=22, but Googlebot is requesting gb=lkcvvzxxz).

What's worse, Googlebot is indexing these bad URLs into Google.
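
One possible mitigation, offered as a sketch and not as something we have confirmed stops the indexing, is to validate the parameter strictly and answer with a 404 when the value is not an integer, so the bad URLs do not come back as normal 200 pages. The function name `status_for_query` is made up for this example, and `gb` is just the example key from above.

```python
import re
from urllib.parse import parse_qs

def status_for_query(query_string, param="gb"):
    """Pick an HTTP status: 404 if `param` is present but not a plain integer."""
    values = parse_qs(query_string, keep_blank_values=True).get(param)
    if values is None:
        return 200  # parameter absent: serve the page as usual
    # Require exactly one value made up only of ASCII digits.
    if len(values) == 1 and re.fullmatch(r"[0-9]+", values[0]):
        return 200
    return 404
```

Here `gb=22` yields 200, while `gb=lkcvvzxxz` yields 404. How the status is then sent depends on the framework or server in use.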

I would love to see this question answered. I know this should have been a comment, but I don't have the points to do that on serverfault yet...

shawnr