7

I have several sites in a /24 network that all get crawled by Google on a pretty regular basis. Normally this is fine. However, when Google starts crawling all the sites at the same time, the small set of servers that back this IP block can take a pretty big hit on load.

With Google Webmaster Tools you can rate-limit Googlebot on a given domain, but I haven't found a way to limit the bot across an IP network yet. Anyone have experience with this? How did you fix it?

masegaloeh (17,978)
Zak (1,032)

4 Answers

3

I found these notes interesting to pursue:

  1. "Get yourself a smart robots.txt" and the other robots.txt posts there (a basic example is sketched after this list)
  2. A post on "Google's Dirty Little Secret" by someone troubled by Google's bots
  3. Google web crawlers
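
For bots that honour it, the simplest robots.txt knob is the Crawl-delay directive. Note that Googlebot ignores Crawl-delay (its rate is only adjustable in Webmaster Tools), so this mainly helps with the other crawlers adding to the load. A minimal sketch, where the 10-second delay and the disallowed path are purely illustrative:

    # robots.txt, served from the root of each site
    User-agent: *
    Crawl-delay: 10     # seconds between requests; not honoured by Googlebot
    Disallow: /search   # example: keep crawlers out of expensive URLs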
nik (7,040)
2

You can create an account with Google Webmaster Tools and then control the crawl rate for each site: go to Site Configuration::Settings::Crawl Rate. I don't believe this will let you schedule your sites in a particular order, but you can at least slow the bot down for all of them.

Kyle Brandt (82,107)
1

If you run BGP, you could simply rate-limit AS15169 (AS-GOOGLE), but doing it by hand is likely to be far too error-prone.
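
If you don't run BGP, a crude approximation of the same idea is to pull the prefixes that AS15169 originates from a routing registry and rate-limit them at your firewall. This is only a sketch, assuming a Linux gateway with iptables; the registry query is a standard way to ask RADB for an AS's route objects, but the chain name, port, and rates are made-up example values:

    # Collect the prefixes AS15169 announces, via the RADB routing registry
    whois -h whois.radb.net -- '-i origin AS15169' | awk '/^route:/ {print $2}' > google-prefixes.txt

    # Rate-limit new HTTP connections from those prefixes (numbers are illustrative)
    iptables -N GOOGLE-LIMIT
    while read prefix; do
        iptables -A INPUT -s "$prefix" -p tcp --dport 80 -m state --state NEW -j GOOGLE-LIMIT
    done < google-prefixes.txt
    iptables -A GOOGLE-LIMIT -m limit --limit 5/second --limit-burst 10 -j ACCEPT
    iptables -A GOOGLE-LIMIT -j DROP

Bear in mind that dropping Googlebot's connections can hurt how Google indexes the sites, so the crawl-rate setting in Webmaster Tools is the safer knob; treat firewall limits as a last resort.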

LapTop006 (6,466)
-3

No, not doable. You have to put that into a robots.txt on every site. Google, rightly, does not have tools for "IP address owners", so to speak. All control comes from the robots.txt on the websites.
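
If the sites live on the same servers, one way to follow this without hand-editing dozens of files is to keep a single master robots.txt and copy it into every document root. A minimal sketch, assuming the document roots sit under /var/www (the paths are hypothetical):

    # Push one canonical robots.txt into every site's document root
    for docroot in /var/www/*/htdocs; do
        cp /etc/robots.txt.master "$docroot/robots.txt"
    done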

TomTom (50,857)