
I have a load balancer using Apache: http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html

The problem is our bandwidth. We're trying to get more, but the ISP has to run new lines and keeps putting us off, so I'd like to throttle the spiders to conserve bandwidth until we can get more. I tried mod_cband, but it won't work on load-balanced virtual hosts.

Are there any Apache modules that can throttle traffic on load-balanced virtual hosts?

dan

1 Answer


Not an Apache module, but you can try the "Crawl-delay" directive in robots.txt to slow down well-behaved spiders.

http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive

cagenut
  • We've done that; the problem is the ones that don't listen. – dan Oct 22 '09 at 01:30
  • If they don't obey robots.txt, then they don't get to index your site until your ISP gets you more bandwidth. Use either a firewall of your choice or Apache to deny access to those IPs. The list of IPs may be rather long, but you should be able to put enough blocks in to hold you over until you can re-allow them. – Kevin M Nov 19 '09 at 17:23
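
The Apache-side blocking suggested in the last comment could look roughly like the sketch below. It assumes Apache 2.2 with mod_authz_host and mod_setenvif loaded, and that the directives go inside the load-balanced virtual host. The addresses are reserved documentation ranges and "BadBot" is a made-up crawler name; none of them come from the thread, so replace them with whatever shows up in your own access logs.

```apache
# Sketch for Apache 2.2 (mod_authz_host + mod_setenvif), placed inside the
# load-balanced <VirtualHost>. All addresses and the bot name are placeholders.

# Flag requests whose User-Agent contains a hypothetical crawler name.
SetEnvIfNoCase User-Agent "BadBot" block_spider

<Location />
    # Allow is evaluated first, then Deny; a request matching both is denied.
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.0/24
    Deny from 198.51.100.17
    Deny from env=block_spider
</Location>
```

Denied requests get a 403 before they are handed to a balancer member, so they cost little bandwidth. This is 2.2 syntax; Apache 2.4 uses Require directives for access control instead, so the block would need adapting there.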