I have a client with a hosting arrangement of 400 customer sites, all served through suPHP in CGI mode on Apache. The sysop is gone, and the client has called on me to roll out a new PHP thing. The trouble is that server load is very high right now, and we have traced it to crawlers. One customer in particular complained of a slow website; we enabled a 304 header plugin on his site against most crawlers, and the site perked right up.

We'd like to lower that load by issuing a global 304 header to all the crawlers, letting human visitors through. I have a long list of user agent keywords to trap for.

What's the best way to temporarily engage that global 304 header, while allowing human visitors to get right on through?

I mean, I could roll out 400 .htaccess file changes, but it would be ideal to make this change in one central Apache config and have it automatically affect all the sites at once.

Latest:

I think I see in some docs that I can scoop up some useragents like so:

RewriteCond %{HTTP_USER_AGENT} ^(google|spider|crawl|bot|yahoo) [NC]

But then how do I mate those user agents with a 304 header? I mean, is this the syntax?

RewriteCond %{HTTP_USER_AGENT} ^(google|spider|crawl|bot|yahoo) [NC]
Header set 304 "HTTP/1.0 304 Not Modified"
ServerChecker

1 Answer

It is hard to comment specifically without more details, but I assume the plugin uses RewriteRules in the .htaccess file to accomplish this behavior. Once the mod_rewrite module is enabled, directives like RewriteCond and RewriteRule can also be used directly in the main Apache configuration (server or virtual host context), not only in .htaccess.

If you are doing mass virtual hosting, one good way to do this is to write a small shared snippet that contains your rules and use Apache's Include directive to pull that snippet into each vhost for which crawlers are posing a problem. Alternatively, there are tools that can slow or block individual IPs that are eating up too many resources. Examples include the Apache modules mod_throttle and mod_evasive, and the external utility fail2ban.
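A minimal sketch of the Include approach, assuming Apache with mod_rewrite loaded. The file path `/etc/apache2/crawler-block.conf` and the host name `customer1.example.com` are hypothetical. The pattern is the one from the question with the leading `^` dropped, since most crawler user agents begin with `Mozilla/5.0 (compatible; ...` and an anchored pattern would rarely match. Note that a RewriteCond only applies to the RewriteRule that follows it, and `Header set` adds a response header rather than changing the status line, so the two cannot be combined the way the question tries. This sketch refuses matching requests with 403 Forbidden via the `[F]` flag, which is a substitution for the 304 asked about, not an equivalent.

```apache
# --- /etc/apache2/crawler-block.conf (hypothetical path) ---
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Match anywhere in the User-Agent string, case-insensitively.
    RewriteCond %{HTTP_USER_AGENT} (google|spider|crawl|bot|yahoo) [NC]

    # Leave the URL untouched ("-") and answer 403 Forbidden.
    # Assumption: 403 is an acceptable stand-in for the 304 in the question.
    RewriteRule ^ - [F]
</IfModule>

# --- inside each affected virtual host ---
<VirtualHost *:80>
    ServerName customer1.example.com
    # ... existing directives for this site ...
    Include /etc/apache2/crawler-block.conf
</VirtualHost>
```

Rewrite rules defined at server level are not inherited by virtual hosts by default, which is why the snippet is included per vhost here; removing the Include line later turns the block off again for that site.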

agperson
  • What if I tried this? RewriteCond %{HTTP_USER_AGENT} ^(google|spider|crawl|bot|yahoo) [NC] Header set 304 "HTTP/1.0 304 Not Modified" – ServerChecker Apr 25 '10 at 05:28