We are getting thousands of 404 errors from bots requesting robots.txt at various paths on our site, caused by domain redirects.

Our old website was a labyrinthine multisite powered by DotNetNuke with multiple domain names. We have since moved to a single WordPress site with one domain name. The remaining domain names now just redirect to categories on the new site. As a result, Googlebot, Bingbot and many others repeatedly try to crawl the domains that used to be full-fledged sites and get redirected.

www.EXAMPLE.co.uk redirects to www.EXAMPLE.co.uk/challenge/

and so /challenge/robots.txt has over a thousand 404s

The same happens with the other redirects, which end up at /walktoschool/robots.txt and so on.

Is there a smart way to redirect bots, a different way this should have been handled, or a way to get the bots to stop? Our new website doesn't even use robots.txt; it uses .htaccess in conjunction with Better WP Security. I have put in requests with Google and Bing to re-crawl the new website, but this has been the result.

I am an amateur webmaster at a non-profit organization and I've really had to hit the ground running. Any help would be gratefully received!

  • What sort of redirect are you using? Specifically, are you using HTTP response code 301, 302, 303 or 307? (See [RFC 2616](https://www.ietf.org/rfc/rfc2616.txt) section 10.3 and subsections.) For the use case you are describing, only HTTP/1.1 `301 Moved Permanently` should be used; 302 (which is very often used for redirects), 303 and 307 are temporary in nature, and clients are expected to keep using the original URI for future requests; 300, 304, 305 and 306 are not applicable. – user Feb 12 '14 at 12:06
  • It's a 302 redirect. Since he included the domain, I just checked it myself. A different problem I spotted is that `http://www.kmcharitychallenge.co.uk/robots.txt` gives a redirect to `http://www.kmcharityteam.co.uk/challenge//robots.txt`. A 301 redirect will cause the same problem. Add in an extra rule to redirect the `robots.txt` URLs from all domains to the root of the new domain. – Ladadadada Feb 12 '14 at 12:24
  • I have now created a robots.txt to the best of my ability and added this to .htaccess: `RewriteCond %{REQUEST_URI} !^/robots\.txt [NC]` `RewriteCond %{REQUEST_URI} robots\.txt [NC]` `RewriteRule (.*) http://www.kmcharityteam.co.uk/robots.txt [R=301,L]`. It seems to have done the trick, so fingers crossed this is all solved. Thanks everyone! – Beatchef Feb 12 '14 at 14:00

2 Answers


When doing the sort of redirects you are doing, there is only one HTTP response code which is applicable, namely 301 Moved Permanently. RFC 2616, the standard that defines the HTTP protocol, defines the 301 response code thusly (my emphasis):

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.

The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Contrast this with an HTTP 302 Found redirect, which is very often used when simply configuring a "redirection" and which is defined as (again, my emphasis):

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.

The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

If the 302 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Hence, the proper way to do HTTP redirection in your scenario is to configure the web server to return a 301 response indicating the new location, rather than a 302 response. Capable clients will then store the new URL and use that for any future requests.
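As a rough illustration of the 301 approach, here is a minimal `.htaccess` sketch. It assumes the old domain is served by an Apache server you control with mod_rewrite enabled, rather than by a registrar's forwarding service. The host names `old-domain.example` and `new-domain.example` are placeholders, not the real domains.

```apache
RewriteEngine On

# Placeholder host names: substitute the real old and new domains.
# Match requests arriving for the old domain, with or without "www.".
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.example$ [NC]

# Send every such request to the category on the new site with a
# permanent redirect. A bare [R] flag would default to 302, so the
# status code is spelled out as R=301.
RewriteRule ^ http://www.new-domain.example/challenge/ [R=301,L]
```

If the redirect is instead configured through a hosting control panel, the status code it sends is up to the provider and would need to be checked separately.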

  • Thank you and Ladadadada for the answers :) Regarding the domain redirects, we use Heart Internet (like a UK GoDaddy) for managing the domains. Under "Manage Domain > Web Forwarding" there are only two options, Automatic Redirect and Framed Redirect, and all you can do is set the URL and nothing else. So I assume that they are setting the 302? – Beatchef Feb 12 '14 at 13:33
  • @Beatchef I have no idea. You'll have to ask them. – user Feb 12 '14 at 14:03

I think you'd be better off not redirecting requests for /robots.txt while still redirecting everything else. If the old site used to have a /robots.txt file, you should probably just keep it. Otherwise an empty file would do. But you could also decide it is time for a bit of cleanup and put /robots.txt files on the old domains that disallow crawling of pages which were deleted during or after the consolidation.
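One possible way to leave `/robots.txt` alone while redirecting everything else is sketched below for Apache mod_rewrite in an `.htaccess` file. This assumes the old domain is hosted on a server you control and that a `robots.txt` file (possibly empty) exists in its document root. The host names are placeholders.

```apache
RewriteEngine On

# Let requests for /robots.txt through untouched so the file in the
# document root is served directly. In .htaccess context the pattern
# is matched without the leading slash.
RewriteRule ^robots\.txt$ - [L]

# Everything else arriving for the old domain (placeholder host name)
# is permanently redirected to the category on the new site.
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.example$ [NC]
RewriteRule ^ http://www.new-domain.example/challenge/ [R=301,L]
```

The pass-through rule has to come before the redirect rule, since rules are evaluated in order and `[L]` stops processing for the matching request.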
