8

If I want my main website to be indexed by search engines, but none of the subdomains to be, should I just put a "disallow all" robots.txt in each subdomain's directory? If I do, will my main domain still be crawlable?

Michael Hampton
  • 237,123
  • 42
  • 477
  • 940
tkbx
  • 201
  • 1
  • 2
  • 6

3 Answers

9

The robots.txt file needs to go in the top-level directory of your webserver. If your main domain and each subdomain are on different vhosts, then you can put one in the top-level directory of each subdomain containing something like

User-agent: *
Disallow: /

Where the robots.txt is located depends upon how you access a particular site. Given a URL like

http://example.com/somewhere/index.html

a crawler will discard everything to the right of the domain name and append /robots.txt:

http://example.com/robots.txt
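The derivation above can be sketched with Python's standard library (a minimal illustration of the rule, not how any particular crawler is implemented):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Derive the robots.txt URL a crawler would fetch for a page.

    Everything to the right of the host (path, query, fragment) is
    discarded and /robots.txt is appended to the scheme and host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/somewhere/index.html"))
# http://example.com/robots.txt
```

Note that the host includes any subdomain, which is why each subdomain gets its own robots.txt.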

So you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for example.com. To disallow access to /somewhere, you need

User-agent: *
Disallow: /somewhere

If you have subdomains and you access them as

http://subdomain.example.com

and you want to disallow access to the whole subdomain, then you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for that subdomain, and so on.
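As a quick sanity check, Python's stdlib `urllib.robotparser` can evaluate such a "disallow all" file against subdomain URLs without fetching anything (the hostname is just the example from this answer):

```python
from urllib.robotparser import RobotFileParser

# Parse the "disallow all" rules the subdomain's robots.txt would contain
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every URL on the subdomain is blocked for all crawlers
print(rp.can_fetch("*", "http://subdomain.example.com/index.html"))  # False
print(rp.can_fetch("*", "http://subdomain.example.com/"))            # False
```

The main domain is unaffected because crawlers only apply this file to the host they fetched it from.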

user9517
  • 114,104
  • 20
  • 206
  • 289
  • Would this work? `User-agent: *; Allow: *; Disallow: /subdomains`? – tkbx Aug 31 '12 at 19:35
  • 1
    If you access your subdomains as http://example.com/subdomains/subdomain1 etc then you shouldn't need the allow as everything not excluded is allowed by default. – user9517 Aug 31 '12 at 19:45
  • OK, so within the server, I have my root files and /Subdomains with their own index.html's. I'm not sure how common this is, but on the service I use (1&1), an actual subdomain (sub.domain.com) can be linked to a folder. I can have sub.domain.com link to /Subdomains/SomeSite (and /Subdomains/SomeSite/index.html from there). Will disallowing /Subdomains work in this case? – tkbx Aug 31 '12 at 19:51
  • It's all about how you access your main domain and its subdomains. Take a look at http://www.robotstxt.org/. – user9517 Aug 31 '12 at 19:54
2

You have to put it in your root directory, otherwise it won't be found.

David
  • 461
  • 1
  • 7
  • 22
2
  1. You need to put robots.txt in your root directory

  2. The Disallow rules are not domain/sub-domain specific and will apply to all URLs

For example: let's assume you are using sub.mydomain.com and mydomain.com (both linked to the same ftp folder). In this setup, if you set a Disallow: /admin/ rule, then both sub.mydomain.com/admin/ and mydomain.com/admin/ will be disallowed.

But if sub.mydomain.com is actually linked to another site (and to another ftp folder), then you'll need to create another robots.txt and put it in the root of that folder.
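Since the rules match on path alone, the same file blocks /admin/ under either hostname; a short check with Python's stdlib parser (the hostnames are the hypothetical ones from this answer):

```python
from urllib.robotparser import RobotFileParser

# The shared robots.txt both hostnames would serve
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /admin/"])

# The rule matches on path only, so it applies under either hostname
for host in ("mydomain.com", "sub.mydomain.com"):
    print(host, rp.can_fetch("*", "http://" + host + "/admin/users"))
    # both print False

# Paths outside /admin/ remain allowed
print(rp.can_fetch("*", "http://mydomain.com/blog/"))  # True
```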

Igal Zeifman
  • 121
  • 2