1

I know I can create ONE robots.txt file for all domains on an Apache server*, but I want to append to each domain's robots.txt (if it already exists). I want some general rules in place for all domains, but I need to allow different domains to have their own unique rules.

Is there a way to accomplish this?

(*In my case Apache 2.2.x)

Gaia

2 Answers

5

From Apache's point of view, robots.txt is just an asset to be served. You can alter the content returned when robots.txt is requested by passing it through an output filter.

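If you want to append some text, you could define an external filter with mod_ext_filter. Give the command by its absolute path (`/bin/cat` rather than `cat`); with a bare `cat`, Apache may fail to start the child process. Assuming that Apache is running on a Unix-like operating system, the filter configuration could be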

ExtFilterDefine appendRobotstxt cmd="/bin/cat - /var/www/html/robots-tail.txt"
<Location /robots.txt>
    SetOutputFilter appendRobotstxt
</Location>

That would concatenate robots-tail.txt to the end of the response.
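One caveat with plain `cat`: if a domain's robots.txt does not end with a newline, the first line of the tail gets glued onto its last rule. The following is a minimal sketch of a replacement filter in Python, assuming a Python 3 interpreter is available to Apache. The names `append_tail`, `main` and `TAIL_PATH` are hypothetical and not part of Apache.

```python
import sys

# Assumption: the same tail file as in the configuration above.
TAIL_PATH = "/var/www/html/robots-tail.txt"


def append_tail(body, tail):
    """Return body followed by tail, separated by a blank line."""
    if body and not body.endswith("\n"):
        body += "\n"  # avoid gluing the tail onto the last rule
    if body:
        body += "\n"  # blank line so the tail starts a new record
    return body + tail


def main():
    """Read the original robots.txt from stdin, write the result to stdout."""
    with open(TAIL_PATH, encoding="utf-8") as fh:
        tail = fh.read()
    sys.stdout.write(append_tail(sys.stdin.read(), tail))
```

To try it as the filter, the script would end with a call to `main()`, and the `cmd=` of the `ExtFilterDefine` line would name the interpreter and the script by absolute path instead of `/bin/cat`.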

200_success
  • Apache is on CentOS 6.3. I will give it a try. Thanks. – Gaia Nov 03 '12 at 14:39
  • Do you know how should I define intype so that it only filters robots.txt ? – Gaia Nov 03 '12 at 15:54
  • @Gaia The filter is activated not by the intype of the filter definition, but by a SetOutputFilter directive, e.g. `SetOutputFilter appendRobotstxt` inside the `<Location /robots.txt>` block. – 200_success Nov 04 '12 at 09:39
  • `ExtFilterDefine appendRobotstxt cmd="cat - /root/robots-tail.txt" SetOutputFilter appendRobotstxt ` inside httpd.conf gets me a 500 for robots.txt. I gave /root/robots-tail.txt 777 for testing purposes, and apache runs suexec/fcgid. – Gaia Nov 05 '12 at 20:10
  • SELinux is disabled. placing the file in public_html in one of the vhosts allows for an error to be displayed: `No such file or directory: couldn't create child process to run cat` followed by `No such file or directory: can't initialise output filter appendrobotstxt: aborting`. Server is running suExec. – Gaia Nov 06 '12 at 01:48
  • the command needs to begin with `/bin/cat `... thanks! – Gaia Nov 06 '12 at 01:57
  • @Gaia I've incorporated your comments into my answer. – 200_success Nov 06 '12 at 03:22
1

Note that you'd probably have to merge the shared rules into each domain's existing records rather than simply append them. If a domain already has

User-agent: *
Disallow: /search

and you want to add for all domains

User-agent: *
Disallow: /admin/

you'd have to make it

User-agent: *
Disallow: /search
Disallow: /admin/

because a robots.txt parser may stop as soon as it finds a record that matches it, in which case a second `User-agent: *` record appended at the end would be ignored.
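The merge described above can be sketched in Python. This is only an illustration under simplifying assumptions: records are separated by blank lines, there are no inline comments, and the shared rules all belong to `User-agent: *`. The function name `merge_rules` is hypothetical.

```python
def merge_rules(body, shared_rules):
    """Add shared_rules to the first 'User-agent: *' record in body.

    If body has no such record, a new one is appended instead.
    """
    lines = body.splitlines()

    # Locate the catch-all record, if any.
    start = None
    for i, line in enumerate(lines):
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip() == "*":
            start = i
            break

    if start is None:
        # No catch-all record yet: append a new one after a blank line.
        if lines and lines[-1].strip():
            lines.append("")
        lines.append("User-agent: *")
        lines.extend(shared_rules)
        return "\n".join(lines) + "\n"

    # The record ends at the next blank line or at the end of the file.
    end = start + 1
    while end < len(lines) and lines[end].strip():
        end += 1

    # Skip rules the domain already has, so repeated runs change nothing.
    existing = {l.strip() for l in lines[start:end]}
    lines[end:end] = [r for r in shared_rules if r not in existing]
    return "\n".join(lines) + "\n"


merged = merge_rules("User-agent: *\nDisallow: /search\n", ["Disallow: /admin/"])
```

With the example from this answer, `merged` holds the combined three-line record. A script built around such a function could read the original file from standard input and write the result to standard output, which is the interface an external output filter command needs.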

unor