I have two domain names pointing to the same virtual server. One of them, http://ilarikaila.com, is a working brochure website I made for a friend. I used the other one, http://teemuleisti.com, to test-drive the site before making it public – in retrospect, probably a bad idea.

For a long time, Google-bot was confused in two ways about a search for "ilari kaila". While I was writing this, the second problem seemed to have disappeared (added on edit: no, it hasn't).

Confusion the first

The Google search results for "ilari kaila" do include ilarikaila.com, but only on the third page of the results, and instead of a snippet from the site, the result includes the text "A description for this result is not available because of this site's robots.txt – learn more.".

The contents of the robots.txt file were simply

User-agent: *
Allow: /

which certainly should not prevent any bot from listing the site's contents. Indeed, when the search terms "ilari kaila" were fed into bing.com, the site came up as the first search result (and still does), and a correct snippet was and is shown.
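
As a rough cross-check, the two rules can be run through the robots.txt parser in Python's standard library. This is only a sketch: it shows how one parser reads the file, not how Googlebot itself does, and `is_allowed` is a hypothetical helper name introduced for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://ilarikaila.com/press"))
```

By this parser's reading, the file permits every path for every user agent, and an empty file is treated the same way.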

A couple of days ago, I removed robots.txt altogether (or rather, renamed it not_robots.txt), but Google is still showing the same result, referring to robots.txt. (This is probably the reason that the site only appears on the third page of the search results.)

Confusion the second

Originally, requests to teemuleisti.com showed the same pages as ilarikaila.com, because I had not written a separate server block for the former in my nginx.conf file. I did that a couple of weeks ago, and wrote one very simple HTML page for the former site.

Nevertheless, the Google search results for "ilari kaila" kept showing links to teemuleisti.com for about two weeks after I made that change, up until an hour ago. This problem seems to have been resolved (added on edit: no, it hasn't) while I was writing this question, perhaps because I just added the following redirect to the server's nginx.conf file:

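server {
    listen              80;
    server_name         teemuleisti.com www.teemuleisti.com;
    # Serve the single remaining page as-is.
    location = /index.html {
    }
    # Send every other non-root path back to the front page.
    location ~* ^/(.+)$ {
        rewrite ^ http://teemuleisti.com redirect;
    }
}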

to redirect search results such as http://teemuleisti.com/press (which showed a snippet of content that is actually at http://ilarikaila.com/press) to the only page of teemuleisti.com, which now informs visitors of the problem with Google's indexing, and has a link to the correct site.
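
The intended effect of that redirect can be sketched in Python. This is a simplified model, not nginx itself: it assumes the two `location` blocks are closed and contain nothing beyond the `rewrite` shown, and `redirect_for` is a hypothetical name used only for illustration.

```python
import re
from typing import Optional

# Mirrors the case-insensitive regex location `~* ^/(.+)$`.
SUBPAGE = re.compile(r"^/(.+)$", re.IGNORECASE)
REDIRECT_TARGET = "http://teemuleisti.com"


def redirect_for(path: str) -> Optional[str]:
    """Return the redirect target for a request path, or None if it is served directly."""
    # An exact-match location (`location = ...`) takes priority over regex locations.
    if path == "/index.html":
        return None
    # Any path with at least one character after the leading slash is redirected.
    if SUBPAGE.match(path):
        return REDIRECT_TARGET
    # A bare "/" matches neither location and is served normally.
    return None


if __name__ == "__main__":
    for path in ("/", "/index.html", "/press"):
        print(path, "->", redirect_for(path))
```

Note that the `redirect` flag on `rewrite` issues a temporary (302) redirect rather than a permanent one.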

This seems to have set Google-bot right on that problem – though I can't see what difference it made, as there have been no subpages under teemuleisti.com for weeks – but what's with the confusion about robots.txt?

Added on edit: If I google for "ilari kaila composer", the second page of the search results still points to teemuleisti.com, so this problem is not yet resolved, either.

Teemu Leisti
  • If you intend to allow everything, you do not need a `robots.txt` file at all. – Michael Hampton May 15 '14 at 11:52
  • Yes, that's what I'm now trying: I deleted the file formerly named robots.txt completely. Although, as it had been renamed not_robots.txt for a few days, I wonder why Googlebot hadn't already reacted by indexing the site properly. – Teemu Leisti May 15 '14 at 14:45

1 Answer


There is no such thing as Allow in robots.txt, so your robots.txt is invalid. This confuses the bots, which expect only User-agent and Disallow lines. If you want all content to be indexed, you should remove robots.txt or leave it empty.

More info: http://www.robotstxt.org/robotstxt.html

  • OK, thanks. I simply copied an erroneous robots.txt I found somewhere. However, I no longer have a robots.txt file, but Google's results still refer to the file. Perhaps Google-bot is too smart for its own good, and looks for files whose names are **almost** like robots.txt? Just in case it does, I deleted not_robots.txt; we'll see what happens. – Teemu Leisti May 15 '14 at 08:53
  • The changes you make to robots.txt will only show up on Google once it re-indexes the site, so you should not expect anything to change at the moment you update your site. Sometimes it takes a while even after the site is re-indexed for new results to appear. – phoops May 15 '14 at 08:55
  • Right, but the Google search results referred to robots.txt even a week after I renamed robots.txt to not_robots.txt, and still do as I write this. – Teemu Leisti May 15 '14 at 08:56
  • Google understands `Allow`: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt – Alexey Ten May 15 '14 at 09:14