Looking at my Apache access.log, I see that crawlers tend to fetch old versions of pages and documents, for example:
- - [10/Jun/2011:10:36:31 +0200] "GET /wiki/News?version=14 HTTP/1.1" 200 6073 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

I'd like them not to request URLs with the ?version=x suffix, so that they only get the most recent content.

Is there a way to do this via the robots.txt file (or some other mechanism that I don't know of)?

1 Answer


If you are using Trac out of the box, then these pages already carry both NOINDEX and NOFOLLOW, so although they will still get crawled, they won't be indexed.
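If you want to confirm that on your own install, a rough sketch like the one below pulls the directives out of a page's robots meta tag. It uses only Python's standard `html.parser`. The `sample` markup is a made-up stand-in for what an old-version page might return, not real Trac output, and `RobotsMetaParser` and `robots_directives` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical markup standing in for a page fetched with ?version=14.
sample = (
    '<html><head><meta name="robots" content="NOINDEX, NOFOLLOW" />'
    "</head><body>old revision</body></html>"
)

found = robots_directives(sample)
print("noindex" in found and "nofollow" in found)
```

To check a live page, you would feed it the HTML you get back for one of the `?version=` URLs from your log instead of `sample`.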

Matthew Steeples