Questions tagged [robots.txt]

A convention that lets site owners tell web crawlers which parts of a website they should not crawl.

If a site owner wishes to give instructions to web robots, they must place a text file called robots.txt in the root of the web site hierarchy (e.g. www.example.com/robots.txt). This text file should contain the instructions in a specific format (see the example below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the web site. If this file doesn't exist, web robots assume that the site owner wishes to provide no specific instructions.

A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.

For websites with multiple subdomains, each subdomain must have its own robots.txt file. If example.com had a robots.txt file but a.example.com did not, the rules that would apply for example.com would not apply to a.example.com.

Source: Wikipedia
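
As a concrete illustration of the format described above (the paths, bot name, and sitemap URL are placeholders, not taken from any question on this page):

    User-agent: *
    Disallow: /private/
    Disallow: /tmp/

    User-agent: BadBot
    Disallow: /

    Sitemap: https://www.example.com/sitemap.xml

Each record starts with one or more User-agent lines naming the crawlers it applies to, followed by the Disallow (and, for crawlers that support it, Allow) rules for those crawlers.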

85 questions
0
votes
1 answer

Is there a way to block image spiders/bots on dedicated servers without using robots.txt or .htaccess?

We know that we can block certain spiders from crawling website pages using robots.txt or .htaccess, or maybe via the Apache configuration file httpd.conf. But that would require editing perhaps a large number of sites on some dedicated servers, and bots…
hsobhy
  • 171
  • 1
  • 2
  • 10
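
One hedged sketch of the kind of server-wide rule the question seems to be after, assuming Apache 2.4 and that the unwanted image bots identify themselves in the User-Agent header (the bot names and file extensions below are only examples):

    # httpd.conf (server-wide, Apache 2.4 syntax assumed) - applies to every site on the box
    SetEnvIfNoCase User-Agent "Googlebot-Image|bingbot" block_image_bot
    <FilesMatch "\.(gif|jpe?g|png|webp)$">
        <RequireAll>
            Require all granted
            Require not env block_image_bot
        </RequireAll>
    </FilesMatch>

Because it lives in the main server configuration, it does not need to be repeated in each site's .htaccess.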
0
votes
1 answer

robots.txt for a subdomain in IIS7

I have two different sites in IIS7 that both point to the same folder. They have different subdomains, www.sitename.com and foo.sitename.com; they are essentially the same website, but it runs different logic depending on the subdomain. I want www.sitename.com…
Crudler
  • 207
  • 1
  • 3
  • 10
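
One hedged approach for this setup, assuming the IIS URL Rewrite module is available: rewrite requests for /robots.txt to a per-subdomain file in the shared folder (robots-foo.txt is a hypothetical file name):

    <!-- web.config fragment; goes inside <system.webServer>, URL Rewrite module assumed -->
    <rewrite>
      <rules>
        <rule name="robots.txt for foo subdomain" stopProcessing="true">
          <match url="^robots\.txt$" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="^foo\.sitename\.com$" />
          </conditions>
          <action type="Rewrite" url="robots-foo.txt" />
        </rule>
      </rules>
    </rewrite>

Requests to www.sitename.com/robots.txt fall through to the normal robots.txt file, while foo.sitename.com gets its own version.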
0
votes
1 answer

If I redirect all users (except me) from within .htaccess, do I need a robots.txt file?

So… I have the live version of my site, e.g. v1.0, at domain.com. I then have my development/testing version at testing.domain.com. I want testing.domain.com to only be accessible to me for testing, and as such I redirect all other IPs in my .htaccess…
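
A hedged note on the entry above: since every other IP is redirected, compliant crawlers never see the staging content anyway, so a disallow-all robots.txt on the testing host only adds anything if /robots.txt itself is exempted from the redirect. If it is, the file would simply be:

    # testing.domain.com/robots.txt - asks all compliant crawlers to stay out of the staging site
    User-agent: *
    Disallow: /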
0
votes
1 answer

Webcrawler bots load-test my website and it fails the test

We run a commercial website with a relatively small number of customers at any one time (~30 users). Frequently a webcrawler such as Googlebot, Bingbot, or 80legs will bring our site to a grinding halt. Altering robots.txt does not have an immediate…
NimChimpsky
  • 460
  • 2
  • 5
  • 17
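
One commonly suggested mitigation for the situation above, sketched here with an arbitrary delay value, is a Crawl-delay directive. It is non-standard: Bingbot and some other crawlers honour it, while Googlebot ignores it, so at best it is a partial fix, and crawlers only re-fetch robots.txt periodically, which is why changes are not immediate.

    User-agent: *
    Crawl-delay: 10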
0
votes
2 answers

How can I use Varnish to generate a robots.txt file, even for a subdomain of the same site?

I want to generate a robots.txt file using Varnish 2.1. That means that domain.com/robots.txt is served using Varnish, and subdomain.domain.com/robots.txt is also served using Varnish. The robots.txt must be hardcoded into the default.vcl file. Is…
Sam
  • 1
  • 2
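
A minimal sketch of the usual pattern for this, assuming Varnish 2.1 syntax; the disallow-all body and the 702 status code are arbitrary examples:

    # default.vcl - serve a synthetic robots.txt for every host handled by this Varnish
    sub vcl_recv {
        if (req.url == "/robots.txt") {
            error 702 "robots.txt";
        }
    }

    sub vcl_error {
        if (obj.status == 702) {
            set obj.status = 200;
            set obj.http.Content-Type = "text/plain";
            synthetic {"User-agent: *
    Disallow: /
    "};
            return (deliver);
        }
    }

Because the check in vcl_recv only looks at req.url, the same synthetic response is returned for the main domain and any subdomain pointed at this Varnish instance.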
0
votes
1 answer

Disallow XML files in robots.txt

Google's webmaster FAQs suggest that this will exclude all XML files from search: User-agent: Googlebot Disallow: /*.xml$ Is this valid for other bots as well? User-agent: * Disallow: /*.xml$
Ben K.
  • 2,149
  • 4
  • 17
  • 15
0
votes
1 answer

Cross-submission robots.txt for multiple domains on single host

We are running a site with multiple languages hosted in a single environment on IIS7. For example: oursite.com - English, oursite.de - German, oursite.es - Spanish. This is a single-host environment. All of these sites are in the same application…
0
votes
2 answers

301 redirect or disallow on robots.txt?

I recently asked about 301 redirection on ServerFault and I didn't get a proper solution to my problem, but now I have a new idea: use robots.txt to disallow certain URLs on my site from being "crawled". My problem was simple: after a migration from…
javipas
  • 1,292
  • 3
  • 23
  • 38
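
For context on the trade-off in the entry above: a 301 tells both users and crawlers where the content has moved, while a robots.txt Disallow only asks compliant crawlers not to fetch the old URLs, which can then linger in the index. A hedged .htaccess sketch with hypothetical paths:

    # .htaccess - permanent redirect from an old path to its new location (paths are examples)
    RewriteEngine On
    RewriteRule ^old-section/(.*)$ /new-section/$1 [R=301,L]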
0
votes
2 answers

Is this a valid robots.txt file?

I have this robots.txt file: User-agent: * Sitemap: Path_to_sitemap.xml My question is, should I have something else in there as well? Like an Allow all or something? Thanks
Anonymous12345
  • 1,012
  • 1
  • 12
  • 17
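
For reference against the excerpt above, an explicit allow-everything form looks like this; an empty Disallow value blocks nothing, and the Sitemap line should use an absolute URL (example.com is a placeholder):

    User-agent: *
    Disallow:

    Sitemap: https://www.example.com/sitemap.xml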
0
votes
2 answers

Quick Robots.txt question

Will the following robots.txt syntax correctly block all pages on the site that end in "_.php"? I don't want to accidentally block other pages. User-Agent: * Disallow: /*_.php Also, am I allowed to have both "Allow: /" and "Disallow:" commands…
bccarlso
  • 127
  • 2
  • 5
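
A hedged refinement of the pattern in the excerpt above: under the wildcard extensions that the major crawlers support, adding a $ anchors the match to the end of the URL, so only paths that actually end in _.php are blocked rather than anything that merely contains that substring:

    User-Agent: *
    Disallow: /*_.php$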
0
votes
1 answer

robots.txt seems to block /my-beautiful-sef-url-123

I have a robots.txt that looks like this: User-agent: * Disallow: /system/ Disallow: /admin/ Disallow: /index.php The obvious goal has been to prevent all the ugly URLs from being indexed, as they all begin with "/index.php". But for some reason all…
0
votes
0 answers

nginx configuration for robots.txt

I've read other answers and the Nginx docs, and I can't figure out why this works: location = /robots.txt { alias //static/robots.allow.txt; } and this doesn't: location = /robots.txt { rewrite .* /robots.allow.txt last; } for the…
Nestor
  • 101
  • 1
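
A hedged sketch of why the two forms above can behave differently: alias serves the named file directly from that location, whereas rewrite ... last restarts location matching with the new URI, so /robots.allow.txt must itself land in a location that can serve it. The /var/www/static path below is an assumption:

    location = /robots.txt {
        rewrite ^ /robots.allow.txt last;
    }

    # the rewritten URI is matched against locations again, so it needs a home of its own
    location = /robots.allow.txt {
        root /var/www/static;
        internal;          # only reachable via the internal redirect above
    }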
0
votes
0 answers

Make the webserver prevent parsing of certain HTML elements

The MediaWiki content management system creates many links whose target pages I don't want to be discovered by search engine crawlers. It's not only that I don't want them indexed, and not only that I don't want them crawled, but I don't even…
0
votes
0 answers

Cannot block YandexBot with mod_rewrite

We have an Apache httpd 2.4 server as our point of entry for about 20 web sites, and each site has its own virtualhost configuration. A lot of settings are probably redundant, but it suits our needs. Each virtualhost redirects http traffic to an https…
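
For reference, a minimal sketch of the usual mod_rewrite form for this; where it sits relative to the https redirect rules matters, since rules are evaluated in order within each context:

    # Return 403 for any request whose User-Agent contains "YandexBot" (case-insensitive)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} YandexBot [NC]
    RewriteRule ^ - [F]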
0
votes
0 answers

With Nginx / Node.js reverse proxy how does Nginx serve robots.txt despite txt files not being referenced in Nginx config's location blocks?

In Chrome, when I enter https://www.example.com/robots.txt, my robots.txt file is served and works fine. I'm happy that it works, but I'm not sure why it does. In the config below I thought that my last location block, location /, was a catch-all that…
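
A hedged guess at what the excerpt above describes (the upstream address and the use of express.static are assumptions, not taken from the question): the catch-all location proxies /robots.txt to the Node app like any other path, and the app serves the file itself, so nginx never needs a txt-specific location block.

    location / {
        # /robots.txt falls through to here and is answered by the Node app,
        # e.g. by express.static or an explicit route - not by nginx itself
        proxy_pass http://127.0.0.1:3000;   # upstream address is an assumption
    }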