3

We experience a lot of traffic and server load on a web server.

All I can find out is that majestic12 is accessing pages all the time.

I wonder how I can prevent majestic12 from indexing the site.

Do they respect robots.txt entries, and how do I write such an entry?

user12096

2 Answers

6

According to Majestic 12's own page about robots.txt, they fully respect robots exclusion (3rd answer from the top). The robots.txt file is a plain text file in the root of your website, i.e. you place it at:

http://www.yourdomain.com/robots.txt

and have these lines in the file:

User-agent: MJ12bot
Disallow: /

So if you want to block that bot, I see no problem -- unless you're getting hammered by one of the fake bots they mention.
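As a quick sanity check before uploading the file, the two rules above can be run through Python's standard `urllib.robotparser`. This is only a minimal sketch: the helper name `is_allowed` and the sample page URL are made up for illustration, and it shows how Python's parser reads the rules, not how MJ12bot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the answer above.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper for illustration; it parses the text locally
    and makes no network request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL on the example domain used in the answer.
    page = "http://www.yourdomain.com/some/page.html"
    print("MJ12bot allowed:", is_allowed(ROBOTS_TXT, "MJ12bot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Because the file contains no `User-agent: *` section, crawlers other than MJ12bot are unaffected by these two lines.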

  • Thank you. Somehow I didn't find the FAQ on their pages, but now I have it. I wonder if they have mentioned all fake bots. Why would someone claim to be majestic12? – user12096 May 16 '10 at 09:30
3

For the OP's follow-up question:

I wonder if they have mentioned all fake bots. Why would someone claim to be majestic12?

That would be a false flag operation: a virus disguises itself as a legitimate bot or process to crawl IP addresses. The explanation is far down in the majestic12 FAQ.

deploymonkey