
I'm running Apache 2, and after analysing the access_log I discovered that my website is visited more than 800 times a day (924 today) by the majestic.co.uk bot. The bot uses these IP addresses:

46.4.123.172
220.241.45.142
94.22.46.23
88.198.16.153
178.137.88.101
91.194.84.106
144.76.8.132
46.4.120.3
176.9.10.227
208.107.12.128
46.4.89.35
91.230.202.131
62.210.90.118
62.16.252.183
46.4.32.75
46.4.116.197
198.27.66.194
199.58.86.206
46.165.197.142
195.154.187.115
144.76.7.107
91.121.221.15
51.254.97.22
195.154.156.209
98.218.34.60
195.154.157.47
198.27.82.146
178.202.133.84
91.179.245.81
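A list like the one above can be pulled out of the access log. Here is a minimal sketch that assumes the log uses Apache's standard "combined" format, where the User-Agent is the last quoted field. It counts requests per client IP whose User-Agent contains the MJ12bot token. The sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_hits_by_ip(lines, token="MJ12bot"):
    """Count requests per client IP whose User-Agent contains `token` (case-insensitive)."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and token.lower() in m.group("ua").lower():
            counts[m.group("ip")] += 1
    return counts


# Made-up sample lines, for illustration only
BOT_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"
SAMPLE_LOG = [
    f'46.4.123.172 - - [13/Sep/2015:10:00:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "{BOT_UA}"',
    f'46.4.123.172 - - [13/Sep/2015:10:05:00 +0200] "GET /about.html HTTP/1.1" 200 2048 "-" "{BOT_UA}"',
    f'91.179.245.81 - - [13/Sep/2015:11:00:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "{BOT_UA}"',
    '192.0.2.10 - - [13/Sep/2015:11:30:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = bot_hits_by_ip(SAMPLE_LOG)
for ip, n in hits.most_common():
    print(ip, n)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `bot_hits_by_ip(open(path))`. Here `path` is your log location, typically /var/log/httpd/access_log on Red Hat-style systems or /var/log/apache2/access.log on Debian-style systems.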


From this list I can see that the IP keeps changing. My first thought (correct me if I'm wrong) was to set up a rule like this, using the first IP address as an example:

route add -net 46.4.123.0/24 gw 127.0.0.1 lo


This way I would block the whole 46.4.123.0/24 network (46.4.123.0 to 46.4.123.255). But that would not work for me, because the IP changes every time, so I would need to track down and block each one.
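To see why a /24 null route catches so little here, Python's standard ipaddress module can show what 46.4.123.0/24 covers and which addresses fall inside it. This is only an illustration using a handful of the listed IPs.

```python
import ipaddress

# The network that "route add -net 46.4.123.0/24 ..." would null-route
net = ipaddress.ip_network("46.4.123.0/24")
print(net.num_addresses)                           # 256
print(net.network_address, net.broadcast_address)  # 46.4.123.0 46.4.123.255

# A few of the bot addresses from the list above
seen = ["46.4.123.172", "46.4.120.3", "46.4.89.35", "91.179.245.81"]
covered = [ip for ip in seen if ipaddress.ip_address(ip) in net]
print(covered)                                     # ['46.4.123.172']

# Each address sits in its own /24, so each would need its own route
prefixes = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in seen}
print(len(prefixes))                               # 4
```

The same holds for the full list: each of the 29 addresses lies in a different /24.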

My question is: instead of null-routing the requests, is there a way to redirect the traffic to a domain like blocked.xxx.com, where the user sees "Hey, your IP is blocked and you cannot visit the website. If you think this is an error, please contact me"?

Vilican
  • Have you looked at [fail2ban](http://www.fail2ban.org/wiki/index.php/Main_Page)? – Neil Smithline Sep 13 '15 at 17:05
  • Have you tried simply disallowing the bot from crawling your pages using robots.txt? A well-behaved bot will follow the rules, and according to [their FAQ](http://www.majestic12.co.uk/projects/dsearch/mj12bot.php) this one does. Also, why do you think the traffic is malicious? 800 visits a day should not really be a problem for even the slowest site. – Steffen Ullrich Sep 13 '15 at 17:28
  • Since this is a distributed crawler run by volunteers you will see lots of different source IP address and if you try to block these or even the full net you will probably block innocent users. A better way would be to either block by user-agent (in the web server) or simply use robots.txt because this should be a well behaved crawler. – Steffen Ullrich Sep 13 '15 at 19:08

3 Answers


The Majestic project is a distributed web crawler, which explains why you see so many different source IP addresses. It is not malicious: it does not attack your site, and it does not even use many resources (800 requests a day is not much).

Like most well-behaved bots, Majestic includes a URL in its user-agent string, and if you visit this URL you will find topics like "How can I block MJ12bot?". This topic explains how to block this bot in a sane way, and I recommend you follow the advice there. Note that this bot is nothing special; it follows the same rules as most other innocent crawlers, like the bots from Google, Yahoo, Bing, etc.

Apart from that, your idea of blocking based on the source IP of the request is not only useless in this case but actually harmful, because it will exclude innocent users from your website. The requests for this distributed crawler are made from the computers of volunteers, such as home users. You can see this if you do a reverse lookup on some of the IP addresses, e.g. 91.179.245.81 resolves to 81.245-179-91.adsl-dyn.isp.belgacom.be. Thus, if you blocked the whole /24 network for this IP address, you would exclude lots of users of this ISP.
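The reverse lookup described above can be scripted. This is a rough sketch, not a reliable classifier. It resolves an IP's PTR name with Python's `socket.gethostbyaddr` and flags names containing substrings that ISPs often use for end-user address pools. The `RESIDENTIAL_HINTS` list is a hypothetical heuristic chosen for illustration.

```python
import socket

# Hypothetical heuristic: substrings often seen in ISP hostnames for DSL/cable/dial-up pools
RESIDENTIAL_HINTS = ("adsl", "dsl", "dyn", "dial", "pool", "cable", "dhcp")


def reverse_lookup(ip):
    """Return the PTR hostname for `ip`, or None if there is none or DNS fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def looks_residential(hostname):
    """Guess whether a PTR name belongs to an ISP's end-user address pool."""
    if not hostname:
        return False
    name = hostname.lower()
    return any(hint in name for hint in RESIDENTIAL_HINTS)
```

For example, `looks_residential(reverse_lookup("91.179.245.81"))` returns True as long as that address still resolves to the belgacom name quoted above. PTR records change over time, so results will vary.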

Steffen Ullrich

If I understand your question correctly, you wish to redirect certain IP addresses to a different page or domain. If that is the case, you can use this in your .htaccess file:

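RewriteEngine On
RewriteBase /
# REMOTE_ADDR is always the client IP; REMOTE_HOST may hold a resolved hostname instead
# Anchor the pattern so it does not also match addresses like 46.4.123.17 or 146.4.123.1
RewriteCond %{REMOTE_ADDR} ^46\.4\.123\.1$
# Skip the block page itself, otherwise the redirect loops
RewriteCond %{REQUEST_URI} !^/blocked\.html$
# 302 (temporary) so clients do not cache the redirect permanently
RewriteRule .* /blocked.html [R=302,L]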

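Adjust the RewriteCond and RewriteRule to your needs. Alternatively, you can do it in iptables:

  # sysctl -w net.ipv4.ip_forward=1
then
  # iptables -t nat -A PREROUTING -s 46.4.123.0/24 -p tcp --dport 80 -j DNAT --to-destination 1.1.1.1:80
  # iptables -t nat -A POSTROUTING -d 1.1.1.1 -p tcp --dport 80 -j MASQUERADE

Change the -s network to the addresses you want to redirect, and 1.1.1.1 to the server you want them redirected to. The MASQUERADE rule is needed because the target is another host. Without it, 1.1.1.1 replies directly to the client from an address the client never contacted, and the connection fails.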
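RewriteCond patterns are regular expressions that match anywhere in the tested string unless they are anchored. An IP pattern without `^` and `$` therefore matches more addresses than intended. The snippet below illustrates this with Python's `re` module, which behaves the same as the PCRE regexes mod_rewrite uses for these simple patterns. The sample addresses are arbitrary.

```python
import re

unanchored = re.compile(r"46\.4\.123\.1")   # matches anywhere in the string
anchored = re.compile(r"^46\.4\.123\.1$")   # matches exactly 46.4.123.1
whole_24 = re.compile(r"^46\.4\.123\.")     # matches all of 46.4.123.0/24

samples = ["46.4.123.1", "46.4.123.17", "46.4.123.172", "146.4.123.1"]
results = {
    ip: (bool(unanchored.search(ip)), bool(anchored.search(ip)), bool(whole_24.search(ip)))
    for ip in samples
}
for ip, (u, a, w) in results.items():
    print(f"{ip:<14} unanchored={u} anchored={a} whole_24={w}")
```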

Zardox
  • While this may be technically correct, @SteffenUllrich's [answer](http://security.stackexchange.com/a/100186/10885) explains that blocking these addresses is really not the correct approach. – Neil Smithline Sep 14 '15 at 01:29
  • @neil-smithline I agree with that, the bot is in no way malicious. – Zardox Sep 14 '15 at 15:51

There is no point in blocking the bot's IP addresses, because Majestic is a distributed search-indexing project and anyone can opt in to set up a bot on their computer. By blocking Majestic IP addresses, you effectively block normal users from accessing your website.

How to block Majestic bot

Use a robots.txt file to disallow bots from accessing your website. This is by far the recommended way, and it has already been answered here.
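A minimal robots.txt that asks only Majestic's crawler to stay away uses its documented user-agent token, MJ12bot. The snippet below checks that rule with Python's standard `urllib.robotparser`, which evaluates robots.txt roughly the way a compliant crawler would. example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt served at the site root (e.g. http://example.com/robots.txt)
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks its own token before fetching a URL
mj12_allowed = parser.can_fetch("MJ12bot", "http://example.com/index.html")
google_allowed = parser.can_fetch("Googlebot", "http://example.com/index.html")
print(mj12_allowed)    # False
print(google_allowed)  # True
```

Note that robotparser matches on the product token before the first `/`, so pass "MJ12bot" rather than the full User-Agent header.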

How to redirect Majestic bot

Use Apache mod_rewrite to redirect the bot to another location by matching its User-Agent HTTP request header. Create a .htaccess file in your website root directory with the following content. Make sure mod_rewrite is enabled on your virtual host and that AllowOverride lets .htaccess files use it.

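  RewriteEngine On
  RewriteBase /
  # MJ12bot's User-Agent starts with "Mozilla/5.0 (compatible; MJ12bot/...", so don't anchor with ^
  RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
  RewriteRule ^(.*)$ http://blocked.xxx.com [R=301,L]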

Note: I haven't tested the above entries.
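Whether a User-Agent condition matches depends on anchoring. MJ12bot announces itself with a header that begins "Mozilla/5.0 (compatible; MJ12bot/...", so a pattern anchored with `^MJ12bot` never matches it, while an unanchored, case-insensitive `MJ12bot` does. This Python `re` sketch shows the difference. The version number in the sample header is illustrative.

```python
import re

# An MJ12bot User-Agent of the form Majestic uses (version number illustrative)
UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"

starts_with = re.compile(r"^MJ12bot")               # requires the header to *start* with MJ12bot
contains = re.compile(r"MJ12bot", re.IGNORECASE)    # like: RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]

print(bool(starts_with.search(UA)))  # False
print(bool(contains.search(UA)))     # True
```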

With both methods above, you may still see bot entries in your Apache logs every now and then, unless the bot is programmed to ignore your website completely.

Dr. mattle
  • Thank you very much to everybody who answered my question; I have learned new things. I agree with @SteffenUllrich: if I use a null route or iptables, I could block the dynamic IPs of innocent users. 800 visits a day is not a problem, but they eat bandwidth and server resources, and I think Majestic is useless, unlike Google or Bing/Yahoo. I will test all your examples on a VM today and share my opinions. – Marcos Lamba Sep 14 '15 at 08:06