We have an Apache httpd 2.4 server as the point of entry for about 20 web sites, and each site has its own VirtualHost configuration. A lot of the settings are probably redundant, but it suits our needs. Each VirtualHost redirects HTTP traffic to an HTTPS login page, which is proxied through a Tomcat app. There is no physical docroot for this web site.

We are trying to block YandexBot, which is hitting our main site about once per second because the redirection generates a unique URL for each hit. Since blocking the agent in an .htaccess file is not an option here, we tried to block it with mod_rewrite, as shown below. Unfortunately this is not working: the bot's requests are still answered with an HTTP 302 that redirects to a new URL. Any input would be appreciated.

<VirtualHost *:443>
  ServerName mywebsite.com
  ServerAlias *.mywebsite.com
  ...

  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
  RewriteRule ^ - [F,L]
    
  RewriteRule ^/$ https://%{SERVER_NAME}/app/... [R,L]

  ...
  <Location /app/>
    Require all granted
    RequestHeader unset Origin
    ProxyPreserveHost On
    ProxyPass http://127.0.0.1:8080/app/...
    ...
  </Location>
  ...
</VirtualHost>

This is the entry that appears in the access_log file:

<IP address> - - [27/Oct/2021:11:55:49 -0400] "GET /app/...:<unique id>: HTTP/1.1" 302 - 69963 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
