I am using HAProxy and want to block scrapers from my website. In haproxy.cfg, I have created a rule:
acl blockedagent hdr_sub(user-agent) -i -f /etc/haproxy/badbots.lst
http-request deny if blockedagent
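I suspect the problem may be the matching method: as far as I understand, hdr_sub does a case-insensitive substring match, so a ^ in the pattern file would be treated as a literal character rather than a regex anchor. Would switching to a regex matcher such as hdr_reg be the right fix? For example (same file path assumed):

```
acl blockedagent hdr_reg(user-agent) -i -f /etc/haproxy/badbots.lst
http-request deny if blockedagent
```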
The file /etc/haproxy/badbots.lst contains the user agents that I want to block:
^Lynx
^PHP
^Wget
^Nutch
^Java
^curl
^PEAR
^SEOstats
^Python\-urllib
^python\-requests
^HTTP_Request
^HTTP_Request2
For example, it should block wget attempts. But when I run wget mysite.com/example/discussion, it still returns the page. I also tried with Python Scrapy, and in both cases the request succeeds where it should be denied. It seems the block list is not working. What is the recommended way to do this?
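To illustrate my suspicion with a minimal Python sketch (not HAProxy itself): if hdr_sub does plain substring matching, an entry like ^Wget can never match a real Wget user agent, whereas a regex match would:

```python
import re

ua = "Wget/1.21.3"   # a typical wget User-Agent header value
pattern = "^Wget"    # entry from badbots.lst

# Substring semantics (like hdr_sub): the caret is a literal character,
# so the string "^Wget" is searched for verbatim and is never found.
substring_hit = pattern.lower() in ua.lower()

# Regex semantics (like hdr_reg): the caret anchors at the start,
# so the pattern matches any UA beginning with "Wget".
regex_hit = re.search(pattern, ua, re.IGNORECASE) is not None

print(substring_hit, regex_hit)  # False True
```

If that is the cause, every entry in the list would fail the same way, which would explain why neither wget nor Scrapy is blocked.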