
I am using HAProxy and want to block scrapers from my website. In haproxy.cfg, I have created this rule:

acl blockedagent hdr_sub(user-agent) -i -f /etc/haproxy/badbots.lst
http-request deny if blockedagent

The file /etc/haproxy/badbots.lst contains the user agents that I want to block:

^Lynx
^PHP
^Wget
^Nutch
^Java
^curl
^PEAR
^SEOstats
^Python\-urllib
^python\-requests
^HTTP_Request
^HTTP_Request2

For example, it should block wget attempts. But when I run wget mysite.com/example/discussion, it still returns the page. I also tried with Python Scrapy, and in both cases the request succeeds where it should be blocked. I think the block list is not working. What is the recommended way to do this?


1 Answer


Use hdr_reg, which treats each line of the file as a regular expression. With hdr_sub the entries are matched as literal substrings, so the leading ^ is taken as a literal character and never matches anything:

acl blockedagent hdr_reg(user-agent) -i -f /etc/haproxy/badbots.lst

Or remove the ^ anchors from badbots.lst and keep hdr_sub, so each entry is matched as a plain substring anywhere in the User-Agent header.
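For example (a minimal sketch of the substring variant, reusing the file path from the question; note that plain substrings match anywhere in the header, not only at the start):

acl blockedagent hdr_sub(user-agent) -i -f /etc/haproxy/badbots.lst
http-request deny if blockedagent

$ cat /etc/haproxy/badbots.lst
Lynx
Wget
curl
python-requests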

Here is a quick test with a minimal configuration:

$ cat conf
global
    debug
defaults
    mode http
frontend web
    bind *:80
    acl blockedagent hdr_reg(user-agent) -i -f badbots.lst1
    http-request deny if blockedagent
    default_backend asdf

backend asdf
    server a 127.0.0.1:8000

$ cat badbots.lst1
^Wget
^curl



$ curl http://127.0.0.1
<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>

$ wget http://127.0.0.1
--2018-04-16 01:47:51--  http://127.0.0.1/
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-04-16 01:47:51 ERROR 403: Forbidden.

$ curl http://127.0.0.1 -A "asdf"
HELLO
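
The remaining entries can be checked the same way by spoofing the User-Agent with curl's -A flag. Assuming a python-requests line is present in the list, a request such as the following should hit the deny rule and return the 403 page instead of the backend response:

$ curl http://127.0.0.1 -A "python-requests/2.18.4"

(the exact version string is only an example of what the requests library typically sends).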