
Trying to get the following behavior working in nginx

A default rate limit of 1r/s per IP for browsers, a rate limit of 10r/s for the Bing and Google spiders, and outright rejection of bad bots.

Unfortunately Google doesn't publish IP addresses for Googlebot, so I'm limited to matching on the user agent.

So far this gets close:

http { 
  # Rate limits
  map $http_user_agent $uatype {
    default 'user';
    ~*(google|bing|msnbot) 'okbot';
    ~*(slurp|nastybot) 'badbot';
  }

  limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
  limit_req_zone $binary_remote_addr zone=two:10m rate=10r/s;

  ...

  server {
    ...

    location / {
      if ($uatype = 'badbot') {
        return 403;
      }

      limit_req zone=one burst=5 nodelay;
      if ($uatype != 'user') {
        limit_req zone=two burst=10 nodelay;
      }

      ...
    }

  ...
  }
}

BUT - nginx doesn't allow limit_req inside an 'if' block:

$ nginx -t

nginx: [emerg] "limit_req" directive is not allowed here in /etc/nginx/nginx.conf
nginx: configuration file /etc/nginx/nginx.conf test failed

There are so many untested suggestions on the nginx forums; most don't even pass a configtest.

One that looks promising is Nginx Rate Limiting by Referrer? -- The downside of that version is that all of the configuration is repeated for each different limit (and I have many rewrite rules).
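For context, the workaround usually suggested avoids `if` entirely: key each `limit_req_zone` on a mapped variable that is empty for the other traffic class, since nginx does not account requests with an empty key, and stack both `limit_req` directives in the same location. A sketch built on the config above (the `$limit_key_*` variable names and zone names are illustrative, not from any particular answer):

```nginx
# Browsers get their IP as the key; good bots get an empty key (uncounted).
map $http_user_agent $limit_key_user {
    default                $binary_remote_addr;
    ~*(google|bing|msnbot) "";
}

# The mirror image: only good bots carry a key in this zone.
map $http_user_agent $limit_key_bot {
    default                "";
    ~*(google|bing|msnbot) $binary_remote_addr;
}

limit_req_zone $limit_key_user zone=users:10m  rate=1r/s;
limit_req_zone $limit_key_bot  zone=okbots:10m rate=10r/s;

server {
    location / {
        if ($uatype = 'badbot') {
            return 403;
        }
        # Several limit_req directives may coexist; each request is only
        # counted in the zone where its key is non-empty.
        limit_req zone=users  burst=5  nodelay;
        limit_req zone=okbots burst=10 nodelay;
    }
}
```

Because the split happens in the `map` blocks, the location body (and any rewrite rules in it) is written only once.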

Anyone got something good?

Ali W

2 Answers


Today I was able to implement rate limiting on a user-agent basis; try this:

map $http_user_agent $bad_bot {
    default 0;
    ~*(foo|bar) 1;
}

map $http_user_agent $nice_bot {
    default "";
    ~*(baz|qux) 1;
}

limit_req_zone $nice_bot zone=one:10m rate=1r/s;
limit_req_status 429;

server {
    ...
    location / {
        limit_req zone=one nodelay;
        if ($bad_bot) {
            return 403;
        }
        ...
    }
}
hvelarde
  • Does this limit all IPs that use the same user agent, or will it limit each IP that uses the user agent separately? – jerclarke May 17 '19 at 00:33
  • We are doing something similar, but not getting anything close to our "rate" across many IPs that are DDOSing us. – jerclarke May 17 '19 at 00:33
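Regarding the per-IP question in the comments: as written, the zone key is the map's value (`1`), so every matching user agent shares a single bucket. To rate-limit each IP of the matching agents separately, map to `$binary_remote_addr` instead, leaving the key empty (and therefore uncounted) for everyone else. A sketch, keeping the hypothetical `(baz|qux)` pattern from the answer:

```nginx
# Per-IP variant: nice bots are keyed by client address,
# all other requests get an empty key and bypass this zone.
map $http_user_agent $nice_bot_ip {
    default     "";
    ~*(baz|qux) $binary_remote_addr;
}

limit_req_zone $nice_bot_ip zone=nicebots:10m rate=1r/s;
```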

Unfortunately you can't make this dynamic; the limit_req module doesn't support it.

The link you found is probably the only way to achieve this. Use the include directive to avoid repeating your configuration.
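For example, the shared rewrite rules can live in one file and be pulled into each rate-limited location (the file path and rewrite rule here are illustrative):

```nginx
# /etc/nginx/shared-rewrites.conf -- rules common to every limited location
rewrite ^/old/(.*)$ /new/$1 permanent;

# In nginx.conf, each location repeats only its limit_req line:
server {
    location / {
        limit_req zone=one burst=5 nodelay;
        include /etc/nginx/shared-rewrites.conf;
    }
}
```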

But what if a third-party crawler suddenly impersonates a good-bot user agent?

Xavier Lucas
  • > But what if a third-party crawler suddenly impersonates a good-bot user agent? -- Totally agree; terrible that Google doesn't publish an IP list. Thanks for the answer. – Ali W Oct 25 '14 at 23:24
  • @AliW Interestingly, [tengine](http://tengine.taobao.org/document/http_limit_req.html), the Chinese fork of nginx developed by Alibaba, seems to have better support for these cases. – Xavier Lucas Oct 25 '14 at 23:50
  • In my error.log I see a line: 2022/02/08 10:00:55 [error] 25833#25833: *150390 limiting requests, excess: 20.855 by zone "mylimit", client: 34.91.221.177, server: _, request: [MORE HERE] How can I add the user agent to the error.log output? – Nathan B Feb 08 '22 at 11:18