
With the nginx HttpLimitReq module, requests can be limited by IP. However, I don't understand what the "nodelay" option does.

If the excess requests within the limit burst delay are not necessary, you should use the nodelay

limit_req   zone=one  burst=5  nodelay;
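
The zone named one would be defined elsewhere with limit_req_zone. A hypothetical companion definition, since the question does not show the actual rate:

limit_req_zone  $binary_remote_addr  zone=one:10m  rate=1r/s;
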
Xeoncross

5 Answers


TL;DR: The nodelay option is useful if you want to impose a rate limit without constraining the allowed spacing between requests.

I had a hard time digesting the other answers, and then I discovered new documentation from Nginx with examples that answer this: https://www.nginx.com/blog/rate-limiting-nginx/

Here's the pertinent part. Given:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

location /login/ {
  limit_req zone=mylimit burst=20;
  ...
}

The burst parameter defines how many requests a client can make in excess of the rate specified by the zone (with our sample mylimit zone, the rate limit is 10 requests per second, or 1 every 100 milliseconds). A request that arrives sooner than 100 milliseconds after the previous one is put in a queue, and here we are setting the queue size to 20.

That means if 21 requests arrive from a given IP address simultaneously, NGINX forwards the first one to the upstream server group immediately and puts the remaining 20 in the queue. It then forwards a queued request every 100 milliseconds, and returns 503 to the client only if an incoming request makes the number of queued requests go over 20.

If you add nodelay:

location /login/ {
  limit_req zone=mylimit burst=20 nodelay;
  ...
}

With the nodelay parameter, NGINX still allocates slots in the queue according to the burst parameter and imposes the configured rate limit, but not by spacing out the forwarding of queued requests. Instead, when a request arrives “too soon”, NGINX forwards it immediately as long as there is a slot available for it in the queue. It marks that slot as “taken” and does not free it for use by another request until the appropriate time has passed (in our example, after 100 milliseconds).
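
To make the slot bookkeeping concrete, here is the same location annotated with what (as I read the article) happens when 21 requests arrive simultaneously, followed by one more:

location /login/ {
  # 21 simultaneous requests: all 21 are forwarded at once and the 20
  # burst slots are marked taken. A 22nd request arriving right away is
  # rejected (503 by default) because no slot is free; one slot is freed
  # every 100 ms, so the long-term rate still works out to 10r/s.
  limit_req zone=mylimit burst=20 nodelay;
  ...
}
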

Mark Woon

The documentation here has an explanation that sounds like what you want to know:

The directive specifies the zone (zone) and the maximum possible bursts of requests (burst). If the rate exceeds the demands outlined in the zone, the request is delayed, so that queries are processed at a given speed

From what I understand, requests over the burst will be delayed (they take more time and wait until they can be served); with the nodelay option the delay is not used and excess requests are denied with a 503 error.
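
As a sketch, the two variants side by side (the 1r/s rate here is an assumed value, just for illustration):

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

# delayed (default): excess requests wait in the queue and are released at 1r/s
limit_req zone=one burst=5;

# nodelay: excess requests are forwarded immediately while burst slots last;
# anything beyond that is rejected with 503
limit_req zone=one burst=5 nodelay;
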

This blog post (archive.org) gives a good explanation of how rate limiting works in nginx:

If you’re like me, you’re probably wondering what the heck burst really means. Here is the trick: replace the word ‘burst’ with ‘bucket’, and assume that every user is given a bucket with 5 tokens. Every time that they exceed the rate of 1 request per second, they have to pay a token. Once they’ve spent all of their tokens, they are given an HTTP 503 error message, which has essentially become the standard for ‘back off, man!’.
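
Mapping that metaphor onto the directives (my reading, not a quote from the post): the refill rate is the zone rate and the bucket size is burst.

limit_req_zone $binary_remote_addr zone=bucket:10m rate=1r/s;  # refill: 1 token per second
limit_req zone=bucket burst=5;                                 # bucket size: 5 tokens
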

coredump
    I think you're incorrect, the nginx manual states: "Excessive requests are delayed until their number exceeds the maximum burst size". Note that *until exceeds maximum burst* is entirely different meaning than *over the burst* that you said. You also conflated *burst* with *excess requests*, I believe *excess requests* means it's above the zone, while it may still be below the *maximum burst*. – Hendy Irawan Dec 19 '14 at 07:09
  • http://nginx.org/en/docs/http/ngx_http_limit_req_module.html - According to the docs, nodelay has a purpose only when used alongside burst. After the burst example, the docs say `If delaying of excessive requests while requests are being limited is not desired, the parameter nodelay should be used`. – variable Feb 16 '22 at 04:57

The way I see it is as follows (a config sketch follows the list):

  1. Requests will be served as fast as possible until the zone rate is exceeded. The zone rate is "on average", so if your rate is 1r/s and burst is 10, you can have 10 requests in a 10-second window.

  2. After the zone rate is exceeded:

    a. Without nodelay, further requests up to burst will be delayed.

    b. With nodelay, further requests up to burst will be served as fast as possible.

  3. After the burst is exceeded, the server will return an error response until the burst window expires, e.g. for rate=1r/s and burst=10, a client will need to wait up to 10 seconds for the next accepted request.
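
A sketch of the configuration behind these numbers (rate and burst are taken from the examples above; the zone name and size are placeholders):

limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;

location / {
  # 2a: queued requests are drained at 1r/s
  limit_req zone=perip burst=10;
  # 2b: same budget, but served as fast as possible
  # limit_req zone=perip burst=10 nodelay;
}
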

Hendy Irawan

The setting defines whether requests will be delayed so that they conform to the desired rate, or simply rejected. In effect, it determines whether the rate limiting is managed by the server or the responsibility is passed on to the client.

nodelay present

Requests will be handled as quickly as possible; any requests sent over the specified limit will be rejected with the code set as limit_req_status.

nodelay absent (aka delayed)

Requests will be handled at a rate that conforms with the specified limit. For example, if a rate of 10 req/s is set, then each request will be handled in >= 0.1 (1/rate) seconds, thereby not allowing the rate to be exceeded but allowing the requests to back up. If enough requests back up to overflow the bucket (which could also be prevented by a concurrent connection limit), then they are rejected with the code set as limit_req_status.
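
Since limit_req_status comes up here: that directive (available since nginx 1.3.15) overrides the default 503. A minimal sketch, with an assumed zone name:

location /api/ {
  limit_req zone=perip burst=10;  # zone "perip" assumed defined elsewhere
  limit_req_status 429;           # reject with 429 instead of the default 503
}
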

The gory details are here: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c#L263, where that logic kicks in when the limit has not yet been passed and the delay is optionally going to be applied to the request. The application of nodelay comes into play here: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c#L495, causing the delay value above to be 0 and triggering that handler to immediately return NGX_DECLINED (which passes the request to the next handler), rather than NGX_AGAIN (which would effectively requeue it to be processed again).

Matt Whipple

I didn't understand it the first time I read the introduction at https://www.nginx.com/blog/rate-limiting-nginx/.

Now I am sure I understand it, and my answer is so far the best. :)

Suppose 10r/s is set, the server's max capability is e.g. 10000r/s (i.e. 10r/ms), and there is only 1 client at the moment.

So here's the main difference between 10r/s per IP burst=40 nodelay and 10r/s per IP burst=40.

[Draft diagram comparing 10r/s burst=40 nodelay vs 10r/s burst=40: with nodelay the 40th request is answered at ~1s; without it, queued requests drain at 10r/s and the 40th is answered at ~4s]

As https://www.nginx.com/blog/rate-limiting-nginx/ documents (I strongly recommend reading the article first, except the Two-Stage Rate Limiting section), this behaviour fixes one problem. Which one?

In our example, the 20th packet in the queue waits 2 seconds to be forwarded, at which point a response to it might no longer be useful to the client.

Check the draft I made: with nodelay the 40th request gets its response at 1s, while without it the 40th request gets its response at 4s.

This makes the best use of the server's capability: it sends back responses as quickly as possible while still keeping the x r/s constraint on a given client/IP.
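
The configuration behind the draft would look something like this (my reconstruction; only the rate and the burst of 40 are stated above):

limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

location / {
  # variant 1: the 40 burst requests are answered as fast as the server can
  limit_req zone=perip burst=40 nodelay;
  # variant 2: queued requests drain at 10r/s, so the 40th is answered at ~4s
  # limit_req zone=perip burst=40;
}
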

But there is also a cost. Suppose you have many clients queuing on the server, say clients A, B and C.

Without nodelay, the requests are served in an order similar to ABCABCABC.
With nodelay, the order is more likely to be AAABBBCCC.


I would like to sum up the article https://www.nginx.com/blog/rate-limiting-nginx/ here.

Above all, the most important configuration is x r/s.

  1. x r/s only: excess requests are rejected immediately.

  2. x r/s + burst: excess requests are queued.

1. vs 2.: the cost is that, on the client side, the queued requests take up the chances of later requests, which would otherwise have had a chance of being served.

For example, with 10r/s burst=20 vs plain 10r/s, the 11th request is supposed to be rejected immediately under the latter configuration, but now it is queued and will be served. The 11th request takes up the 21st request's chance.

  3. x r/s + burst + nodelay: already explained (see the side-by-side sketch below).
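
The three configurations side by side (a sketch; the zone name, size and rate are placeholders):

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

limit_req zone=mylimit;                   # 1: excess requests rejected immediately
limit_req zone=mylimit burst=20;          # 2: excess requests queued, drained at 10r/s
limit_req zone=mylimit burst=20 nodelay;  # 3: excess served immediately while slots last
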

P.S. The Two-Stage Rate Limiting section of the article is very confusing. I don't understand it, but that doesn't seem to matter.

For example:

With this configuration in place, a client that makes a continuous stream of requests at 8 r/s experiences the following behavior.

8 r/s? Seriously? There are 17 requests within the 3 seconds shown in the image; 17 / 3 is hardly 8.
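
For what it's worth, the two-stage section revolves around the delay= parameter (nginx 1.15.7+). As I understand the article's example, the first 8 excess requests are forwarded without delay, the next 4 (up to burst=12) are delayed, and anything beyond that is rejected:

limit_req_zone $binary_remote_addr zone=ip:10m rate=5r/s;

location / {
  limit_req zone=ip burst=12 delay=8;
}
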

Rick
  • If rate limiting is done via client IP address, then how come this statement is true: `Without nodelay, the requests are served in an order similar to ABCABCABC.`? It should be AAABBBCCC – variable Feb 16 '22 at 05:08