
Is there a way I can make Nginx notify me if hits from a referrer go beyond a threshold?

e.g. if my website is featured on Slashdot and all of a sudden I have 2K hits coming in an hour, I want to be notified when traffic goes beyond 1K hits an hour.

Is it possible to do this in Nginx? Preferably without Lua (since my production build is not compiled with Lua).

Quintin Par
  • What's "Slashdot"?? – ewwhite Sep 27 '12 at 16:57
  • I did something like this to detect DDoS on nginx. I achieved it by parsing the access log: a cron job parses the access log and counts unique IP connections per hour. – Hex Sep 27 '12 at 16:57
  • You mean you want nginx to be able to detect if you've been bought by Dice? – MDMarra Sep 27 '12 at 16:58
  • @Hex That (and maybe a few snippets from your script) would make an excellent answer to this question :) – voretaq7 Sep 27 '12 at 16:58
  • Probably no need to worry about getting Slashdotted anymore. Your webserver should be able to handle an extra 4 connections an hour. Might want to worry about getting Redditted, though... – HopelessN00b Sep 27 '12 at 16:59
  • No need to post parts of my script... Ladadadada did just that. Follow his explanation. My vote for him :) – Hex Sep 27 '12 at 17:03

4 Answers


I think this would be far better done with logtail and grep. Even if it's possible to do with lua inline, you don't want that overhead for every request and you especially don't want it when you have been Slashdotted.

Here's a 5-second version. Stick it in a script and put some more readable text around it and you're golden.

5 * * * * logtail -f /var/log/nginx/access_log -o /tmp/nginx-logtail.offset | grep -c "http://[^ ]*slashdot\.org"

Of course, that completely ignores reddit.com and facebook.com and all of the million other sites that could send you lots of traffic. Not to mention 100 different sites sending you 20 visitors each. You should probably just have a plain old traffic threshold that causes an email to be sent to you, regardless of referrer.
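If you go the generic-threshold route, a small wrapper around the same logtail invocation could do the counting and alerting. A minimal sketch, assuming logtail and mail are installed; the log path, offset file, threshold and alert address are placeholders for your setup:

```shell
#!/bin/sh
# Decide whether a hit count crosses the threshold; prints the alert text
# (empty output means no alert).
check_threshold() {
    count=$1
    limit=$2
    if [ "$count" -gt "$limit" ]; then
        printf 'nginx saw %s hits since the last check (limit %s)\n' \
            "$count" "$limit"
    fi
}

# Cron entry point: count all lines appended to the access log since the
# previous run and mail an alert when the threshold is crossed.
main() {
    log=/var/log/nginx/access_log
    offset=/tmp/nginx-logtail.offset
    msg=$(check_threshold "$(logtail -f "$log" -o "$offset" | wc -l)" 1000)
    [ -n "$msg" ] && echo "$msg" | mail -s 'traffic alert' you@example.com
}
```

Call `main` from the crontab entry; if you run it every five minutes instead of hourly, scale the threshold down accordingly.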

Ladadadada
  • The problem is to be proactive. I need to know about any site. Another question is where do I put the threshold? Did you mean some additional log parsing? Also, I didn't find `-o` in http://www.fourmilab.ch/webtools/logtail/ – Quintin Par Sep 28 '12 at 04:24
  • The threshold will depend on how much traffic your server(s) can handle. Only you can set that. If you want quicker notification, run it every five minutes instead of every hour and divide the threshold by 12. The `-o` [option is for an offset file](http://linux.die.net/man/8/logtail) so it knows where to start reading next time. – Ladadadada Sep 28 '12 at 07:20
  • @Ladadadada, I disagree that the overhead would be substantial, see my solution — https://serverfault.com/a/870537/110020 — I believe the overhead would be quite minimal if this is implemented properly, especially, (1), if your backend is really slow, then this overhead would be negligible, or, (2), if your backend is already quite snappy and/or cached properly, then you should have few issues with traffic handling in the first place, and a little extra load won't make a dent. Overall, it sounds like this question has two use cases, (1), just being informed, and, (2), automatic scaling. – cnst Aug 26 '17 at 18:27

The nginx limit_req_zone directive can base its zones on any variable, including $http_referer (note: nginx spells the variable with a single "r", matching the historically misspelled Referer header, so it's $http_referer, not $http_referrer).

http {
    limit_req_zone  $http_referer  zone=one:10m   rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }
}

You will also want to do something to limit the amount of state required on the web server, though: Referer headers can be quite long and varied, and you may see an infinite variety of them. You can use the nginx split_clients feature to set a variable for all requests that is based on the hash of the Referer header. The example below uses only 10 buckets, but you could do it with 1000 just as easily. If you got Slashdotted, people whose referrer happened to hash into the same bucket as the Slashdot URL would get blocked too, but you can limit that to 0.1% of visitors by using 1000 buckets in split_clients.

It would look something like this (totally untested, but directionally correct):

http {

    split_clients $http_referer $refhash {
        10%     x01;
        10%     x02;
        10%     x03;
        10%     x04;
        10%     x05;
        10%     x06;
        10%     x07;
        10%     x08;
        10%     x09;
        *       x10;
    }

    limit_req_zone  $refhash  zone=one:10m   rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req   zone=one  burst=5;
        }
    }
}
rmalayter
  • This is an interesting approach; however, I believe the question is about an automatic alert when the Slashdot effect is taking place; your solution seems to revolve around randomly blocking some 10% of users. Moreover, I believe your reasoning for using [`split_clients`](http://nginx.org/r/split_clients) may be misinformed — [`limit_req`](http://nginx.org/r/limit_req) is based on a "leaky bucket", which means that the overall state should never exceed the size of the specified zone. – cnst Aug 26 '17 at 18:18

The most efficient solution might be to write a daemon that would `tail -f` the access.log and keep track of the $http_referer field.
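Such a daemon might be sketched as follows. This is a rough illustration, not a production implementation: it assumes the default combined log format, where the quoted referer is the 11th whitespace-separated field, and the threshold and window numbers are placeholders.

```shell
#!/bin/sh
# Count referers on stdin and print any that exceed the limit within each
# window of N lines (a crude stand-in for a real time-based window).
count_referers() {
    awk -v limit="${1:-1000}" -v window="${2:-10000}" '
    {
        count[$11]++
        if (++n == window) {
            for (r in count)
                if (count[r] > limit)
                    print count[r], r
            n = 0
            split("", count)  # portable way to empty an awk array
        }
    }'
}

# In production you would feed it the live log, e.g.:
#   tail -F /var/log/nginx/access.log | count_referers 1000 10000
```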

However, a quick and dirty solution would be to add an extra access_log file, to log only the $http_referer variable with a custom log_format, and to automatically rotate the log every X minutes.

  • This can be accomplished with the help of standard logrotate scripts, which might need to do graceful restarts of nginx in order to have the files reopened (this is the standard procedure; take a look at /a/15183322 on SO for a simple time-based script)…

  • Or, by using variables within access_log, possibly by getting the minute specification out of $time_iso8601 with the help of the map or an if directive (depending on where you'd like to put your access_log).

So, with the above, you may have 6 log files, each covering a 10-minute period, named http_referer.Txx{0,1,2,3,4,5}x.log, e.g., by using the first digit of the minute to differentiate each file.
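For the map-based variant, the configuration might look like this (untested; the log_format name and the file layout are just illustrative, and note that a variable in the access_log path disables log buffering):

```nginx
http {
    log_format referers '$http_referer';

    # Pull the first digit of the minute out of e.g. "2017-08-26T02:05:49+02:00"
    map $time_iso8601 $minute_digit {
        default                 x;
        "~T\d{2}:(?<d>\d)"      $d;
    }

    server {
        ...
        access_log /var/log/nginx/http_referer.Txx${minute_digit}x.log referers;
    }
}
```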

Now, all you have to do is have a simple shell script that runs every 10 minutes: cat all of the above files together, pipe the result to sort, then to uniq -c, then to sort -rn, then to head -16, and you have a list of the 16 most common Referer variations; you're then free to decide whether any combination of counts and fields exceeds your criteria, and to issue a notification.

Subsequently, after a single successful notification, you could remove all six of these files and, on subsequent runs, issue no notification unless all six files are present (and/or some other number, as you see fit).
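A hedged sketch of that script; the file names, threshold and alert address are placeholders matching the hypothetical layout above:

```shell
#!/bin/sh
# Rank the referers read from stdin, most frequent first.
top_referers() {
    sort | uniq -c | sort -rn | head -16
}

# 10-minute cron entry point: tally the rotated referer logs, alert if the
# busiest referer exceeds the threshold, then reset the files.
main() {
    limit=1000
    top=$(cat /var/log/nginx/http_referer.Txx[0-5]x.log | top_referers)
    busiest=$(echo "$top" | awk 'NR == 1 { print $1 }')
    if [ "${busiest:-0}" -gt "$limit" ]; then
        echo "$top" | mail -s 'referer spike' you@example.com
        rm /var/log/nginx/http_referer.Txx[0-5]x.log
    fi
}
```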

cnst
  • This looks super useful. I might be asking for too much but like the earlier answer, would you mind helping with a script? – Quintin Par Sep 01 '17 at 01:54
  • @QuintinPar That does sound extra-curriculum! ;-) If you want, I'm available for hire and consulting; my email is cnst++@FreeBSD.org, also at http://Constantine.SU/ – cnst Sep 01 '17 at 02:17
  • Totally understand. Thanks much for all the help till now. Hope I can afford you some day :-) – Quintin Par Sep 01 '17 at 16:07
  • @QuintinPar you're welcome! No worries, it should be a pretty simple script with the above spec; just a matter of testing, configuring and packaging, basically. :) – cnst Sep 01 '17 at 17:36
  • @QuintinPar, BTW, in shell, you could also simply use [`cut -f11 -d" "`](http://mdoc.su/o/cut.1) to get the `referer` field from the `combined` `access_log`, and, also, before that, use `grep` by the date and hour, in case you don't want to keep separate `referer` logs around, and don't have all that many entries in your logs. P.S. Thanks for upvotes, accept and the bounty! :) – cnst Sep 02 '17 at 02:40
  • You are a super hero! – Quintin Par Sep 02 '17 at 03:11

Yes, of course it is possible in NGINX!

What you could do is implement the following DFA:

  1. Implement rate limiting, based on $http_referer, possibly using some regex through a map to normalise the values. When the limit is exceeded, an internal error page is raised, which you can catch through an error_page handler as per a related question, going to a new internal location as an internal redirect (not visible to the client).

  2. In the above location for exceeded limits, you perform an alert request, letting external logic perform the notification; this request is subsequently cached, ensuring you will only get 1 unique request per given time window.

  3. Catch the HTTP Status code of the prior request (by returning a status code ≥ 300 and using proxy_intercept_errors on, or, alternatively, use the not-built-by-default auth_request or add_after_body to make a "free" subrequest), and complete the original request as if the prior step wasn't involved. Note that we need to enable recursive error_page handling for this to work.

Here's my PoC and an MVP, also at https://github.com/cnst/StackOverflow.cnst.nginx.conf/blob/master/sf.432636.detecting-slashdot-effect-in-nginx.conf:

limit_req_zone $http_referer zone=slash:10m rate=1r/m;  # XXX: how many req/minute?
server {
    listen 2636;
    location / {
        limit_req zone=slash nodelay;
        #limit_req_status 429;  #nginx 1.3.15
        #error_page 429 = @dot;
        error_page 503 = @dot;
        proxy_pass http://localhost:2635;
        # an outright `return 200` has a higher precedence over the limit
    }
    recursive_error_pages on;
    location @dot {
        proxy_pass http://127.0.0.1:2637/?ref=$http_referer;
        # if you don't have `resolver`, no URI modification is allowed:
        #proxy_pass http://localhost:2637;
        proxy_intercept_errors on;
        error_page 429 = @slash;
    }
    location @slash {
        # XXX: placeholder for your content:
        return 200 "$uri: we're too fast!\n";
    }
}
server {
    listen 2635;
    # XXX: placeholder for your content:
    return 200 "$uri: going steady\n";
}
proxy_cache_path /tmp/nginx/slashdotted inactive=1h
        max_size=64m keys_zone=slashdotted:10m;
server {
    # we need to flip the 200 status into the one >=300, so that
    # we can then catch it through proxy_intercept_errors above
    listen 2637;
    error_page 429 @/.;
    return 429;
    location @/. {
        proxy_cache slashdotted;
        proxy_cache_valid 200 60s;  # XXX: how often to get notifications?
        proxy_pass http://localhost:2638;
    }
}
server {
    # IRL this would be an actual script, or
    # a proxy_pass redirect to an HTTP to SMS or SMTP gateway
    listen 2638;
    return 200 "authorities_alerted\n";
}

Note that this works as expected:

% sh -c 'rm /tmp/slashdotted.nginx/*; mkdir /tmp/slashdotted.nginx; nginx -s reload; for i in 1 2 3; do curl -H "Referer: test" localhost:2636; sleep 2; done; tail /var/log/nginx/access.log'
/: going steady
/: we're too fast!
/: we're too fast!

127.0.0.1 - - [26/Aug/2017:02:05:49 +0200] "GET / HTTP/1.1" 200 16 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:49 +0200] "GET / HTTP/1.0" 200 16 "test" "curl/7.26.0"

127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET / HTTP/1.1" 200 19 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET /?ref=test HTTP/1.0" 200 20 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:51 +0200] "GET /?ref=test HTTP/1.0" 429 20 "test" "curl/7.26.0"

127.0.0.1 - - [26/Aug/2017:02:05:53 +0200] "GET / HTTP/1.1" 200 19 "test" "curl/7.26.0"
127.0.0.1 - - [26/Aug/2017:02:05:53 +0200] "GET /?ref=test HTTP/1.0" 429 20 "test" "curl/7.26.0"
%

You can see that the first request results in one front-end and one backend hit, as expected (I had to add a dummy backend to the location that has limit_req, because a return 200 would take precedence over the limits; a real backend isn't necessary for the rest of the handling).

The second request is above the limit, so we send the alert (getting 200) and cache it, returning 429 (this is necessary due to the aforementioned limitation that responses with a status below 300 cannot be caught), which is subsequently intercepted by the front-end, which is now free to do whatever it wants.

The third request is still exceeding the limit, but we've already sent the alert, so, no new alert gets sent.

Done! Don't forget to fork it on GitHub!

cnst
  • Can two rate limiting conditions work together? I am using this right now: https://serverfault.com/a/869793/26763 – Quintin Par Aug 27 '17 at 02:51
  • @QuintinPar :-) I think it'll depend on how you use it — the obvious problem would be to distinguish in a single location of which limit introduced the condition; but if this one is a [`limit_req`](http://nginx.org/r/limit_req), and the other one is a [`limit_conn`](http://nginx.org/r/limit_conn), then just use the `limit_req_status 429` above (requires very new nginx), and I think you should be golden; there may be other options (one to work for sure is chaining nginx w/ `set_real_ip_from`, but, depending on what exactly you want to do, there may be more efficient choices). – cnst Aug 27 '17 at 03:14
  • @QuintinPar if there's anything that's missing from my answer, let me know. BTW, note that once the limit is reached, and your script is to be called, until such script is properly cached by nginx, then your content may be delayed; e.g., you might want to implement the script asynchronously with something like `golang`, or look into the timeout options for upstreams; also, might want to use `proxy_cache_lock on` as well, and possibly add some error handling for what to do if the script fails (e.g., using `error_page` as well as `proxy_intercept_errors` again). I trust my POC is good start. :) – cnst Aug 28 '17 at 17:18
  • Thank you for attempting this. One major issue for me still is, I am using limit_req and limit_conn already at the http level and it applies to all the websites I have. I cannot override it with this. So this solution is using a functionality meant for something else. Is there any other approach to this solution? – Quintin Par Aug 28 '17 at 23:20
  • @QuintinPar What about having nested nginx instances, where each one will use a single set of `limit_req` / `limit_conn`? E.g., just put the above config in front of your current front-end server. You could use [`set_real_ip_from`](http://nginx.org/r/set_real_ip_from) in upstream nginx to ensure IPs are accounted correctly down the line. Else, if it still doesn't fit, I think you have to articulate your exact constraints and the spec more vividly -- what traffic levels are we talking about? How often does the stat need to run (1min/5min/1h)? What's wrong with the old `logtail` solution? – cnst Aug 29 '17 at 06:18
  • @QuintinPar, also, if you're already using `limit_req` et al, are you sure old and new limits actually have to be different? It sounds like you might as well benefit from combining the two, possibly even disabling interactive features and enabling a solid caching policy for requests from Slashdot et al. – cnst Aug 29 '17 at 06:22