Lessen impact of DoS attack on cpu-expensive login?

Question

This is a follow-up to my previous question: Prevent DOS against RSA authentication. This question also discusses a similar problem: Prevent denial of service attacks against slow hashing functions?.

My setup is client-server with plain sockets and a custom protocol. The client programs are shipped a public RSA key, and the servers hold the private key counterpart.

In general, connections are persistent so logins are actually rare, except during server restarts, network glitches and other issues that might cause a large number of clients to disconnect and reconnect. In other words, a "DDoS" attack might simply be clients trying to reconnect...

The current (DoS-friendly) handshake works as follows:

1. Client sends [protocol-version] [public-key encrypted nonce].
2. Server unpacks nonce.
3. Server generates its own nonce and uses PBKDF2 to derive a key using the client nonce and its own nonce.
4. The server responds with [reply-code] [server-nonce] [AES encrypted packet + HMAC].
5. Client checks the reply-code. If ok, takes the server-nonce and derives the key and checks that everything is ok by decrypting the encrypted payload using the new key.
6. Encrypted communication commences using this newly created key.
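The key-derivation part of the handshake above can be sketched as follows. This is a minimal illustration, not the actual implementation: the iteration count, hash function, key length, and the choice of which nonce acts as the PBKDF2 secret versus the salt are all assumptions, and the RSA and AES steps are left out.

```python
import hashlib
import hmac
import os

# Assumed parameters: the post does not give an iteration count, hash, or key size.
PBKDF2_ITERATIONS = 10_000
KEY_LEN = 32  # bytes, i.e. a 256-bit AES key


def derive_session_key(client_nonce: bytes, server_nonce: bytes) -> bytes:
    """Derive the session key from both nonces with PBKDF2-HMAC-SHA256.

    Using the client nonce as the secret and the server nonce as the salt
    is an assumption; the post only says both nonces go into PBKDF2.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", client_nonce, server_nonce, PBKDF2_ITERATIONS, dklen=KEY_LEN
    )


def confirmation_tag(key: bytes, payload: bytes) -> bytes:
    """HMAC over the reply payload, so the client can check it derived the same key."""
    return hmac.new(key, payload, hashlib.sha256).digest()


if __name__ == "__main__":
    client_nonce = os.urandom(32)  # stands in for the RSA-decrypted nonce
    server_nonce = os.urandom(16)  # sent to the client in the clear

    server_key = derive_session_key(client_nonce, server_nonce)
    tag = confirmation_tag(server_key, b"handshake-ok")

    # The client repeats the derivation and verifies the tag in constant time.
    client_key = derive_session_key(client_nonce, server_nonce)
    assert hmac.compare_digest(tag, confirmation_tag(client_key, b"handshake-ok"))
```

Both the RSA decryption and the PBKDF2 iterations are paid by the server before it knows anything about the client, which is what makes this handshake expensive under load.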

The problem is that a large number of connects will consume a lot of CPU resources. This will not only affect already logged-in users, but also slow down logins, causing logins to time out, which in turn causes even more disconnects and logins to occur.

It's easy to see that even without malicious intent, the servers are not stable under reasonably high login load if the decryption is costly.

In order to mitigate this, I posed the question mentioned above: Prevent DOS against RSA authentication, which suggested ECDH to lower the cost of logins.

That is a good start, but it might not properly address the problem, where a DoS will not only prevent logins, but also degrade the experience of users already logged in.

I've tried to come up with a few strategies that could help regardless of the handshake algorithm used, and I'd like to hear which ones would be recommended, and if I'm overlooking some useful strategy.

1. Restrict login CPU usage by queuing decrypts on a single (or few) low priority thread. If the queue is large, clients can be rejected or kept on hold with periodic updates.
2. Add a login server, which serves clients with keys, similar to the handshake above, but the client is also given an identifier with this key. The client can then log into the normal servers by presenting the identifier, since the server can use it to retrieve the key from the login server. (Presenting the identifier and retrieving the key from the login server would replace steps 1-3 in the handshake protocol.) Any DoS would only affect the login server.
3. Again a login server, but using HTTPS instead of the custom RSA scheme for distributing the key + identifier to the client.
4. Login server as (2) but require the client to present a Hashcash with the request (and the login server does not process the encrypted data unless it is valid).
5. Login server as (2) but use a server-issued client puzzle instead.
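As an illustration of strategy (1), here is a minimal sketch of a bounded login queue served by a small, fixed number of worker threads. The class name `LoginQueue` and its parameters are hypothetical, and Python's `threading` module cannot lower a thread's priority, so the "low priority" part would have to come from the operating system or from keeping the worker count small.

```python
import queue
import threading


class LoginQueue:
    """Bounded queue of pending logins handled by a fixed number of workers.

    `handler` is whatever performs the expensive handshake work
    (hypothetical; for example the RSA decrypt and key derivation).
    """

    def __init__(self, handler, max_pending=100, workers=1):
        self._pending = queue.Queue(maxsize=max_pending)
        self._handler = handler
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, request) -> bool:
        """Queue a login; return False if the queue is full so the caller can reject."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def wait_idle(self) -> None:
        """Block until every accepted login has been handled."""
        self._pending.join()

    def _run(self) -> None:
        while True:
            request = self._pending.get()
            try:
                self._handler(request)
            finally:
                self._pending.task_done()
```

Because the number of workers is fixed, login processing can never take more than that many cores, and a `False` from `submit` gives the server a cheap point at which to tell the client to retry later.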

Merits and disadvantages (as I understand them):

(1) Can be used both with the normal servers and for a login server.

Using (2) will complicate login somewhat, but makes it trivial to ensure that players aren't affected by a DoS attack on the authentication algorithm.

I suspect that (3) would make it easier to use with a DDoS system like Cloudflare, however it is my understanding that (4) and (5) are impossible to use with HTTPS, which is a downside.

Regardless, any scheme needs to be coupled with the standard mechanisms preventing single machine DoS attacks, such as banning quickly reconnecting IPs. Selecting a cheaper authentication algorithm will also help a lot.

EDIT

To Summarize

My current handshake cannot handle a sufficiently large number of simultaneous connects because the RSA decryption will consume excessive amounts of CPU.
I would like to know the usual methods to reduce this vulnerability, both at handshake level (cheaper algorithms, client puzzles, limited CPU for decryption etc) and on server level (separate out login services etc). Links to papers / books would be great.
Also, I would be grateful if I could get a good/bad assessment of the strategies (1-5) mentioned above.

EDIT 2

Note that these are persistent connections running a custom protocol. This means thousands of legitimate clients may be connected at the same time.

If an attacker succeeds in temporarily choking the bandwidth and causing connection timeouts, this can be used to leverage legitimate client reconnects to bring down the server, regardless of client reconnect delays.

@НЛО Impractical due to the added complexity. As explained, this is plain socket communication. Adding a protocol negotiating if the client is allowed encrypted communications doesn't just make the handshake very complex, but the number of states in the client to present the captcha is daunting, especially considering that the client will also connect to two different servers in sequence. And again, this doesn't protect from the scenario of a rapid reconnect due to network or server glitches. — Nuoji, Apr 30 '13 at 14:09

score 3 · Accepted Answer · edited Jul 29 '20 at 16:36

Even without the threat of DoS, unrestricted authentication attempts are a problem in their own right. A common way of restricting authentication attempts at the protocol level is to force a given IP to wait exponentially longer after each failed authentication attempt.
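A minimal sketch of such a per-IP exponential backoff is shown below. The class name `BackoffTracker`, the base delay, and the cap are illustrative assumptions, not values from any particular product.

```python
import time


class BackoffTracker:
    """Track failed attempts per IP and make each IP wait exponentially longer."""

    def __init__(self, base_delay=1.0, max_delay=3600.0, clock=time.monotonic):
        self._base = base_delay
        self._max = max_delay
        self._clock = clock
        self._state = {}  # ip -> (failure count, time at which the block ends)

    def allowed(self, ip: str) -> bool:
        """True if this IP may attempt authentication right now."""
        _, blocked_until = self._state.get(ip, (0, 0.0))
        return self._clock() >= blocked_until

    def record_failure(self, ip: str) -> float:
        """Register a failed attempt and return the delay now imposed."""
        failures, _ = self._state.get(ip, (0, 0.0))
        failures += 1
        exponent = min(failures - 1, 32)  # cap the exponent to avoid float overflow
        delay = min(self._base * 2 ** exponent, self._max)
        self._state[ip] = (failures, self._clock() + delay)
        return delay

    def record_success(self, ip: str) -> None:
        """A successful login clears the history for this IP."""
        self._state.pop(ip, None)
```

The `allowed` check is a dictionary lookup, so it can run before any expensive cryptographic work is attempted for the connection.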

A cool feature of SSL/TLS is that it allows clients to resume already established sessions, which reduces resource consumption.
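For a custom protocol, an analogous resumption mechanism might look like the sketch below. This is not how TLS implements resumption; it is a hypothetical scheme in which the server caches the session key under a random identifier, and a reconnecting client proves it still holds the key by answering a challenge with an HMAC. All names (`ResumptionCache`, `client_response`) and the lifetime are assumptions.

```python
import hashlib
import hmac
import os
import time


class ResumptionCache:
    """Server-side cache that lets a reconnecting client skip the full handshake."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}  # session id -> (session key, expiry time)

    def store(self, key: bytes) -> bytes:
        """Remember a session key and return the identifier to hand to the client."""
        session_id = os.urandom(16)
        self._sessions[session_id] = (key, self._clock() + self._ttl)
        return session_id

    def new_challenge(self) -> bytes:
        """Fresh random challenge sent to a client that asks to resume."""
        return os.urandom(16)

    def verify(self, session_id: bytes, challenge: bytes, response: bytes) -> bool:
        """Check the client's HMAC over the challenge; costs one hash, no RSA."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return False
        key, expires = entry
        if self._clock() >= expires:
            del self._sessions[session_id]
            return False
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def client_response(key: bytes, challenge: bytes) -> bytes:
    """What a resuming client sends back for a given challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()
```

With something like this in place, a burst of legitimate reconnects costs the server one dictionary lookup and one HMAC per client rather than a private-key operation.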

(As AJ Henderson pointed out) Forcing the attacker to know a username before consuming CPU on a PBKDF2 hash is a neat solution, as long as usernames are hard to guess. However, through the timing of requests or, more commonly, an error message produced, this implementation could introduce a username enumeration vulnerability.

Filtering authentication attempts with mod_dosblock, Sourcefire or another IPS/Application Firewall is cutting off the foot to save the body. This technology isn't magic, it sees that there are a large number of one type of request (a login request) and filters them all. In this context, this approach would filter legitimate authentication attempts. This means that the attacker has won using less of the attacker's resources than it would take to consume all available CPU time.

The general form of this login-based DoS is an Algorithmic Complexity Attack (ACA). The root problem of an ACA is that the attacker can force the victim to perform calculations of a higher complexity class than the attacker has to perform itself. When designing a protocol, you can force a potential attacker to submit a proof-of-work with every authentication attempt. A commonly used proof-of-work for authentication systems is a CAPTCHA, which is very difficult for a computer to solve and so significantly limits automated attempts. Another proof-of-work could be forcing the client to compute a PBKDF2 hash of a nonce for every authentication attempt. The resulting hash is the proof-of-work and can be submitted along with the username/password.
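A sketch of a computational proof-of-work follows. Note that it swaps in a Hashcash-style puzzle (find a counter whose SHA-256 hash has a given number of leading zero bits) rather than the PBKDF2 variant, because this kind of puzzle costs the server a single hash to verify. The function names and the 8-byte counter encoding are assumptions made for illustration.

```python
import hashlib
from itertools import count

MAX_COUNTER = 2 ** 64  # counters are encoded as 8 bytes


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def puzzle_hash(challenge: bytes, counter: int) -> bytes:
    return hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a counter; expected work is about 2**difficulty hashes."""
    for counter in count():
        if leading_zero_bits(puzzle_hash(challenge, counter)) >= difficulty:
            return counter


def verify(challenge: bytes, counter: int, difficulty: int) -> bool:
    """Server side: one hash, regardless of how hard the puzzle was to solve."""
    if not 0 <= counter < MAX_COUNTER:
        return False
    return leading_zero_bits(puzzle_hash(challenge, counter)) >= difficulty
```

The server would issue a fresh random `challenge` per connection (for example from `os.urandom`) and only start the expensive handshake once `verify` succeeds. The difficulty can be raised while the server is under load.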

See: "Pricing via Processing or Combatting Junk Mail" (Dwork and Naor, 1992).

Just so it's clear. This is not a web server, it's plain socket with a custom protocol. The server itself will have connections to a large number of clients (1000-10000). Even with a proof-of-work, these legitimate clients can cause problems. A particular weakness would be if the attempted attack simply causes lots of disconnects to clients. The legitimate client reconnects might be sufficient to bring down the server, even if all the attack succeeds in is disconnecting a % of all clients. This problem is very different from non-persistent connections. – Nuoji, Apr 30 '13 at 18:12
@Nuoji SSL supports the resuming of connections so why not this protocol? Disconnecting a large number of clients would be a vulnerability. Sending TCP-FIN packets would require the attacker to intercept a client's traffic... Also, who says you can't have a CAPTCHA over a non-http protocol? – rook, May 01 '13 at 23:36
It would be sufficient for an attacker to hog enough bandwidth so that client ping packets would be delayed. That would cause the server to boot them off the server. However, resuming connections fixes that issue. If you add that to the answer, I'll accept it. – Nuoji, May 02 '13 at 07:04

score 2 · Answer 2 · answered Apr 29 '13 at 13:29

Check the username (or session identifier) first. If you don't have one in your handshake that is easily accessible, add one. Looking up an identifier is cheap. As long as you don't give away your identifiers, they won't have a large number of valid identifiers to throw at your login or hash function. If an identifier is being used repeatedly, lock it out (or terminate the session if it is a session identifier).
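A minimal sketch of such a cheap identifier check with lockout is shown below. The class name `IdentifierGate`, its methods, and the attempt limit are hypothetical. The point is only that a dictionary lookup happens before any expensive operation.

```python
class IdentifierGate:
    """Cheap pre-check: only known, non-abused identifiers reach the expensive handshake."""

    def __init__(self, max_attempts=5):
        self._max = max_attempts
        self._attempts = {}  # identifier -> attempts since the last successful login

    def register(self, identifier: str) -> None:
        """Issue (or re-issue) an identifier, e.g. after a full login."""
        self._attempts[identifier] = 0

    def admit(self, identifier: str) -> bool:
        """Return True if the expensive work may proceed for this identifier."""
        attempts = self._attempts.get(identifier)
        if attempts is None:
            return False  # unknown or already revoked: drop without further work
        if attempts >= self._max:
            del self._attempts[identifier]  # abused: lock it out
            return False
        self._attempts[identifier] = attempts + 1
        return True

    def mark_success(self, identifier: str) -> None:
        """A completed login resets the attempt counter."""
        if identifier in self._attempts:
            self._attempts[identifier] = 0
```

An attacker without valid identifiers is rejected at the first branch, and one who replays a captured identifier gets at most `max_attempts` expensive operations out of it before it is revoked.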

AJ Henderson


I don't really understand at what stage you suggest I do this. – Nuoji Apr 29 '13 at 16:14
@nuoji Preferably with step 1, client sends. Send an identifier for the connection with that step and don't continue processing if the identifier isn't valid. If an identifier is abused, invalidate it and require a full login to get a new identifier. If it is the login itself, then lock their account since it is being abused and use another channel to get a new account identifier to them. – AJ Henderson Apr 29 '13 at 16:36
How is an identifier different from just checking and blocking IPs that are flooding with requests? – Nuoji Apr 30 '13 at 08:09
@Nuoji - a distributed attack could come from more than one IP address with the same ID. You need some type of cheap check prior to doing any expensive operations. It can happen any time before an expensive operation, but you want some cheap token to be provided by an attempt to login prior to doing hard work server side. This way you can have a reliable filter against DoS and DDoS. – AJ Henderson Apr 30 '13 at 13:15
I don't see this helping any against legitimate rapid reconnects. – Nuoji Apr 30 '13 at 14:51
@Nuoji - if it is legitimate rapid reconnects, then it isn't a security issue, it's a load issue and not security related. DoS specifically means false traffic, not legitimate traffic causing load. Designing for handling legit load also isn't really a security question. – AJ Henderson Apr 30 '13 at 15:27
The underlying problem, whether DDoS or rapid reconnects, is that the handshake immediately incurs a great CPU cost. Looking at the DDoS case, it's about how many attacks need to get through in order to destabilize the system. This number is the same as the number of rapid reconnects that the server can sustain. Reducing the number of DDoS attackers is good, but reducing the sensitivity to rapid logins is as important. The reason this is a security question is because it's the security handshake that leaves the server open. I'm looking for alternate methods that do not exhibit the same weakness. – Nuoji Apr 30 '13 at 18:04

score 1 · Answer 3 · answered Apr 30 '13 at 08:04

You want to stop a possible DoS attack against the server, so why not put in an application-layer-aware firewall or a network IDS/IPS, or, better still, both?

A Palo Alto firewall with the correct application layer rules would be able to identify the authentic traffic and forward it to the server while a Snort based IDS/IPS could be easily configured to drop D/DOS style packets.

A Snort-based IDS/IPS (or, if you have the budget, a full Sourcefire one) will have the rules built in for this type of protection.

There is also the proviso that if attackers really want to, and can afford to pay for the zombie hosts, there is no protection from DDoS. Sometimes you just have to weather the storm as best you can and make sure that the DDoS only leads to the loss of service and is not the cover for a more advanced attack that is after your company's IP.

Gawainuk


+1 [YMMV](http://www.urbandictionary.com/define.php?term=YMMV) with [WAFs](http://security.stackexchange.com/a/18456/20074) tho. ;) I like the latter part of your answer better, maybe mention load-distribution techniques such as round robin DNS, GeoDNS,... – TildalWave Apr 30 '13 at 08:16
Given the current handshake protocol, there's no problem for a machine to send legitimate packets and in that way paralyse the server. So the issue is primarily about massive reconnect working in-effect as a short duration DDoS attack, with hardening the server against malicious, sustained DDoS only as a secondary consideration. – Nuoji Apr 30 '13 at 14:18
So if you block the login with an IPS, then no one can login. This causes a much more serious problem than it attempts to solve. – rook Apr 30 '13 at 15:15
