Prevent DOS against RSA authentication

Question

In my current setup, the clients are bundled with the server's public key. The client encrypts a nonce with that key and sends it to the server. The server combines its own nonce with the nonce received from the client to set up a symmetric encryption key, then sends back its nonce unencrypted together with an encrypted response.
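The question does not say how the symmetric key is derived from the two nonces. A common choice is a key-derivation function such as HKDF (RFC 5869). The sketch below assumes the secret, RSA-encrypted client nonce is the input keying material and the server's cleartext nonce is the salt; `derive_session_key` and the `info` label are hypothetical names, not part of the original protocol.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, built only from the stdlib hmac module."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(client_nonce: bytes, server_nonce: bytes) -> bytes:
    # Hypothetical helper. Assumption: only the client nonce is secret,
    # so it is the keying material; the public server nonce is the salt.
    return hkdf_sha256(client_nonce, server_nonce, b"login session key")


client_nonce = os.urandom(32)  # travels RSA-encrypted to the server
server_nonce = os.urandom(16)  # travels back in the clear
# Both sides compute the same key independently.
assert derive_session_key(client_nonce, server_nonce) == derive_session_key(client_nonce, server_nonce)
```

Because the server nonce is fresh per connection, a replayed client message still yields a different key on the server side.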

Unfortunately, this makes the server wide open to both deliberate and accidental DoS attacks, because every connection attempt forces it to perform an expensive RSA private-key decryption.

As an example of the latter, say the server is taken down and all the clients try to reconnect. That would mean 100 to 1000 (non-malicious) connects per second from different IPs.

What are my options aside from reducing RSA key size?

Edit

What about any of these strategies:

- Move the authentication to a separate server, which securely hands out encryption keys with an expiry date. When logging in to a server, the client presents an identifier that the server can use to look up the corresponding encryption key.
- Put RSA decryption in an unbounded queue with sufficiently low thread priority. When a client logs in, it will either be logged in immediately (if there are sufficient resources) or be put in a login queue. Until its decryption is scheduled, the client gets updates on how close its login is to completion.
- Put RSA decryption in a bounded queue with sufficiently low thread priority. If the queue is full, the connection is refused with a message saying that the login queue limit has been reached and the client can retry later.
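The third strategy (a bounded login queue that refuses work when full) could look roughly like the sketch below. `LoginQueue` and its methods are hypothetical names, and the `decrypt` callable stands in for the real RSA private-key operation. Python's `threading` module exposes no thread priorities, so this sketch caps the number of worker threads instead of lowering their priority.

```python
import queue
import threading
from typing import Callable


class LoginQueue:
    """Bounded queue of pending RSA decryptions (hypothetical class).

    Once max_pending jobs are waiting, new logins are refused at once
    instead of piling up, so the server keeps answering other clients.
    """

    def __init__(self, max_pending: int, decrypt: Callable[[bytes], bytes]):
        self._jobs = queue.Queue(maxsize=max_pending)
        self._decrypt = decrypt  # placeholder for the RSA private-key operation

    def submit(self, ciphertext: bytes, on_done: Callable[[bytes], None]) -> bool:
        """Queue a login; return False if the queue is full.

        On False the caller would reply "login queue full, retry later".
        """
        try:
            self._jobs.put_nowait((ciphertext, on_done))
            return True
        except queue.Full:
            return False

    def start_workers(self, count: int) -> None:
        # A small, fixed number of workers bounds the CPU spent on RSA.
        for _ in range(count):
            threading.Thread(target=self._work, daemon=True).start()

    def wait_until_idle(self) -> None:
        self._jobs.join()  # blocks until every queued job has been processed

    def _work(self) -> None:
        while True:
            ciphertext, on_done = self._jobs.get()
            try:
                on_done(self._decrypt(ciphertext))
            finally:
                self._jobs.task_done()
```

Refusing early is cheap: the rejected client costs one small reply rather than a private-key operation.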

You could use Elliptic Curve Diffie-Hellman instead of RSA. That way a dedicated server should be able to do 5000+ key exchanges per second per core. — CodesInChaos, Apr 26 '13 at 16:59

score 5 · Accepted Answer · answered Apr 26 '13 at 17:08

You could switch to a faster algorithm. RSA is fine, but Elliptic Curve Diffie-Hellman is faster. Let's take my current machine, a laptop with an AMD A8-4555M CPU (1.6 GHz, not a very fast processor):

```
$ openssl speed rsa2048 ecdhp256
Doing 2048 bit private rsa's for 10s: 1468 2048 bit private RSA's in 9.99s
Doing 2048 bit public rsa's for 10s: 47384 2048 bit public RSA's in 9.98s
Doing 256 bit  ecdh's for 10s: 5847 256-bit ECDH ops in 9.99s
OpenSSL 1.0.1c 10 May 2012
built on: Tue Mar 19 19:10:34 UTC 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_NO_TLS1_2_CLIENT -DOPENSSL_MAX_TLS1_2_CIPHER_LENGTH=50 -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006805s 0.000211s    146.9   4747.9
                              op      op/s
 256 bit ecdh (nistp256)   0.0017s    585.3
```

So 256-bit ECDH, which is arguably at least as strong as 2048-bit RSA, is also about four times faster (585.3 versus 146.9 operations per second). This figure is for one core; since that CPU is quad-core, it could handle over two thousand connections per second. A server processor would probably do substantially better.
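The arithmetic behind these figures, applied to the question's worst case of 1000 connects per second, is sketched below. It assumes each connect costs exactly one private-key operation and that throughput scales perfectly across cores; the function names are illustrative only.

```python
import math


def per_core_rate(ops: int, seconds: float) -> float:
    """Operations per second on one core, from an `openssl speed` result line."""
    return ops / seconds


def cores_needed(connects_per_second: float, rate_per_core: float) -> int:
    """Cores needed if every connect costs one private-key operation.

    Assumes perfect scaling across cores and ignores all other per-connection work.
    """
    return math.ceil(connects_per_second / rate_per_core)


rsa = per_core_rate(1468, 9.99)   # 2048-bit RSA private operations
ecdh = per_core_rate(5847, 9.99)  # 256-bit ECDH operations
print(f"RSA-2048: {rsa:.1f}/s per core, {cores_needed(1000, rsa)} cores for 1000 connects/s")
print(f"ECDH P-256: {ecdh:.1f}/s per core, {cores_needed(1000, ecdh)} cores for 1000 connects/s")
# RSA-2048: 146.9/s per core, 7 cores for 1000 connects/s
# ECDH P-256: 585.3/s per core, 2 cores for 1000 connects/s
```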

Among other possible mitigation techniques, you can implement client-side delays. In your "accidental DoS" scenario, all the clients try to reconnect simultaneously. Instead, you could make each client wait a random time when the connection is lost (for example, a random delay between 0 and 60 seconds) before trying to reconnect. This spreads the load.
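A minimal sketch of that client-side jitter, assuming a hypothetical `connect` callable that returns True on success. The 60-second ceiling comes from the answer; the `max_attempts` limit is an added assumption. `sleep` and `rng` are injectable so the behaviour can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional


def reconnect_with_jitter(connect: Callable[[], bool],
                          max_delay: float = 60.0,
                          max_attempts: int = 10,
                          sleep: Callable[[float], None] = time.sleep,
                          rng: Optional[random.Random] = None) -> bool:
    """Wait a random 0..max_delay seconds before each reconnect attempt."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_attempts):
        # Each client picks its own delay, so a mass disconnect turns into
        # reconnects spread over the whole window instead of one spike.
        sleep(rng.uniform(0.0, max_delay))
        if connect():
            return True
    return False
```

With 1000 clients and a 60-second window, the expected arrival rate drops to roughly 17 connects per second.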

You could also buy more servers. Hardware is cheap.

With RSA I can issue the clients the public key, and they can trust that whoever manages to decrypt the message is someone who owns the private key. Can ECDH work in a similar manner? Even though speeding up validation is good, it's not really eliminating the underlying weakness. Will you have a look at the strategies I added in my edit? — Nuoji, Apr 27 '13 at 09:43
These ECDH numbers are still pretty low. For example, a portable C implementation of Curve25519 can do [4.5k exchanges per second per core](https://github.com/nightcracker/ed25519) on a slightly faster computer. — CodesInChaos, Apr 27 '13 at 11:38
Something is definitely fishy with OpenSSL's implementation. The same code runs at over 1000 ECDH (with P-256) per second on an old, cheap, 1.6 GHz AMD Athlon 2650e, which should be _slower_, not _twice as fast_, than the more recent A8 (SHA-1 benchmarks are more consistent: 240 MB/s for the 2650e, 340 MB/s for the A8). OpenSSL's EC code was contributed by Sun and was supposedly audited carefully to ensure it is not covered by existing patents, which means that OpenSSL developers are reluctant to modify it in any way. — Tom Leek, Apr 27 '13 at 15:57
And now with an even older 1.6 GHz Turion 64 processor from 2005 (whereas the 2650e is from 2008 and the A8 is from 2012), I get a whopping 1695 ECDH/P-256 per second, compared with 318 RSA-2048 per second. There is something wrong in modern AMD processors, or in OpenSSL, or both. — Tom Leek, Apr 28 '13 at 00:49
Another annoying thing is that ECC isn't as widely supported as RSA. For the iOS client, this means bundling the entire OpenSSL library just to get it. — Nuoji, Apr 28 '13 at 15:04
On a 2.7 GHz i7, I get much better values: 757 per second for RSA-2048 and 2652 per second for ECDH/P-256, still on one core. — Tom Leek, Apr 29 '13 at 01:26

score -1 · Answer 2 · edited Apr 07 '20 at 21:44

Write a custom C++ login routine that uses the client's IP address as part of the session information in the authentication database, and load-test it until it can handle billions of connections.

To speed up the process, add a pre-processing step: the client sends `ClientSend->InitTrashPacketForAntiGenServer 0001`, the server replies with `ServerSend->InitSocketPacket1->0001`, and only then do you run your regular authentication logic. This also requires disconnect logic (including deliberate disconnects) in both client and server, so that the server can avoid wasting CPU. A possible attack against this is to decompile the client and make the DoS client send 0001 as well.
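A sketch of that pre-handshake gate as a per-connection state machine. The byte strings, the `Connection` class and its fields are hypothetical stand-ins for the answer's packet names; `rsa_decrypt` is a placeholder for the real private-key operation. As the answer itself notes, anyone who extracts the fixed preamble from the client can send it too, so this only filters traffic that does not know the preamble.

```python
from typing import Callable, Optional

# Hypothetical wire values standing in for the answer's packet names.
CLIENT_INIT = b"InitTrashPacketForAntiGenServer 0001"
SERVER_INIT = b"InitSocketPacket1 0001"


class Connection:
    """Per-connection state: no RSA work until the cheap greeting succeeds."""

    def __init__(self, rsa_decrypt: Callable[[bytes], bytes]):
        self.rsa_decrypt = rsa_decrypt
        self.greeted = False
        self.closed = False

    def on_packet(self, data: bytes) -> Optional[bytes]:
        if self.closed:
            return None
        if not self.greeted:
            if data != CLIENT_INIT:
                self.closed = True  # deliberate disconnect: no CPU spent on RSA
                return None
            self.greeted = True
            return SERVER_INIT      # cheap reply to the init packet
        # Only after the greeting do we run the expensive private-key operation.
        return self.rsa_decrypt(data)
```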

score -2 · Answer 3 · answered Apr 26 '13 at 16:49


For the accidental case (the clients you control), you could have your clients perform a less resource-intensive check of the server first (such as a simple ping) before starting the protocol and burning more resources. But you will never be able to stop your clients from making some check entirely: they have to be able to do something to start the connection process.
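A sketch of such a pre-check on the client. A plain TCP connect to the login port stands in for the ping here (an ICMP ping normally needs raw-socket privileges); it shows the port is accepting connections, not that the login service is healthy. `server_reachable`, `login` and the `handshake` callable are hypothetical names.

```python
import socket
from typing import Callable


def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Cheap liveness check: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def login(host: str, port: int, handshake: Callable[[], bool]) -> bool:
    # Only start the expensive RSA-based handshake once the cheap check succeeds.
    if not server_reachable(host, port):
        return False  # caller waits and retries later
    return handshake()
```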

For the deliberate case, every system has to deal with DoS, and you deal with it the same way: layered detection, redundant systems and networks (so you can fail over), and more monitoring than you ever imagined.


Tek Tengu


What would the ping add? What's the difference between sending `ping`->`connect(rsa-encrypted-payload)` and just sending `connect(rsa-encrypted-payload)`? It seems to me they would be virtually equivalent. – Nuoji Apr 27 '13 at 09:45
@Nuoji If you just ping first, the ping would not carry the encrypted payload. Granted, you would only be checking that the server is up, not that the service is up, but at least you know the server is up, and you avoid building a complete backlog while it is down. – Tek Tengu Apr 29 '13 at 00:20
Well, that's something the server could handle without a ping packet. Basically, it could hand the decryption off to a queue and stay responsive. If the queue grew too long, the request could simply be cancelled and a "server login overloaded" message sent back. – Nuoji Apr 29 '13 at 07:41
