How come sites like Google/Facebook/etc. don't get DDoS'd even though they receive so many requests?

Something I don't understand:

(Tens/hundreds of?) thousands of people simultaneously try to connect to a site like facebook.com or google.com.

From what I understand, they must all necessarily connect to the same initial server (because DNS will return the same IP to many of them, and so all the requests go to the same target destination).

So a single machine/router must handle all the initial requests, even if it plans to forward them to other machines.

How come that single device doesn't get overloaded when this happens?

user541686

Posted 2012-02-03T20:47:11.327

Your assumption about DNS is incorrect: http://en.wikipedia.org/wiki/Round-robin_DNS

– Der Hochstapler – 2012-02-03T20:48:56.743

@OliverSalzburg: Thanks for the link, that's helpful. – user541686 – 2012-02-03T20:52:48.573

Answers

Your understanding that they all connect to the same server is wrong, although the details of how that is achieved are complex. http://highscalability.com/ has reference material on how some of these scalability solutions are put into play.

They have far more than just "one" server that clients connect to, even if the public IP address looks the same. Google, for example, makes heavy use of anycast addressing to direct clients, and they usually have more than one IP address available for each client, even if they return just one address when you ask.
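As a rough, self-contained illustration of the DNS side of this, a few lines of Python (standard library only; the hostname is just an example, and the addresses you get back will vary) will often show several addresses behind a single name:

```python
import socket

def resolve_all(hostname):
    """Return the distinct IPv4 addresses the resolver hands back for a name."""
    infos = socket.getaddrinfo(hostname, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({addr for *_, (addr, _port) in infos})

if __name__ == "__main__":
    # Example hostname; any large, heavily trafficked site will do.
    print(resolve_all("www.google.com"))
```

How many addresses you actually see depends on your resolver and the site, but the point stands: one name does not have to mean one machine.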

Daniel Pittman

Posted 2012-02-03T20:47:11.327

+1 thanks for pointing out the error. I can't help but wonder, though: if subsequent requests go to a different server every time, then how does a server continue a different server's session? Or is the randomness on a per-machine/per-session basis? (I would imagine that they all synchronize at the backend, but it would seem very slow to synchronize thousands of servers holding information about millions of users simultaneously.) – user541686 – 2012-02-03T20:55:58.337

The answer is complicated and depends on the implementation, but one approach is to have a pool of machines that do nothing but send packets to the right destination without ever actually making a TCP connection themselves. Look to F5 and other load-balancer vendors for the small end of how this is done. Google, I think, uses something they built themselves. – Daniel Pittman – 2012-02-03T21:00:16.367
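To make the load-balancing idea concrete, here is a toy sketch of a front end that accepts connections and hands each one off to a backend chosen round-robin. The backend addresses and ports are made up, and unlike the packet-level balancers mentioned above, this user-space proxy does terminate the client's TCP connection before relaying bytes; it only illustrates the distribution idea, not how an F5-class device works.

```python
import itertools
import socket
import threading

# Hypothetical backend pool; a real deployment would discover these dynamically.
BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]
next_backend = itertools.cycle(BACKENDS)

def pipe(src, dst):
    """Copy bytes from one socket to the other until EOF."""
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    finally:
        dst.close()

def handle(client):
    # Pick the next backend and relay in both directions, so the client
    # talks "through" the front end without knowing which machine answered.
    backend = socket.create_connection(next(next_backend))
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

def serve(port):
    listener = socket.create_server(("", port))
    while True:
        client, _addr = listener.accept()
        handle(client)

if __name__ == "__main__":
    serve(8000)
```

A dedicated balancer that forwards packets without ever terminating the connection avoids the per-connection copying done in pipe(), which is a large part of why it can handle so much more traffic.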

You can also use a split-session methodology. There is a session between the user and the server they're directly connected to, and a "master logical session" between the user and the logical service. If the user moves to a different server, that server just resumes the same master logical session to the logical service. – David Schwartz – 2012-02-03T23:02:54.787
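One way to read this in code: if session state lives in a shared store that every front-end server can reach, then whichever server receives the next request can resume the same "master logical session". A minimal sketch, with a plain dict standing in for whatever shared store (a database, memcached, etc.) a real service would use; all names here are hypothetical:

```python
import uuid

# Stand-in for a shared session store reachable by every front-end server.
SESSION_STORE = {}

class FrontEndServer:
    """A front-end server; any instance can continue any user's session."""

    def __init__(self, name):
        self.name = name

    def log_in(self, user):
        token = str(uuid.uuid4())
        SESSION_STORE[token] = {"user": user, "cart": []}
        return token

    def handle_request(self, token, item):
        # Look the session up in the shared store, not in local memory,
        # so it does not matter which server handled the previous request.
        session = SESSION_STORE[token]
        session["cart"].append(item)
        return f"{self.name}: {session['user']} now has {session['cart']}"

# The user's first request lands on one server, the next on another.
a, b = FrontEndServer("server-a"), FrontEndServer("server-b")
token = a.log_in("alice")
print(a.handle_request(token, "book"))
print(b.handle_request(token, "pen"))  # server-b resumes the same session
```

In a real deployment the store itself has to scale and be replicated, which is where much of the complexity the answer alludes to actually lives.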