32

I am trying to understand how massive sites like Facebook or Wikipedia work, out of intellectual curiosity. I have read about various techniques for building scalable sites, but I am still puzzled about one particular detail.

The part that confuses me is that ultimately, the DNS will map the entire domain to a single IP address, or a handful of IP addresses in the case of round-robin DNS.

For example, wikipedia.org has only one type-A DNS record. So, people from all over the world visiting Wikipedia have to send a request to the one IP address specified in DNS.

What is the piece of hardware that listens on the IP address for a massive site, and how can it possibly handle all the load coming from requests by users all over the world?

Edit 1: Thanks for all the responses! Anycast seems like a feasible answer... Does anyone know of a way to check whether a particular IP address is anycast-routed, so that I could verify that this really is the trick used in practice by large sites?

Edit 2: After more reading on the topic, it appears that anycast is not typically used for dynamic web content. Anycast is usually used for UDP (e.g., DNS lookups), or sometimes for static content.

One interesting thing to note is that Facebook uses profile.ak.fbcdn.net to host static content like style sheets and javascript libraries. Each time I ping this name, I get a response from a different IP address. However, I can't tell whether this is anycast in action, or a completely different technique.
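
Here is roughly how I checked this, as a small sketch using only Python's standard library; whether the address changes between calls depends entirely on the local resolver's caching and record rotation, not on anything the script does:

    import socket
    import time

    # Resolve the CDN hostname a few times in a row and print whichever
    # address the resolver hands back. Whether this rotates depends on the
    # resolver's caching and record-rotation behaviour.
    HOSTNAME = "profile.ak.fbcdn.net"

    for attempt in range(5):
        address = socket.gethostbyname(HOSTNAME)
        print(f"attempt {attempt + 1}: {HOSTNAME} -> {address}")
        time.sleep(1)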

Back to my original question: as far as I can tell, even a large site will have a single expensive piece of load-balancing hardware listening on its handful of public IP addresses.

Igor Ostrovsky
  • Great question, too bad most people don't understand it. I hope someone will have an answer with some details. Maybe some 50 million dollar Cisco quantum-computing-powered load balancer. – OliverS Feb 16 '10 at 12:56

9 Answers

9

It isn't necessarily a single piece of hardware doing this but a complete system that has been designed to scale. That encompasses not only the hardware but, more importantly, the application design, database design (relational or otherwise), networking, storage, and how they all fit together.

A good starting point for satisfying your curiosity about how some of the large sites scale is High Scalability - Start Here, along with the High Scalability articles on the Wikimedia architecture, Facebook and Twitter as examples.

Regarding your question about DNS, single IP addresses and round-robin: these types of sites will often use load balancing as a method of presenting a single IP address. This can be done either with specialised hardware load balancers or with software running on general-purpose servers. The incoming requests to the IP managed by the load balancer are then distributed across a series of servers, transparently to the end user.
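
To give a rough feel for what the software variant does (this is only an illustrative sketch, not how any particular product works, and the backend addresses are made up), the machine that owns the public IP simply accepts each connection and relays it to the next server in its pool:

    import itertools
    import socket
    import threading

    # Hypothetical backend web servers sitting behind the balancer's public IP.
    BACKENDS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
    next_backend = itertools.cycle(BACKENDS)

    def relay(src, dst):
        """Copy bytes one way until either side closes."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        finally:
            src.close()
            dst.close()

    def handle(client):
        # Pick the next backend and pump bytes in both directions;
        # the client only ever sees the balancer's address.
        backend = socket.create_connection(next(next_backend))
        threading.Thread(target=relay, args=(client, backend), daemon=True).start()
        threading.Thread(target=relay, args=(backend, client), daemon=True).start()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8080))   # the single "public" address and port
    listener.listen(128)

    while True:
        conn, _addr = listener.accept()
        handle(conn)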

For a good explanation on this topic, including a comparison of hardware and software load balancers/proxies and how they compare to DNS round robin, have a read of Load Balancing Web Applications.

Sim
  • Thanks, Sim. I read through most of these articles before asking my question, but I didn't find a concrete answer. So, there really is a single hardware load balancer (or a single machine running load-balancing software) that is hit every time someone views a Wikipedia page? Or, is there another trick somewhere to avoid the bottleneck? – Igor Ostrovsky Feb 16 '10 at 02:02
  • I'm not sure what Wikipedia is doing now but this article from 2008 talks about them using a series of Squid reverse proxy servers http://blogs.sun.com/WebScale/entry/scaling_wikipedia_with_lamp_7 – Sim Feb 16 '10 at 02:10
  • I doubt that it would be a single hardware device or server controlling this - redundancy is the key. There are also things that can be done with distributed DNS and content delivery networks as well. So where you end up going for a site from Europe might be somewhere else entirely from Asia. – Sim Feb 16 '10 at 02:13
  • As far as I understand, a content delivery network only helps with subsequent requests, not the first one. The first request has to go to the specific IP address named in the DNS entry. And, "distributed DNS" addresses a different problem. – Igor Ostrovsky Feb 16 '10 at 02:21
  • From the Wikimedia architecture link in my answer it looks like Wikipedia use, or at least did use, "Geographic Load Balancing, based on source IP of client resolver, directs clients to the nearest server cluster. Statically mapping IP addresses to countries to clusters". – Sim Feb 16 '10 at 02:22
  • Hmmm... so the trick is that Wikipedia's DNS servers return a different IP address depending on the location of the client? When I ping wikipedia.org from Seattle, I am pinging 208.80.152.2. Do you get a different mapping when you ping wikipedia.org from your location (Australia, I assume)? – Igor Ostrovsky Feb 16 '10 at 02:34
  • That would be one of the things that happens. After that there will be hardware and software load balancers. As far as the initial variable DNS responses go - Google do this and recently proposed a change in DNS client standards so that geographic targeting could be improved in future. That may take some time to catch on, obviously, if it is ever agreed as a standard. – Helvick Feb 16 '10 at 03:16
  • There are also anycast addresses, where you ping one IP address but requests are distributed (randomly/arbitrarily/intentionally) to one of a range of "real" endpoints. I'm not sure if Wikipedia/Google uses this, but I'm pretty sure some of the root DNS servers do. My pings to Wikipedia match yours (and I'm in Ireland), so I suspect they might be using that. – Helvick Feb 16 '10 at 03:23
  • Anycast is used in the DNS query to get the IP address nearest to you - then a load balancer listens on that IP address and distributes the requests to the backing servers. – Andy Shellam Feb 16 '10 at 08:29
  • Wikipedia also happens to use pdns's geoip backend for much of their load balancing. More info here: http://wikitech.wikimedia.org/view/PowerDNS and here: http://wikitech.wikimedia.org/view/DNS – faultyserver Feb 16 '10 at 17:15
  • Thanks, faultyserver! The article on wikipedia's use of DNS is great, and helps me understand the total picture better. – Igor Ostrovsky Feb 16 '10 at 21:42
3

Anycast can also be used for TCP connections, assuming the connections are short-lived so that routes do not change during a connection's lifetime. This is a reasonable assumption for HTTP connections (especially if Connection: Keep-Alive is kept to a short timeout or disabled).
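
As a small illustration of the short-lived-connection point (a sketch only; example.org is just a stand-in hostname), sending Connection: close makes each HTTP request use its own brief TCP connection, which is exactly the property that keeps a mid-connection route change from mattering:

    import http.client

    # Each request opens a fresh TCP connection, and "Connection: close"
    # tears it down immediately afterwards, so no connection lives long
    # enough for an anycast route change to break it.
    conn = http.client.HTTPConnection("example.org", 80, timeout=10)
    conn.request("GET", "/", headers={"Connection": "close"})
    response = conn.getresponse()
    print(response.status, response.reason)
    conn.close()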

Many CDNs (CacheFly, MaxCDN, and probably many others) actually use anycast for TCP connections (HTTP), not just for DNS. When you resolve a hostname on CacheFly, you get the same IP address anywhere in the world; it is simply routed to the "closest" CacheFly cluster. "Closest" here is in terms of BGP path length and metrics, which is usually a better way to measure network latency than simple geographic distance.

In the case of Wikipedia specifically: http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/

rmalayter
3

The easiest way to verify whether an IP address is anycast is to do a traceroute from different locations. You can try the following: go to traceroute.org, pick a location and do a traceroute to the IP address 8.8.8.8 (Google Public DNS, which uses anycast). You should see that a traceroute from a server in Australia to 8.8.8.8 stays in Australia.

Instead of ping, try doing a hostname lookup, e.g.: http://network-tools.com/default.asp?prog=dnsrec&host=profile.ak.fbcdn.net

You will see the list of IP addresses behind that name. These IP addresses are used in a round-robin fashion when you ping the server.
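
If you prefer to do the lookup locally, here is a minimal sketch using Python's standard library that prints every A record your resolver currently returns for that name:

    import socket

    # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list),
    # so the last element lists every A record behind the name.
    canonical, aliases, addresses = socket.gethostbyname_ex("profile.ak.fbcdn.net")

    print("canonical name:", canonical)
    print("aliases:", aliases)
    for ip in addresses:
        print("A record:", ip)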

Rianto Wahyudi
2

Igor, your question is great, and like so many innocent questions, there are many, many answers, all at different levels of detail.

The piece of hardware is a web server. Obviously ;-)

The piece of hardware is actually a cluster of load balancers, all of which are configured to pull from shared storage so they're all identically configured with identical material.

The piece of hardware is actually one of several clusters of load balancers, geographically dispersed, and you were directed to the one closest to you, a decision made by the DNS server.

Matt Simmons
1

Google released a bit of information on their homegrown hardware architecture last year, and it makes for a good read.

squillman
  • This is an interesting read, but it does not answer my particular question. I am specifically curious about what pieces of hardware listen on Google's four public IP addresses and distribute the load among the thousands of servers. – Igor Ostrovsky Feb 16 '10 at 01:53
1

A single IP address does not necessarily mean a single server: http://en.wikipedia.org/wiki/Anycast

Justin
  • Anycast is a difficult setup to maintain if you do need some central synchronisation (like Facebook). It does work really well for e.g. DNS servers, where instances don't need much communication, or for web servers with static content. –  Feb 16 '10 at 06:54
  • You're right in that a single IP doesn't mean a single server, but anycast is used in the DNS query when you're not bothered who replies as long as you get one, and hence it's only useful with the UDP protocol which DNS uses. With TCP (used in HTTP) you need to be sure that the server that responds is the one you specifically asked. – Andy Shellam Feb 16 '10 at 08:31
  • @AndyShellam, the articles http://en.wikipedia.org/wiki/Anycast#Details and http://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf seem to disagree with you... – Pacerier May 13 '14 at 09:02
1

Larger sites use several different techniques together. The websites you mentioned all have servers in many countries. Based on the IP address of the website visitor, the DNS server gives back the IP address of the cluster that is nearest to the visitor. Akamai provides such a service (click on the picture on their website for more information).
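
The DNS-side decision can be pictured roughly like the sketch below; all addresses come from documentation ranges and are purely illustrative, and a real geo-aware DNS server would consult a full GeoIP database rather than a hard-coded table:

    import ipaddress

    # Hypothetical mapping from client (resolver) networks to regions.
    # Every address here is from a documentation range and purely illustrative.
    REGION_PREFIXES = {
        "eu": ipaddress.ip_network("198.51.100.0/24"),
        "asia": ipaddress.ip_network("203.0.113.0/24"),
    }
    CLUSTER_FOR_REGION = {
        "eu": "192.0.2.10",     # address of the hypothetical European cluster
        "asia": "192.0.2.20",   # address of the hypothetical Asian cluster
    }
    DEFAULT_CLUSTER = "192.0.2.1"   # fall back to the primary data centre

    def answer_for(client_ip: str) -> str:
        """Return the A record a geo-aware DNS server might hand this client."""
        ip = ipaddress.ip_address(client_ip)
        for region, prefix in REGION_PREFIXES.items():
            if ip in prefix:
                return CLUSTER_FOR_REGION[region]
        return DEFAULT_CLUSTER

    print(answer_for("198.51.100.25"))  # -> 192.0.2.10 (the "eu" cluster)
    print(answer_for("8.8.8.8"))        # -> 192.0.2.1 (the default)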

Those "clusters" in this datacenter consist now of several different machines (DB server, web server, load balancer, etc.) Depending on what you are providing with your website you have maybe some servers for the static content etc.

Raffael Luthiger
1

Massive sites like Facebook or Wikipedia rely on several different technologies to achieve scalability.

One of those technologies is DNS. DNS is configured to load balance with round robin, and the DNS configuration is smart enough to figure out where your request is coming from and to return the address of the site that is closest to you. So if you do a dig you will see multiple records, but if you do a ping you will always get back the same address.

At the site, the first piece of hardware you hit is a reverse proxy or a load balancer pool. The pools are set up so that all machines answer on the same IP but return a new IP in the session header; all further requests then go through the same node.

The load balancers employed by large sites are not large, expensive pieces of equipment; they are commodity servers running LVS. http://www.linuxvirtualserver.org/
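
As a minimal sketch of how a balancer can keep a client on the same real server (the backend addresses are made up; LVS itself offers several scheduling and persistence modes, and this only illustrates the source-address-hashing idea):

    import hashlib

    # Hypothetical pool of real servers behind the balancer's virtual IP.
    BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    def backend_for(client_ip: str) -> str:
        """Pick a backend by hashing the client's source address, so the
        same client keeps landing on the same real server."""
        digest = hashlib.sha1(client_ip.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
        return BACKENDS[index]

    print(backend_for("203.0.113.7"))   # same client -> same backend every time
    print(backend_for("203.0.113.7"))
    print(backend_for("198.51.100.9"))  # a different client may land elsewhere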

user67823
0

Massive sites like Google almost certainly design their own hardware. Large sites would probably use a multi-layer switch to load balance connections to multiple actual servers. http://en.wikipedia.org/wiki/Multilayer_switch

Chris S