How do big companies like YouTube load balance traffic with only one IP?

Question

Examining the A records for youtube.com (for example), I see only one IP address.

How can this be possible, given the volume of traffic they handle?

Do they use anycast with load balancers behind it?

Well, for one thing, I'm pretty sure the actual CDN isn't YouTube.com, it's some Google web content backend. — Michael Bailey, Aug 26 '15 at 23:36

score 18 · Accepted Answer · edited Apr 13 '17 at 12:14

There are several features which probably contribute to what you are seeing:

Anycast allows one IP address to be served by servers in multiple locations.
Geo balancing gives out different IPs depending on which region you are in and how loaded the data centers are.
Load balancers usually include some sort of hot IP failover to improve reliability.

Anycast and geo balancing spread the load without the user seeing more than one or a few IPs. Load balancers improve reliability within one data center, so those few IPs are hopefully not single points of failure.
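To make the geo balancing idea concrete, here is a toy sketch of a DNS server that picks its answer from the client's region and the current data center load. Everything in it is an assumption for illustration: the region names, the load figures, the `answer_a_query` function, and the addresses (taken from the RFC 5737 documentation ranges). It is not how Google's DNS actually decides.

```python
# Toy model of geo-aware DNS answers. All names, regions, loads, and
# addresses are hypothetical (addresses are RFC 5737 documentation ranges).
DATA_CENTERS = {
    "eu-west": {"ips": ["192.0.2.10", "192.0.2.11"], "load": 0.40},
    "us-east": {"ips": ["198.51.100.10"], "load": 0.95},
    "us-west": {"ips": ["203.0.113.10", "203.0.113.11"], "load": 0.30},
}

# Preference order per client region: nearest data center first.
NEAREST = {
    "europe": ["eu-west", "us-east", "us-west"],
    "north-america-east": ["us-east", "us-west", "eu-west"],
}


def answer_a_query(client_region, max_load=0.90):
    """Return A records from the nearest data center that is not overloaded."""
    candidates = NEAREST.get(client_region, list(DATA_CENTERS))
    for name in candidates:
        dc = DATA_CENTERS[name]
        if dc["load"] < max_load:
            return dc["ips"]
    # Everything is busy: fall back to the nearest site anyway.
    return DATA_CENTERS[candidates[0]]["ips"]


print(answer_a_query("europe"))              # nearest site has capacity
print(answer_a_query("north-america-east"))  # nearest site is overloaded
```

Two clients asking for the same name get different, short answers, which is why a single user only ever sees one or a few IPs.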

Reality does not fit the question

Despite the question and the confirmation in the comments, I see more than one IP for youtube.com:

$ dig youtube.com A

; <<>> DiG 9.8.1-P1 <<>> youtube.com A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 195
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;youtube.com.           IN  A

;; ANSWER SECTION:
youtube.com.        300 IN  A   173.194.33.163
youtube.com.        300 IN  A   173.194.33.164
youtube.com.        300 IN  A   173.194.33.165
youtube.com.        300 IN  A   173.194.33.166
youtube.com.        300 IN  A   173.194.33.167
youtube.com.        300 IN  A   173.194.33.168
youtube.com.        300 IN  A   173.194.33.169
youtube.com.        300 IN  A   173.194.33.174
youtube.com.        300 IN  A   173.194.33.160
youtube.com.        300 IN  A   173.194.33.161
youtube.com.        300 IN  A   173.194.33.162

;; Query time: 14 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Wed Aug 26 23:45:18 2015
;; MSG SIZE  rcvd: 205
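If you want to compare what different resolvers or locations return, a small script can pull the A records out of saved dig output. This is only a sketch: `a_records` is a made-up helper, and it assumes the plain five-column answer format shown above (name, TTL, class, type, address).

```python
import ipaddress


def a_records(dig_output, name):
    """Extract the IPv4 addresses of `name` from text printed by dig."""
    ips = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines look like: name TTL class type address
        if len(fields) == 5 and fields[0] == name and fields[2:4] == ["IN", "A"]:
            ips.append(ipaddress.IPv4Address(fields[4]))
    return sorted(ips)


# A few lines copied from the output above; the question line starts with
# ";" and has only three fields, so it is skipped.
sample = """\
;youtube.com.           IN  A
youtube.com.        300 IN  A   173.194.33.163
youtube.com.        300 IN  A   173.194.33.174
youtube.com.        300 IN  A   173.194.33.160
"""

ips = a_records(sample, "youtube.com.")
print(len(ips), "addresses, lowest", ips[0], "highest", ips[-1])
```

Running the same parser over output captured from two networks makes it easy to see whether they were handed the same set of addresses.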

In any decent modern router anycast and friends will fold into a "policy based routing" mechanism, which is a most flexible way to build complex networks, single ip load balancing included. — oakad, Aug 27 '15 at 06:12
`$ dig youtube.com A ;; ANSWER SECTION: youtube.com. 278 IN A 216.58.211.78` I get only one address ... But now I understand why, thanks :) — Victor Lft, Aug 27 '15 at 13:19

score 11 · Answer 2 · answered Aug 26 '15 at 23:41

There are a number of strategies in play with big sites like YouTube:

Not everyone gets the same IP address. Different DNS requests will get different A records coming back. Sometimes different IP addresses are returned based on geographic indicators (you get an IP that is "close to you"), and some variation is just for load-balancing purposes.
Very few requests actually go to the "main" site. If you examine the full set of requests that go into assembling a page requested from youtube.com, you'll note that the vast majority of them go to other domains, which are handled separately.
Lots and lots of load balancers. Once a request gets to a single IP address, very efficient and highly scalable load balancers direct the requests to a very large number of frontend webservers.
Many machines servicing a single request. The frontend webservers do very little of the work involved in actually servicing a request. They are mostly for HTTP parsing and routing to more tiers of servers, each cluster of which does a very small and specialised part of the larger task of generating a page. I don't have a reference off-hand, but I remember reading an article some years ago that said that every single Google search request would cause code to be run on over a hundred individual servers in order to generate the response.
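As a rough illustration of the load balancer step, the sketch below spreads connections arriving at one IP across a pool of frontends by hashing the client's address and port. The pool, the names, and the choice of hashing are all assumptions; real load balancers use a variety of schemes, and nothing here is taken from Google's setup.

```python
import hashlib
from collections import Counter

# Hypothetical pool of frontend web servers behind a single public IP.
FRONTENDS = [f"frontend-{i}" for i in range(8)]


def pick_frontend(client_ip, client_port, frontends=FRONTENDS):
    """Map a connection to a frontend by hashing its source address and port."""
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(frontends)
    return frontends[index]


# Simulate 10,000 connections from one client using different source ports.
counts = Counter(pick_frontend("203.0.113.7", port) for port in range(1024, 11024))
for name in FRONTENDS:
    print(name, counts[name])
```

Because the choice depends only on the connection's own fields, every packet of a given connection lands on the same frontend without the balancer keeping per-connection state.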

Hope that clears things up for you a little. If you have any more questions, it'll probably be best to create a new, tightly specified question rather than hold a lengthy discussion in comments.

The "work" inside Google is almost certainly spread across dozens or hundreds of servers using their [MapReduce technology](http://research.google.com/pubs/pub36249.html). — wallyk, Aug 27 '15 at 06:07
from one location i get 173.194.33.98 173.194.33.105 173.194.33.99 173.194.33.97 173.194.33.103 173.194.33.96 173.194.33.101 173.194.33.102 173.194.33.100 173.194.33.110 173.194.33.104 and from another i get 173.194.65.91 173.194.65.93 173.194.65.190 173.194.65.136 — Skaperen, Aug 27 '15 at 08:37
@wallyk MapReduce isn't a real-time query processing technology, it's a data processing technology. — womble, Aug 27 '15 at 20:29

score 3 · Answer 3 · answered Aug 27 '15 at 08:19

Google/YouTube (as well as many other companies, in particular CDNs) co-locate servers with many ISPs, and DNS then returns the IP addresses of those servers. That explains why some people may see only one IP while others see a dozen.

So the server that you see may not actually be in a Google data center, but rather just a few miles from your home/office, wherever the ISP's head end is. You can sometimes get a clue about where the server is with the traceroute utility (tracert in Windows) and/or reverse DNS.
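The reverse DNS check can be scripted as well. The sketch below wraps Python's standard `socket.gethostbyaddr`; the `reverse_name` helper and the stub hostname are made up for illustration, and the stub only exists so the example runs without network access. Many addresses have no PTR record at all, so a `None` result tells you nothing either way.

```python
import socket


def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Return the PTR hostname for `ip`, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def fake_lookup(ip):
    # Stand-in for a real PTR lookup so the example runs offline.
    # The hostname is invented; real names vary by operator.
    return ("cache.isp.example", [], [ip])


print(reverse_name("192.0.2.1", lookup=fake_lookup))
# With network access, call reverse_name("<address from dig>") instead.
```

A hostname that mentions an airport code or an ISP's domain is a hint about where the server sits, but it is only a hint.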

Consequently, those servers also won't serve all the traffic in the world, but rather just traffic from one city, and sometimes only from one ISP within that city.

Of course, those servers don't hold all of Google's knowledge; they are a front end, and probably have quite a bit of caching etc. as well. Anything they don't know, they'll forward to Google's data center, as womble described.
