Examining the A records for youtube.com (for example), I see only one IP address.
How can this be possible, given the volume of traffic they handle?
Do they use anycast with load balancers behind it?
There are several features which probably contribute to what you are seeing:
Anycast and geographic load balancing help spread the load without any one user seeing more than one or a few IPs. Load balancers improve reliability within a single data center, so those few IPs are hopefully not single points of failure.
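As an illustrative sketch of the geo-balancing idea (the region names and IPs below are hypothetical, not Google's actual mapping): an authoritative DNS server can answer each query with the record set for the site nearest the client, so any individual client only ever sees one or a few addresses.

```python
# Hypothetical region -> virtual-IP mapping for a geo-aware DNS server.
SITES = {
    "eu": ["203.0.113.10"],
    "us": ["198.51.100.10"],
}

def answer_for(client_region):
    # Return the record set for the client's region,
    # falling back to a default site for unknown regions.
    return SITES.get(client_region, SITES["us"])
```

Each client resolves the same name but receives only the addresses of its nearest site; the full fleet of IPs is never visible from any single vantage point.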
Despite the question and the confirmation in the comments, I see more than one IP for youtube.com:
$ dig youtube.com A
; <<>> DiG 9.8.1-P1 <<>> youtube.com A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 195
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;youtube.com. IN A
;; ANSWER SECTION:
youtube.com. 300 IN A 173.194.33.163
youtube.com. 300 IN A 173.194.33.164
youtube.com. 300 IN A 173.194.33.165
youtube.com. 300 IN A 173.194.33.166
youtube.com. 300 IN A 173.194.33.167
youtube.com. 300 IN A 173.194.33.168
youtube.com. 300 IN A 173.194.33.169
youtube.com. 300 IN A 173.194.33.174
youtube.com. 300 IN A 173.194.33.160
youtube.com. 300 IN A 173.194.33.161
youtube.com. 300 IN A 173.194.33.162
;; Query time: 14 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Wed Aug 26 23:45:18 2015
;; MSG SIZE rcvd: 205
There are a number of strategies in play with big sites like YouTube:
Not everyone gets the same IP address. Different DNS requests will get different A records coming back. Sometimes different IP addresses are returned based on geographic indicators (you get an IP that is "close to you"), and some variation is just for load-balancing purposes.
Very few requests actually go to the "main" site. If you examine the full set of requests that go into assembling a page requested from youtube.com, you'll note that the vast majority of them go to other domains, which are handled separately.
Lots and lots of load balancers. Once a request gets to a single IP address, very efficient and highly scalable load balancers direct the requests to a very large number of frontend webservers.
Many machines servicing a single request. The frontend webservers do very little of the work involved in actually servicing a request. They are mostly for HTTP parsing and routing to more tiers of servers, each cluster of which does a very small and specialised part of the larger task of generating a page. I don't have a reference off-hand, but I remember reading an article some years ago that said that every single Google search request would cause code to be run on over a hundred individual servers in order to generate the response.
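The load-balancing tier described above can be sketched in miniature (the frontend names here are made up for illustration; real balancers also track health, weights, and connection state):

```python
from itertools import cycle

# Hypothetical pool of frontend webservers behind one public VIP.
FRONTENDS = ["fe-1", "fe-2", "fe-3"]

# The balancer's core job in miniature: spread incoming requests
# across the frontend pool, here with simple round-robin.
_next_frontend = cycle(FRONTENDS)

def route_request(request_id):
    # Every request hitting the single public IP is handed to
    # whichever frontend is next in rotation.
    return next(_next_frontend)
```

A production balancer does far more (health checks, session affinity, weighted distribution), but the principle is the same: one public address, many machines behind it.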
Hope that clears things up for you a little. If you have any more questions, it'll probably be best to create a new, tightly specified, question, rather than a lengthy discussion in comments.
Google/Youtube (as well as many other companies, in particular CDNs) co-locate servers with many ISPs, and then DNS will return the IP address for those servers. That explains why some people may only see one IP, and others see a dozen.
So the server that you see may not actually be in a Google data center, but rather just a few miles from your home/office, wherever the ISP's head end is. You can sometimes get a clue about where the server is with the traceroute utility (tracert in Windows) and/or reverse DNS.
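If you want to script the reverse-DNS part of that check, a minimal sketch (using only the standard library; the lookup result depends entirely on your network vantage point) might look like:

```python
import socket

def reverse_dns(ip):
    # Look up the PTR record for an address; the hostname often hints
    # at location (many operators embed city or airport codes in it).
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # No PTR record, or the lookup failed.
        return None
```

Running this against the addresses from the dig output above (or a traceroute hop) can tell you whether the server answering you sits in a nearby ISP facility rather than a distant data center.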
Consequently, those servers also won't serve all the traffic in the world, but rather just traffic from one city, and sometimes only from one ISP within that city.
Of course, those servers don't hold all of Google's knowledge; they are a front end, and probably have quite a bit of caching etc. as well. Anything they don't know, they'll forward to Google's data center, as womble described.
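That cache-then-forward behaviour can be sketched as follows (a deliberately simplified model; real edge caches also handle TTLs, eviction, and invalidation):

```python
class EdgeCache:
    """Toy edge server: serve from local cache when possible,
    otherwise fetch from the origin data center and remember the result."""

    def __init__(self, fetch_from_origin):
        self._cache = {}
        self._origin = fetch_from_origin

    def get(self, key):
        if key not in self._cache:
            # Cache miss: forward to the origin, as described above.
            self._cache[key] = self._origin(key)
        return self._cache[key]
```

The first request for a given resource goes back to the data center; repeat requests from the same city are answered locally, which is exactly why co-locating these boxes with ISPs pays off.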