
I found this explanation of how a CDN works, but there is one thing I don't really understand. Let's assume I set up multiple DNS servers at my location and they use the nameserver domains dns1.example.com, dns2.example.com and dns3.example.com. These DNS servers are able to return a server IP depending on the visitor's location (ping, geo database, browser language or whatever). Now I update these nameserver settings for my domain www.example.org at the registry.

Now the first request for www.example.org after the TTL has expired has to resolve the domain. It asks:

  1. the local .hosts/DNS, if TTL expired:
  2. the internet provider's DNS, if TTL expired:
  3. the root DNS, if TTL expired:
  4. my local dns1.example.com

But if I understand it correctly, the new IP is then stored in all of these nameserver caches until the TTL expires again. So how is it possible to send different IPs to visitors depending on their location?
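To make the caching concern concrete, here is a toy model of it in Python. It is only a sketch under simplifying assumptions: `CachingResolver`, `toy_authoritative`, the resolver names and the documentation-range IPs are all made up for illustration, and a real resolver does far more. The one thing it shows is that every resolver keeps its own cache, filled with whatever answer that particular resolver received.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical authoritative server: (resolver name, domain) -> (ip, ttl in seconds).
Authoritative = Callable[[str, str], Tuple[str, int]]


class CachingResolver:
    """Toy recursive resolver that caches an answer until its TTL expires."""

    def __init__(self, name: str, authoritative: Authoritative,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self._authoritative = authoritative
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # domain -> (ip, expires_at)

    def resolve(self, domain: str) -> str:
        now = self._clock()
        cached = self._cache.get(domain)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: served from this resolver's own cache
        ip, ttl = self._authoritative(self.name, domain)
        self._cache[domain] = (ip, now + ttl)
        return ip


def toy_authoritative(resolver_name: str, domain: str) -> Tuple[str, int]:
    """Made-up answers keyed by which resolver is asking (an assumption)."""
    answers = {"isp-de": "192.0.2.10", "isp-us": "198.51.100.10"}
    return answers.get(resolver_name, "203.0.113.10"), 300


if __name__ == "__main__":
    for name in ("isp-de", "isp-us"):
        resolver = CachingResolver(name, toy_authoritative)
        print(name, resolver.resolve("www.example.org"))
```

In this model a cached answer is only reused by clients of the same resolver, so two resolvers can hold two different IPs for the same domain at the same time.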

In this answer theandym said the request is "forwarded", but I don't think this is how a CDN works, because "forwarding" would lengthen the transmission path and result in a longer loading time. Or does a CDN require a TTL of zero for the domain?

Update 1
Through this question I found Google's document describing how they optimized CDN performance. It did not explain how the CDN works in general, but there were interesting explanations like the following:

Thereafter, whenever a client attempts to fetch content hosted on the CDN, the client is redirected to the node determined to have the least latency to its prefix. This redirection however is based on the prefix corresponding to the IP address of the DNS nameserver that resolves the URL of the content on the client’s behalf, which is typically co-located with the client.

This means Google first measures the latency of all IP prefixes and builds a DNS resolution table (?) for all available prefixes. If a visitor has the IP 198.51.100.231, the Google server IP that is set for the prefix 198.51.100.0 is used. But again: how does Google's DNS know which IP the visitor is using? Most visitors resolve Google's domain through their internet provider, so isn't the resolution done through those external DNS servers?
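The kind of prefix table I have in mind could look roughly like the following. This is my own guess at the idea, not Google's implementation: the node names, the prefixes and the `node_for` helper are hypothetical. Note that according to the quoted passage the address being looked up would be that of the resolving nameserver, not of the visitor.

```python
import ipaddress

# Hypothetical table: IP prefix -> CDN node measured to have the lowest latency.
PREFIX_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "cdn-node-fra",
    ipaddress.ip_network("203.0.113.0/24"): "cdn-node-iad",
}
DEFAULT_NODE = "cdn-node-default"


def node_for(address: str) -> str:
    """Return the node assigned to the most specific prefix containing address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in PREFIX_TABLE if ip in net]
    if not matches:
        return DEFAULT_NODE
    # Longest prefix wins if several table entries contain the address.
    best = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_TABLE[best]


if __name__ == "__main__":
    print(node_for("198.51.100.231"))
```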

As an additional example: If I start a DNS resolution for the domain facebook.com with different online tools (hosted in different countries) it is resolved to different IPs with different domains like:

  • 31.13.92.36 Reverse: edge-star-mini-shv-01-frt3.facebook.com
  • 31.13.76.68 Reverse: edge-star-mini-shv-01-sea1.facebook.com
  • 31.13.69.228 Reverse: edge-star-mini-shv-01-iad3.facebook.com
  • 157.240.2.35 Reverse: edge-star-mini-shv-01-ort2.facebook.com

After that I thought it could depend on the location of the DNS server used by the visitor, but I tried my own (Deutsche Telekom, Germany), Google's (8.8.8.8) and a major one from France (Orange), and they all returned the IP 31.13.92.36 for facebook.com.

mgutt
  • 2
    Possible duplicate of [How does Google do DNS Geo Location request routing?](http://serverfault.com/questions/161537/how-does-google-do-dns-geo-location-request-routing) – Khaled Mar 01 '17 at 14:41
  • 1
    It is unlikely that a TTL will ever expire before a packet reaches a destination, unless there is a routing loop. Most people can send a packet around the world on a dozen or so hops, and the minimum default TTL is usually 64. – Ron Maupin Mar 01 '17 at 14:44
  • @RonMaupin I think this refer to the DNS record TTL, not the IP TTL. – JFL Mar 01 '17 at 16:49
  • @Khaled I read Google's document, but it does not explain how a CDN works. It explains only how they optimized the CDN to lower the latency as low as possible. I updated my question. – mgutt Mar 01 '17 at 22:37
  • @mgutt "How does a CDN work" is far too broad for SF. – EEAA Mar 01 '17 at 23:06
  • Some CDN's use more than GeoIP. Some also leverage Anycast. – Aaron Mar 01 '17 at 23:21

2 Answers


OK, it seems I can now give a rough answer to my own question. Anurag Bhatia says that there are two methods by which a CDN can work:

DNS

Have DNS to do the magic i.e when users from network ISP A lookup for cdn.website.com, they should get a unicast IP address of Cache A in return, similarly for users coming from ISP B network, Cache B’s unicast IP should return.

Let's say we have a server with the IP 1.2.3.4 located in the USA and a cache server with the IP 2.3.4.5 located in Germany. Now a visitor tries to resolve the domain example.org. If he did not change his network settings, he uses the DNS server of his internet service provider (ISP). This ISP's DNS server now asks dns1.example.com (the nameserver of the domain) for the IP. The answer depends on the location of the ISP: if it's located in Germany, dns1.example.com returns 2.3.4.5, and if it's located in the USA, it returns 1.2.3.4.
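A minimal sketch of that decision on dns1.example.com could look like this. It is an assumption-laden illustration, not how any particular DNS software does it: `RESOLVER_COUNTRY` stands in for a real geo database, the resolver prefixes are documentation ranges I picked, and `country_of` and `answer_for` are hypothetical helpers.

```python
import ipaddress
from typing import Optional

# Hypothetical stand-in for a geo database: resolver prefix -> country code.
RESOLVER_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}
# Server IPs from the example above.
SERVER_BY_COUNTRY = {"US": "1.2.3.4", "DE": "2.3.4.5"}
FALLBACK_SERVER = "1.2.3.4"  # assumption: unknown locations get the US server


def country_of(resolver_ip: str) -> Optional[str]:
    """Look up the country of the resolver that sent the query."""
    ip = ipaddress.ip_address(resolver_ip)
    for network, country in RESOLVER_COUNTRY.items():
        if ip in network:
            return country
    return None


def answer_for(resolver_ip: str) -> str:
    """Pick the server IP based on where the asking resolver is located."""
    return SERVER_BY_COUNTRY.get(country_of(resolver_ip), FALLBACK_SERVER)


if __name__ == "__main__":
    print(answer_for("192.0.2.53"))     # resolver of a German ISP
    print(answer_for("198.51.100.53"))  # resolver of a US ISP
```

The only input here is the address of the ISP's resolver; the visitor's own address never reaches the authoritative server in this model.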

But there might be a disadvantage with this method: whenever a user changes his network settings and uses a DNS provider that does not support the EDNS0 client subnet option (see IETF draft), for example a company's central DNS server, dns1.example.com will again answer with the IP nearest to that DNS server's location. This time, however, the visitor is most likely in a different location, which causes a higher latency.

EDNS0-compatible DNS providers pass information about the user on to the authoritative DNS server, so the authoritative DNS server can respond with the IP nearest to the location of the user:

Today, if you’re using OpenDNS or Google Public DNS and visiting a website or using a service provided by one of the participating networks or CDNs in the Global Internet Speedup then a truncated version of your IP address will be added into the DNS request. The Internet service or CDN will use this truncated IP address to make a more informed decision in how it responds so that you can be connected to the most optimal server.

...

; EDNS: version: 0, flags:; udp: 512
; CLIENT-SUBNET: 130.89.89.0/24/21
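To illustrate the "truncated version of your IP address", here is a small sketch. The `/24` length matches the dig output above; the function names, the example client address and the fallback to the resolver's own address are my assumptions about how an authoritative server might use the option.

```python
import ipaddress
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def truncate_client_ip(client_ip: str, prefix_len: int = 24) -> Network:
    """Drop the host bits, as a resolver might before forwarding the query."""
    return ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)


def lookup_key(resolver_ip: str, client_subnet: Optional[Network] = None) -> Network:
    """Network the authoritative server would base its location decision on."""
    if client_subnet is not None:
        return client_subnet  # client subnet sent: decide by the visitor's network
    # No client subnet: only the resolver's own address is known.
    return ipaddress.ip_network(resolver_ip)


if __name__ == "__main__":
    subnet = truncate_client_ip("130.89.89.130")  # hypothetical visitor address
    print(lookup_key("8.8.8.8", subnet))
    print(lookup_key("8.8.8.8"))
```

With the client subnet present, the geo decision can be made for the visitor's network even if the resolver itself sits somewhere else.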

Anycast

Have routing to route to nearest cache node based on “anycast routing” concept. Here Cache A, Cache B and Cache C will use same identical IP address and routing will take care of reaching the closest one.

I don't really understand Anycast because of BGP, etc., but I think the further explanation of Anurag Bhatia gives an idea how it could work:

  1. Optimization is based on BGP routing and announcement with little role of DNS.
  2. This setup is very hard to build up and scale since for anycast to work perfectly at global level, one needs lot’s and lot’s of peering and consistent transit providers at each location. If any of peers leaks a route to upstream or other peers, there can be lot of unexpected traffic on a given cluster due to break of anycast.
  3. This setup has no dependency on DNS recursor and hence Google DNS or OpenDNS works just fine.
  4. This saves a significant amount of IP addresses since same pools are used at multiple locations.
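As far as I understand it, the core idea can be sketched like this: several sites announce the same prefix, and each router simply keeps the route that looks best from where it stands. The sketch below only compares AS path length, which is a strong simplification of real BGP route selection; the `Announcement` class, the site labels and the AS numbers (taken from the documentation range) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Announcement:
    prefix: str               # the anycast prefix, identical for every site
    as_path: Tuple[int, ...]  # ASes to traverse, as seen by this router
    site: str                 # label for illustration only; routers do not know it


def best_route(announcements: Iterable[Announcement]) -> Announcement:
    """Pick the announcement with the shortest AS path (simplified)."""
    return min(announcements, key=lambda a: len(a.as_path))


if __name__ == "__main__":
    # The same prefix as seen by a hypothetical router in Europe.
    seen_in_europe = [
        Announcement("203.0.113.0/24", (64500,), "frankfurt"),
        Announcement("203.0.113.0/24", (64501, 64502, 64500), "seattle"),
    ]
    print(best_route(seen_in_europe).site)
```

A router elsewhere would see different path lengths for the same prefix and could therefore end up at a different site, without DNS being involved at all.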

Anycast also has a disadvantage: routing is flexible. While the target node might be located in network A at the start of a TCP session, it may change to network B during the session. Therefore Anycast is used in practice mainly for UDP, which is a connectionless protocol.

Most CDNs use DNS for HTTP/HTTPS traffic and Anycast for DNS requests.

mgutt
  • 1
    Anycast works because a router can have multiple paths to a destination network, and the anycasting entity sets up several, but it chooses the best one (from its perspective) to install in its routing table. The best route for a router running BGP is most likely the one with the least ASes to pass through to reach the destination. That means that different routers in different ASes could/would have different better paths. It also means that if one of the destination networks fails, any router with that as the best will withdraw it and select another best destination. – Ron Maupin Mar 01 '17 at 23:30
  • Thanks, but that wasn't the (most) confusing part for me. I do not really understand how it is possible that all Anycast routers of one CDN provider can have the same "global IP" and how a european user finally lands on the nearest european Anycast router.. – mgutt Mar 01 '17 at 23:53
  • 2
    That's what I was trying to explain. You can have multiple routers advertise the same network. The rest of the routers on the Internet will receive all the routes to, seemingly, the same destination, and they will install the best route from their own perspective, usually the fewest AS hops. The routers have no idea that these are actually different locations around the world, they only know that they have multiple routes to the same destination network, even if those destination are in geographically separate locations. – Ron Maupin Mar 01 '17 at 23:57
  • @JensBradler Nice find and thank you for adding infos about `EDNS0`! – mgutt Mar 22 '17 at 09:40
0

Your application will point to the CDN (for assets, images, APIs etc.). The CDN will then either serve from its cache or fetch the data/files from your servers. In your example, you will point to cdn.example.com and the CDN will route it to dns1.example.com. cdn.example.com will fetch data from the nearest location on the anycast network, so the IPs can be different.

Source:

https://www.youtube.com/watch?v=JX2qrdp0WT4

https://www.akamai.com/blog/developers/how-cdn-can-make-your-apis-more-powerful

Manish Jain