How do CDN (Content Distribution Networks) work?

Taking Akamai as an example CDN: from what I understand, when a client requests a page, the request goes to an Akamai central server, which, depending on the location of the client, picks an Akamai edge server; subsequent requests from the client then go directly to this edge server. My question is:

When a client requests a website (by name), DNS resolves the name to the IP address of the Akamai central server and passes it on to the client. The client will hold on to this IP address, so how are subsequent requests able to go directly to the IP addresses of Akamai edge servers?

Or is it necessary, when a CDN is being used, for the DNS resolution itself to be done by the CDN's servers?

Do biggies like Google, Amazon, and Facebook have their own CDN servers, or do they rely on third-party CDN providers like Akamai? Say Google and Yahoo! both use Akamai's CDN: does the content of Yahoo! and Google then reside on the same server? Doesn't that pose a potential security issue?

p2pnode

Posted 2012-05-05T15:25:12.147

Reputation: 1 257

Answers

You don't host your whole site with the CDN, just your content.

I just realized I answered a similar question a while back: What does akamaihd.net do?

[Image: data request flow (image by WikiMedia)]

So your site references http://akamai/myfile.ext. The browser will request myfile.ext from akamai, and akamai can then send an HTTP redirect to the actual content server.

Now, if that last step is cached, great: all future requests will go straight to the closest content server.

How does that work?

Let's assume this website:

<html>
  <body>
    <img src="http://cdn/oliver.png" />
  </body>
</html>

I request this website from my own webserver. The .html file is not hosted with cdn. Neither is the DNS of my webserver.

Initial request

So my browser got that HTML file and now parses it. It finds the referenced image and notes that it is located at http://cdn/oliver.png. It requests that file.

To do that, it needs to find the IP address of cdn. In our example, that IP address is 10.10.10.10.

With that IP address, it can connect to the cdn server and request /oliver.png.

Geo Location

Now cdn realizes, "That guy is from Germany!" So instead of sending me the awesome picture I wanted, it sends me an HTTP redirect saying:

/oliver.png is not here. It's at 10.10.33.33/oliver.png

So my browser will ask 10.10.33.33 (which is hopefully closer to me) for the picture.
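
The redirect step above can be sketched with Python's standard library. This is a minimal illustration, not how Akamai or any particular CDN implements it: the network-to-edge table, the fallback address, and the function names are hypothetical, and a real CDN would use a proper geolocation database rather than a hand-written list of prefixes.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of client networks to edge servers (documentation/private addresses).
EDGE_BY_NETWORK = [
    (ipaddress.ip_network("192.0.2.0/24"), "10.10.33.33"),     # e.g. "clients in Germany"
    (ipaddress.ip_network("198.51.100.0/24"), "10.10.44.44"),  # e.g. "clients in the US"
]
FALLBACK_EDGE = "10.10.22.22"  # hypothetical default edge for everyone else


def pick_edge(client_ip: str) -> str:
    """Return the edge server for a client address (first matching network wins)."""
    addr = ipaddress.ip_address(client_ip)
    for network, edge in EDGE_BY_NETWORK:
        if addr in network:
            return edge
    return FALLBACK_EDGE


def redirect_location(client_ip: str, path: str) -> str:
    """Build the URL the client is redirected to, keeping the requested path."""
    return f"http://{pick_edge(client_ip)}{path}"


class GeoRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with a 302 pointing at the edge chosen for the client."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", redirect_location(self.client_address[0], self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the redirector (blocks forever)."""
    HTTPServer(("", port), GeoRedirectHandler).serve_forever()
```

Calling `serve()` and requesting `/oliver.png` from a client in 192.0.2.0/24 would yield a `302` with `Location: http://10.10.33.33/oliver.png`. Whether later requests skip this hop depends on caching: a 302 is only cached by the browser when the response carries explicit freshness information, whereas a 301 is cacheable by default.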

Seriously?

I'm not saying this is how ALL CDNs work, but it would be one approach.

You could also implement a DNS daemon that returns different results for a name lookup depending on the location of whoever sent the query.
I doubt that this is done in practice, but maybe I just can't imagine how to set that up properly. See fluffy's answer for how that could work.
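
The "DNS daemon" idea boils down to choosing the answer by the querier's address. Here is a hedged sketch of just that decision logic in Python, assuming a hypothetical zone for cdn.example.com; a real deployment would put this behind an actual DNS server and would typically use short TTLs so clients re-resolve regularly.

```python
import ipaddress
from typing import Optional

# Hypothetical geo-DNS zone: one name, several answers keyed by the querier's network.
GEO_ZONE = {
    "cdn.example.com": [
        (ipaddress.ip_network("192.0.2.0/24"), "10.10.33.33"),
        (ipaddress.ip_network("198.51.100.0/24"), "10.10.44.44"),
        (ipaddress.ip_network("0.0.0.0/0"), "10.10.10.10"),  # catch-all for other IPv4 queriers
    ],
}


def answer_a_query(qname: str, source_ip: str) -> Optional[str]:
    """Pick the A record for qname based on who sent the query (most specific prefix wins)."""
    rules = GEO_ZONE.get(qname.rstrip(".").lower())
    if rules is None:
        return None  # a real server would answer NXDOMAIN
    addr = ipaddress.ip_address(source_ip)
    matches = [(net, ip) for net, ip in rules if addr in net]
    if not matches:
        return None  # e.g. an IPv6 querier when the zone only has IPv4 rules
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

The `source_ip` here is whoever sent the query, which is normally the user's recursive resolver rather than the user's own machine; the EDNS Client Subnet extension (RFC 7871) exists so resolvers can pass along part of the client's address.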

Who runs CDNs?

Most global players have their own content delivery network in some form (or so I would assume). Some providers just offload certain services to larger CDNs (as Microsoft does with MSDN downloads). This touches on your second question.

Consider this: on MSDN, Microsoft offers product downloads. These downloads are served by Akamai. If you can determine the URL of a download, you can download the product without ever getting in touch with Microsoft.

Is that a security issue? Not really, because what is being downloaded is still protected (by a product key).

But how about other data?

If your data is security relevant, then it isn't CDN material. If you don't want something to be available as widely as possible, don't put it in a CDN.

Der Hochstapler

Posted 2012-05-05T15:25:12.147

Reputation: 77 228

Say the site is http://somewebsite/file.txt and Akamai DNS is being used. Does the very first request from the client go to somewebsite, or does that very first request go to Akamai (because the DNS used by the client somehow knows that the Akamai CDN is in use)?

– p2pnode – 2012-05-05T15:46:26.457

And perhaps I don't understand HTTP redirects and what they are able to achieve, so my question remains: how does the client know to use the IP address of Akamai edge servers so that Akamai central servers don't have to come into the picture at all? – p2pnode – 2012-05-05T15:48:43.460

@p2pnode: I expanded the answer a bit. Hopefully it includes what you're wondering about. – Der Hochstapler – 2012-05-05T16:02:20.523

Isn't there a significant performance hit from this? Instead of all the data transfer happening over a single TCP stream, multiple streams are being used, with more handshaking overhead, etc. – Akash – 2012-05-05T18:35:21.240

@Akash: Normally, you wouldn't use a CDN for everything, but only individual, large files. So, in practice, this isn't an issue. – Der Hochstapler – 2012-05-05T18:49:26.943

A pretty common approach to CDNs is to use what's known as "anycast." It works like this: your distributed servers are colocated with DNS servers that respond with their own local server as the destination. For example, you might have three servers in different hosting facilities, and each facility's DNS server claims that its own server's IP address is the canonical one for your hostname (call it, say, content.example.com). The DNS servers are all configured with the same global IP address, and each facility announces that address via BGP so that the route to the closest server wins. So when you do a name lookup on content.example.com, the fastest/closest/most available DNS server responds to the request with the address of its HTTP server.

In this way, no GeoIP tricks are necessary, and you're always served content by whichever server is nearest to you in routing terms (often, though not necessarily, the fastest), which may or may not have anything to do with its physical location, due to the heterogeneous nature of the Internet.
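
To make the "route to the closest server wins" step concrete, here is a deliberately simplified model of a single router choosing among anycast announcements. Assumptions: the prefix, site names and AS numbers are made up, and only AS-path length is compared; real BGP first considers attributes such as local preference and applies further tie-breakers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

ANYCAST_PREFIX = "203.0.113.0/24"  # hypothetical prefix announced from every site


@dataclass(frozen=True)
class Announcement:
    site: str                  # hypothetical site (PoP) that originated the route
    prefix: str                # announced prefix
    as_path: Tuple[int, ...]   # AS path as seen by the router making the decision


def best_route(routes: Iterable[Announcement], prefix: str) -> Optional[Announcement]:
    """Simplified BGP choice: among routes for `prefix`, the shortest AS path wins.
    Ties fall back to the site name so the result is deterministic."""
    candidates = [r for r in routes if r.prefix == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (len(r.as_path), r.site))


# What one router (say, in Frankfurt) might hear from three sites announcing the same prefix.
routes_seen_in_frankfurt = [
    Announcement("ams", ANYCAST_PREFIX, (64501, 64510)),
    Announcement("nyc", ANYCAST_PREFIX, (64502, 64503, 64510)),
    Announcement("sin", ANYCAST_PREFIX, (64504, 64505, 64506, 64510)),
]
```

Every site advertises the same address, so no lookup logic is needed: `best_route(routes_seen_in_frankfurt, ANYCAST_PREFIX)` picks the `ams` route, and a router elsewhere with different AS paths would pick a different site for the very same IP address.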

It is my understanding that Akamai at least partially works in this way.

fluffy

Posted 2012-05-05T15:25:12.147

Reputation: 595

Origin pull CDNs are also available.

Amazon CloudFront can use this technique.

You set up a CNAME like media.example.com that points to their assigned server name and leave all your content on your server. For images and content you want delivered over the CDN, you use media.example.com in the URL. The request goes to their server network, and if the content is not available there, their servers pull it from your server. Once in the system, the content is distributed to the server farms closest to where the demand exists and remains there for the assigned TTL. Your server no longer sees any traffic for the cached content until the TTL expires and CloudFront has to refresh it.
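
The origin pull behaviour described above amounts to a read-through cache with an expiry. A minimal Python sketch, assuming a caller-supplied `fetch_from_origin` function and a fixed TTL; it is not CloudFront's implementation, and real CDNs generally also take the origin's caching headers into account.

```python
import time
from typing import Callable, Dict, Tuple


class OriginPullCache:
    """Edge cache that pulls from the origin on a miss and keeps objects for `ttl` seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_requests = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # hit: the origin sees no traffic
        body = self._fetch(path)  # miss or expired: pull a fresh copy from the origin
        self.origin_requests += 1
        self._store[path] = (body, now + self._ttl)
        return body
```

With a TTL of 60 seconds, repeated requests within a minute reach the origin only once; the first request after the TTL expires pulls a fresh copy.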

Fiasco Labs

Posted 2012-05-05T15:25:12.147

Reputation: 6 368

Akamai does not work this way. Different CDNs work differently, but Akamai specifically does not do anycast for their web servers.

When a user in NY wants www.acme.com, acme.com's name server redirects ("delegates") the query to an Akamai name server. The Akamai name server sees where the machine asking the question is located (based on its IP address) and returns the IP address of the nearest/best Akamai server to serve www.acme.com.
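
The delegation can be modelled as a CNAME chain followed by a location-aware answer. This Python sketch is hypothetical throughout: the CNAME target `www.acme.com.cdn.example.net`, the resolver networks and the edge addresses are invented, and the lookup is simulated with dictionaries instead of real DNS traffic.

```python
import ipaddress
from typing import Tuple

# acme.com's own name server: hands "www" over to the CDN via a CNAME (hypothetical target).
ACME_ZONE = {"www.acme.com": ("CNAME", "www.acme.com.cdn.example.net")}

# Hypothetical CDN name server data: which edge to hand out per resolver network.
CDN_EDGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "10.20.0.1"),     # e.g. resolvers in New York
    (ipaddress.ip_network("198.51.100.0/24"), "10.30.0.1"),  # e.g. resolvers in London
]
CDN_DEFAULT_EDGE = "10.99.0.1"


def cdn_nameserver(qname: str, resolver_ip: str) -> Tuple[str, str]:
    """Answer with the edge chosen for the network the query came from."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, edge in CDN_EDGES:
        if addr in net:
            return ("A", edge)
    return ("A", CDN_DEFAULT_EDGE)


def resolve(qname: str, resolver_ip: str, max_hops: int = 8) -> str:
    """Follow CNAMEs across the two simulated name servers until an A record appears."""
    for _ in range(max_hops):
        if qname in ACME_ZONE:
            rtype, value = ACME_ZONE[qname]
        elif qname.endswith(".cdn.example.net"):
            rtype, value = cdn_nameserver(qname, resolver_ip)
        else:
            raise LookupError(f"no data for {qname}")
        if rtype == "A":
            return value
        qname = value  # follow the CNAME to the CDN's name
    raise LookupError("CNAME chain too long")
```

Note that the edge is chosen from `resolver_ip`, the address of the recursive resolver that queries the CDN's name server, not the end user's own address.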

igorlord

Posted 2012-05-05T15:25:12.147

Reputation: 111

How does that work? Does the Akamai name server receive the request directly from the user, or from the user's DNS server, which delegates to the Akamai name server? If the latter, won't the user be geolocated according to the DNS server's location rather than their own? – odiszapc – 2014-12-26T08:43:23.697

A great summary of how Akamai's CDN works can be found here

In short:

  • The hostnames you serve through the CDN have CNAME records that point to names handled by Akamai's DNS servers.
  • So the first request a client's browser makes to such a hostname has its DNS looked up at Akamai's DNS server, which responds with the IP address of an Akamai server close to the user (called an "edge server").
  • These edge servers may serve static elements from a local cache if those elements have been requested by another user recently, so they don't even have to go back to your server to get a copy of the asset.
  • Missing elements or non-cachable pages are routed through the Akamai network to another edge server near the host. That edge server makes the actual requests to the host site and passes them back through the network to the original edge server, and from there they are returned to the end-user.
  • Since the edge servers are internally communicating using Akamai’s proprietary protocols and routing around bottlenecks, traffic can flow much faster than over the public internet.
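
The cache-hit / forward-to-an-edge-near-the-host flow in the list above can be sketched as a chain of caches. Assumptions: the edge names are hypothetical, each hop is a plain function call standing in for Akamai's proprietary internal transport, and cacheability is passed in explicitly rather than derived from HTTP headers.

```python
from typing import Callable, Dict, List

Fetch = Callable[[str, bool], bytes]  # (path, cacheable) -> body


class EdgeServer:
    """One caching hop; on a miss it forwards to `upstream` (another edge or the origin)."""

    def __init__(self, name: str, upstream: Fetch):
        self.name = name
        self._upstream = upstream
        self._cache: Dict[str, bytes] = {}
        self.log: List[str] = []

    def get(self, path: str, cacheable: bool = True) -> bytes:
        if cacheable and path in self._cache:
            self.log.append(f"hit {path}")
            return self._cache[path]
        self.log.append(f"forward {path}")
        body = self._upstream(path, cacheable)
        if cacheable:
            self._cache[path] = body  # keep a copy for the next user
        return body


origin_hits: List[str] = []


def origin(path: str, cacheable: bool = True) -> bytes:
    """Stand-in for the customer's web server."""
    origin_hits.append(path)
    return f"content of {path}".encode()


near_origin = EdgeServer("edge-near-origin", origin)      # hypothetical names
near_user = EdgeServer("edge-near-user", near_origin.get)

near_user.get("/logo.png")               # miss at both edges, fetched from the origin
near_user.get("/logo.png")               # served from the user's edge cache
near_user.get("/cart", cacheable=False)  # non-cachable: always goes back to the origin
```

After these three requests the origin has seen only `/logo.png` once and `/cart` once; the second logo request never left the edge near the user.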

As mentioned in the blog post linked above, some big corporations resolve DNS using their own servers, which can negate some of the benefits of using a CDN.

Brad Parks

Posted 2012-05-05T15:25:12.147

Reputation: 1 775

CDNs work on anycast DNS, which in turn works on anycast IP addressing: one IP address is assigned to multiple servers. When a user sends a query to a DNS resolver, the query is handled by the nearest server, and data is served from the server with the least latency.

abhimanyu rail

Posted 2012-05-05T15:25:12.147

Reputation: 1

In what way does this improve upon the existing, much fuller, answers? – Chenmunka – 2015-11-13T12:34:46.753