9

I am new to CDNs and experimenting with CloudFront. I have set everything up and all appears to be working fine. I can create a static image on a page and access it through my CloudFront distribution. I am using a custom origin (i.e. not an S3 bucket).

I'm worried that I might be worse off from a performance point of view though. I have a test page that is loading up the same 20 or so images with and without the CDN. Looking at the net panel in Firebug, the first time I load this page the images that are loaded directly from the origin server come in much faster. On subsequent page loads the benefits of the CDN become obvious -- after 3-5 refreshes the CDN is doing better than the origin server.

So I can see that on a popular page on our site that is being hit all the time, this will be a benefit. And I should expect a benefit because I'm in Seattle (around the corner from Amazon) and my server is in CA.

The thing is that if I leave the page for a few minutes and then reload, things are back to square one, with CloudFront being worse than the origin server. Is this expected? Do things drop out of the CDN "cache" so quickly?

Is it possible that something in my setup is hurting performance? Or is the reality that the CDN will only be a net positive for content that is currently being accessed every few seconds on average?

(cross posted from the AWS forum because I've been spoiled forever by SO's turnaround times)

UPDATE:

There are two good answers below that are worth looking at if you have questions about CloudFront performance. I recently found one explanation for my specific problem that wasn't mentioned, though: I had left the DNS TTL at 5 minutes as an oversight. Since I'm also using a custom origin, there is an additional round trip to the authoritative nameserver to resolve my domain name to the actual Amazon CloudFront domain. Now that the TTL setting is back to 12 hours, the long loads seem to happen less often.

Greg
  • 247
  • 3
  • 10
  • Yes, it's possible that CloudFront is slower than just going directly to a fast server, because CloudFront is one of the slowest CDNs out there, due to the way Amazon has implemented it with multiple layers of DNS resolution, etc. Run some benchmarks from different locations around the world, and decide whether it's a good fit for you or not -- use http://www.webpagetest.org/ for testing. –  Apr 24 '11 at 16:28

4 Answers

7

It is possible. However, one purpose of a CDN is scalability. You can expect the CDN to perform the same if you throw 100 visits at once or 1 million visits at once.

As far as your setup goes, there's nothing that I can know with the information you provided, but I think that the point above is what makes a CDN so valuable. If you're creating a site that doesn't get a lot of traffic, you might be better off without the CDN. However, the CDN will lighten the load on your web server if you get a lot of traffic, because you're passing off the serving of your media to another server. One last point: a good CDN (and Amazon's is) will use its extensive network to serve your content from the location closest to the requester. In many cases, it can serve the content from the requester's ISP, meaning VERY fast load times.

Hope that helps.

Jesse Bunch
  • 314
  • 2
  • 9
  • Thanks Jesse - very helpful. The point regarding scaling is well taken. And we have sufficient traffic for it to make a big difference. I would still love to know the caching policy though. I have found an enormous amount of info about HOW to set up the CDN and very little about its characteristics. I'm wondering, for example, whether I should exclude (from the CDN) old content that is accessed very infrequently. – Greg Mar 27 '11 at 03:08
  • Greg - I don't see an argument for excluding the content, other than maybe for financial reasons. You can, however, control the cache headers of your object in Amazon. You might try looking into this: http://stackoverflow.com/questions/269840/is-it-possible-to-change-headers-on-an-s3-object-without-downloading-the-entire-o – Jesse Bunch Mar 27 '11 at 16:41
  • That would allow you specify far-future expires headers as you would with any normal web site's media. – Jesse Bunch Mar 27 '11 at 16:41
  • Thanks again. That cache-control link isn't relevant to my situation because I'm using a custom origin server, not S3. But the principle applies and I do have far-future expires headers set. BTW, Amazon's docs do say that content lives in the cache for 24 hours, but my experiments indicate something different. – Greg Mar 27 '11 at 17:41
5

CloudFront sets a header like "X-Cache: Hit from cloudfront" in replies. Presumably, it will say "Miss" if your file wasn't in the cache of the node to which you were directed.

It is possible that your files just aren't popular enough, so they get ejected from CloudFront's cache by more popular content even though 24 hours haven't elapsed. It is also possible that IO overload or some other circumstance inside a particular CloudFront node makes access slow. CloudFront is very inexpensive compared with Akamai or LimeLight. Worst-case performance and guaranteed service levels are two of the reasons to use the more expensive players.

I would do a test: put just one popular file into CloudFront in production, and then run periodic tests to see whether CloudFront is indicating hits (also record the total transaction time).
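A minimal sketch of such a periodic test in Python, assuming the edge reports its cache status in an `X-Cache` response header as described above. The function names and the substring matching are illustrative assumptions, not an official CloudFront client, and any URL you pass in is your own.

```python
import time
import urllib.request


def classify_x_cache(value):
    """Map an X-Cache header value to 'hit', 'miss', or 'unknown'.

    Assumption: the value contains the word "Hit" or "Miss",
    e.g. "Hit from cloudfront".
    """
    if not value:
        return "unknown"
    lowered = value.lower()
    if "hit" in lowered:
        return "hit"
    if "miss" in lowered:
        return "miss"
    return "unknown"


def probe(url, timeout=10.0):
    """Fetch url once and return (cache_status, elapsed_seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # include the body transfer in the timing
        header = response.headers.get("X-Cache")
    return classify_x_cache(header), time.monotonic() - start


def hit_ratio(statuses):
    """Fraction of hits among the probes that reported hit or miss."""
    known = [s for s in statuses if s in ("hit", "miss")]
    if not known:
        return None
    return known.count("hit") / len(known)
```

Calling `probe()` on the same file every few minutes and feeding the collected statuses to `hit_ratio()` would show whether slow loads line up with cache misses or happen even on hits.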

rmalayter
  • 3,744
  • 19
  • 27
  • I have updated the question with another potential explanation for the perf issue I saw: I had left the TTL at the low setting of 5 minutes. Since switching back to 12 hours, I don't think I'm seeing these occasional perf issues as often. – Greg Apr 19 '11 at 05:04
1

Have I misunderstood? Doesn't the cache-control manage how long things live at the edge locations before the edge locations reload them from S3? So surely they are relevant to your situation whether you use S3 or your own origin? No?

The Amazon FAQ says: "Q. How long will Amazon CloudFront keep my files at the edge locations? By default, if no cache control header is set, each edge location checks for an updated version of your file whenever it receives a request more than 24 hours after the previous time it checked the origin for changes to that file. This is called the “expiration period.” You can set this expiration period as short as 1 hour, or as long as you’d like, by setting the cache control headers on your files in your origin. Amazon CloudFront uses these cache control headers to determine how frequently it needs to check the origin for an updated version of that file. If your files don’t change very often, it is best practice to set a long expiration period and implement a versioning system to manage updates to your files."

[I assume the last sentence means "tough luck if you set it to 50 years and then want to change the file".]

Isn't the main point of using a CDN that it hosts static content? If so, would it help to use considerably longer TTL than one day? For virtually everything (all images and CSS), I use Cache-Control = "max-age=604800, public, must-revalidate" (i.e. 1 week). In my experience, files definitely do then take up to a week to change if I upload new versions onto S3.
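As a rough illustration of the two ideas above (a long expiration period plus versioned file names), here is a small Python sketch. The helper names `cache_control` and `versioned_name` are hypothetical, and hashing the file content is just one possible versioning scheme, not something the FAQ prescribes.

```python
import hashlib
import os
from datetime import timedelta


def cache_control(max_age, public=True, must_revalidate=False):
    """Build a Cache-Control header value from a timedelta."""
    seconds = int(max_age.total_seconds())
    if seconds < 0:
        raise ValueError("max_age must not be negative")
    parts = [f"max-age={seconds}"]
    if public:
        parts.append("public")
    if must_revalidate:
        parts.append("must-revalidate")
    return ", ".join(parts)


def versioned_name(filename, content):
    """Insert a short content hash before the extension.

    A changed file gets a new name (and so a new URL), which
    sidesteps whatever copy the edge is still caching.
    """
    stem, ext = os.path.splitext(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}.{digest}{ext}"


# One week, matching the header quoted above.
print(cache_control(timedelta(weeks=1), must_revalidate=True))
# prints "max-age=604800, public, must-revalidate"
```

With versioned names, the max-age can be made very long, because updating a file means publishing it under a new URL rather than waiting for the old one to expire.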

Hope this helps. [BTW: On your more general point, I too wonder if a CDN helps performance as much as you think it's going to. I am about to move my entire site (CDN included) onto a super-fast dedicated server and do some tests to find out.]

Chris W
  • 11
  • 1
  • You are correct that the cache-control influences how long the content is kept at the edge. The TTL is a separate matter though. TTL controls the caching of the IP address assigned to the domain name. So regardless of whether the static file is cached at the edge or not, the first time a server sees the URL of the file it has to find the IP address of this domain. With a 1-day TTL, it is likely that a nearby server has this info in its DNS cache. With a 5-minute TTL this is much less likely and a complete round trip to my origin server is required (not for the file, but to resolve the URL). – Greg Apr 26 '11 at 16:29
  • Ah OK thanks. I was confusing DNS TTL and cache-control :) – Chris W Apr 26 '11 at 22:09
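To see how much of a slow first load is DNS resolution rather than the file transfer, a small Python sketch like the following could time consecutive lookups. The helper names are hypothetical, and the numbers are only indicative, because OS and resolver caches outside your control affect them.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the seconds taken by one name lookup of hostname."""
    start = time.perf_counter()
    resolver(hostname, 80)
    return time.perf_counter() - start


def compare_lookups(hostname, repeats=3, resolver=socket.getaddrinfo):
    """Time several consecutive lookups of the same name.

    The first lookup is the one most likely to miss the nearby
    DNS cache; later ones are usually answered from cache.
    """
    return [time_lookup(hostname, resolver) for _ in range(repeats)]
```

Running `compare_lookups()` on the CDN hostname after it has been idle for longer than the DNS TTL, and again right afterwards, gives a rough sense of the cost of an expired DNS record.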
1

The reasons to use a CDN are that you expect your content to be:

  • Static - updated infrequently or in a controlled way
  • Viewed all over the world
  • Accessed frequently

Our website is accessed infrequently, as in your case, but we have a monitoring service set up that requests our website from all over the world, so it keeps the CDN caches warm. I would also like to share our case, which is a simple one and demonstrates the CDN's capability.

Furthermore, we are expecting a monthly charge of $2.20 as opposed to $7 for a GoDaddy server (which can't handle surges).

[Figure: Avg Page Load Times]

[Figure: Avg Page Load Time Distribution]

JehandadK
  • 151
  • 5