Geo-balanced (static) website hosting

Abstract

I'm rebuilding my website, and this time around I have decided to go with a static website generator (Jekyll, sculpin.io, etc.). Partially as an exercise, I want this site to be hosted on globally balanced hosting for speed, so US viewers should get the site from a US server.

So, my goal is to create the fastest site humanly possible. Because I hopefully can.

As a bonus task, if possible, I'd like to add a redirector that sends users to the English, German, etc. version of the site based on their language.

The question is, how do I do this? Here's what I've tried:

Using a CDN

I could get a random static web host and put a CDN in front of the whole site. While this would work reasonably well, CDNs drop little-used resources from the cache and need to re-fetch them when needed, which increases the load time.

Amazon AWS static website hosting

Using Amazon S3 buckets, one can build static hosting. The problem is that the bucket's name must be EXACTLY the same as the website's hostname, and bucket names are global, so I can't create multiple copies of the bucket and serve the site directly from them.

Amazon Route53/EC2

While running my own operating system is less than optimal (too much work, high costs), it is an option. Especially since Puppet makes automation easy.

This setup would require an EC2 instance in each region, an Elastic Load Balancer configured in front of each, and Route 53 to route traffic to the geographically closest ELB.

Build it yourself

By renting VPS or root servers in each region, I can run my own OS and install nginx. In every other respect, this would work much like the AWS setup.

Summary

None of these fulfill my needs for static website hosting. How would you approach this problem? What are the hidden issues here? Are there any services I should look at?

Janos Pasztor

Posted 2015-11-03T12:05:02.413

Reputation: 767

Answers

By Cosmocatalano (Own work) CC0, via Wikimedia Commons; public domain image from https://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Project-triangle.svg/256px-Project-triangle.svg.png

I am inclined to begin by saying “Pick any two.”

This variant of the iron triangle essentially says you can't have all three. If we assume you want good (performance, reliability), fast (simple deployment), and cheap (service provider charges and maintenance) ... your best case may be a two-out-of-three proposition for which there are few magic bullets.

I am inclined to disagree with the premise that the CDN approach would not be sufficiently fast, since an object doesn't have to be insanely popular in order to be sufficiently popular, and the occasional brief wait for a minority of the objects on a well-designed page can go unnoticed. A CDN like CloudFront has the additional advantage that your requests to the origin server traverse Amazon's own network to a significant extent, rather than the public Internet, removing some of the variables of global data transport.

But, there is a hybrid approach combining essentially all of the elements you've mentioned:

The front line is the CDN. We'll say CloudFront, which uses DNS to automatically route requests from browsers to the optimal edge location.

The origin server the CDN connects to on the back-side is actually multiple servers, geo/latency routed using Route 53, so the CDN edge connects to the closest origin server, in a nearby AWS region.

These geo/latency-routed targets -- the origins where the CDN refreshes any non-cached objects -- are EC2 instances, but instead of full-blown web servers, they're running proxies, one or more in each region, with S3 as their storage back-ends instead of hard drives. Since the proxies can rewrite the original host headers on the way to S3, your bucket names no longer need to match, so you can put one in each region. You can achieve substantial throughput on very small instances with a proxy like HAProxy (I have t2.micro machines serving 2 million requests a day while maintaining a steady CPU utilization around 3%). Since the proxies are in the same region as the buckets, there are no data transfer charges. You don't need ELB, because the Route 53 health checks can remove a non-functioning proxy from the pool the CDN will select from. If the S3 bucket for any reason becomes inaccessible to the proxy, the proxy will deliberately fail its health check, causing it to be removed from selection.
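The two proxy behaviours described above (rewriting the Host header on the way to S3, and failing the health check on purpose when the bucket is unreachable) can be illustrated with a minimal Python sketch. This is not an HAProxy configuration and not the setup actually running in production; the bucket names, the region map, and the function names are hypothetical, and the virtual-hosted-style S3 endpoint format is an assumption made for the example.

```python
# Hypothetical per-region bucket names; they need not match the site's hostname.
REGION_BUCKETS = {
    "us-east-1": "example-site-us",
    "eu-west-1": "example-site-eu",
}


def rewrite_host(headers, region):
    """Return a copy of the request headers with Host pointed at the regional bucket."""
    bucket = REGION_BUCKETS[region]
    rewritten = dict(headers)  # leave the caller's headers untouched
    # Assumed virtual-hosted-style endpoint: <bucket>.s3.<region>.amazonaws.com
    rewritten["Host"] = f"{bucket}.s3.{region}.amazonaws.com"
    return rewritten


def health_check_status(bucket_reachable):
    """Status code the proxy returns to the health checker.

    Fails on purpose (503) when the bucket cannot be reached, so the
    proxy is taken out of the pool instead of serving errors.
    """
    return 200 if bucket_reachable else 503
```

For example, rewrite_host({"Host": "www.example.com"}, "eu-west-1") yields a Host header naming the hypothetical eu-west-1 bucket, while every other header passes through unchanged.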

If you wanted to go absolutely sub-millisecond nuts, you could use Varnish as the proxy at the EC2 machines and cache the content from S3 inside EC2 so you'd potentially already have it if the CDN needed a fresh copy.

So the browser selects the nearest CDN edge, the CDN selects the nearest back-end, which is a proxy (one of potentially several per region) which has a low-latency path to any content not already held by the CDN, stored in a bucket in the same region.
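The selection step in that chain can be mimicked in a few lines of Python. Route 53 does this internally with its own latency data and health checks, so the sketch below only imitates the outcome; the function name, the region names, and the latency figures are invented for illustration.

```python
def pick_origin(latencies_ms, healthy):
    """Pick the healthy origin with the lowest latency, or None if all are down."""
    # Origins missing from the health map are treated as unhealthy.
    candidates = {
        name: ms for name, ms in latencies_ms.items() if healthy.get(name, False)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Made-up latencies as seen from one CDN edge.
latencies = {"us-east-1": 12.0, "eu-west-1": 85.0, "ap-southeast-1": 210.0}
all_up = {"us-east-1": True, "eu-west-1": True, "ap-southeast-1": True}
nearest = pick_origin(latencies, all_up)

# If the nearest proxy fails its health check, the next-closest one is chosen.
us_down = dict(all_up, **{"us-east-1": False})
fallback = pick_origin(latencies, us_down)
```

Here nearest is the us-east-1 proxy, and fallback is the eu-west-1 proxy once us-east-1 has been marked unhealthy.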

Highly-available, fault-tolerant, extremely responsive on a global level, built from standard components, interconnected in a slightly creative way. (That's how I do it.)

Michael - sqlbot

Posted 2015-11-03T12:05:02.413

Reputation: 1 103

Down-voter, do you have a comment? – Michael - sqlbot – 2015-11-04T01:06:41.970

This actually sounds quite reasonable with a few modifications: 1. instead of doing a proxy, the EC2 instances could just download the whole page (remember, it's static). 2. I am unsure if the language redirection would work with a CDN in front of the page, so that may have to be bypassed, or the CDN left out of the game completely.

I'm more on the side of building a system that's reasonably fast and good, while being reasonably priced (~50-100 USD a month). It's a tech demo, after all. – Janos Pasztor – 2015-11-04T15:45:19.753

I completely forgot about your content localization requirement. If enabled, CloudFront inserts a header, CloudFront-Viewer-Country, that you could use for localization. If the value of the header is, say, US, CloudFront caches that response against that specific value, i.e., only for users who also resolve as US, and sends a new request to the origin whenever it sees a value it has not cached yet. See my test site at https://cloudfront.sqlbot.net. (View the page source to see the request headers sent by CloudFront.) You could use that to change the content or to redirect to /us/.

– Michael - sqlbot – 2015-11-04T17:58:47.860
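
The CloudFront-Viewer-Country idea in the comment above could be handled at the origin roughly as follows. This is a hedged sketch, not code from the test site: the country-to-path table, the default path, and the function name are assumptions, and the paths are language directories rather than the /us/ example.

```python
# Hypothetical mapping from ISO country code to a language path on the site.
COUNTRY_TO_PATH = {"US": "/en/", "GB": "/en/", "DE": "/de/", "AT": "/de/"}
DEFAULT_PATH = "/en/"  # assumed fallback when the country is unknown or missing


def localized_redirect(headers):
    """Return (status, response_headers) for a redirect based on the viewer's country."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    country = normalized.get("cloudfront-viewer-country", "").upper()
    path = COUNTRY_TO_PATH.get(country, DEFAULT_PATH)
    return 302, {"Location": path}
```

A request carrying CloudFront-Viewer-Country: DE would be redirected to /de/, and a request without the header falls back to /en/.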