
So we're looking at migrating our single dedicated server (set up like shared web hosting) to an architecture with multiple load-balanced front-end servers plus separate database servers (due to traffic growth and availability concerns).

TL;DR: I want some way of failing over to a local read-only mirror of the site files when the file server goes down.

The scenario:

  • About 200 vhosts with a few aliases each
  • Each site is basically just a "wrapper" of ~30 local files (mostly templates); the rest is included from a centralised code base
  • Sites are basically read only, except for maybe a cache directory (which can be separate on each host)
  • No 3rd party access to site files (ie. not shared)
  • Each site only gets 2-10k hits/month

Goals / requirements:

  • Be resilient to any single server being taken offline for maintenance or due to an error
  • Some centralised way to make low volumes of file updates regularly (manually, just normal site updates), preferably via FTP
  • It would be acceptable for changes to be propagated to front-end servers in up to 5 seconds
  • If a server went offline unexpectedly I'd be happy for up to 10 minutes of file changes to be lost
  • At this stage we'd probably only be looking at 2 front end servers running full time, plus a file server
  • Will probably be implemented on AWS
  • More front end servers may be added and removed periodically

I realise the typical approach would be to deploy via version control, which we already do in some instances, but our staff (non-developers who mainly manage banners, text updates, etc.) are used to an FTP workflow. I'd like to reform that eventually, but perhaps not yet.

Here are the solutions I've come up with:

rsync deployment

The file server hosts the "master" copy of the site files, which can be accessed via FTP, and exposes them via an rsync server. Some kind of "deployment" interface triggers each front-end server to rsync a site and "deploy" it (rough sketch after the pros and cons below).

Pros

  • Pretty simple
  • Allows for a "staging" version on the file server which might be useful
  • Each front-end server has its own copy of the files, so there's no problem if the file server is offline

Cons

  • How reliable is this in practice?
  • Potential for confusion over what's been deployed and what hasn't
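To make this concrete, here's a rough sketch of the kind of deploy trigger I have in mind, pushed from the file server over SSH. The hostnames, paths and deploy user are all placeholders:

#!/usr/bin/env python3
"""Rough sketch of an explicit "deploy" trigger for the rsync option.
Pushes one site's files from the master copy on the file server to each
front-end server. Hostnames, paths and the SSH user are placeholders.
"""
import subprocess
import sys

FRONTENDS = ["web1.internal", "web2.internal"]  # front-end servers (placeholder names)
MASTER_ROOT = "/srv/master/sites"               # master copy on the file server
DEPLOY_ROOT = "/var/www/sites"                  # document root on each front end

def deploy(site):
    for host in FRONTENDS:
        # --delete keeps each front-end copy an exact mirror of the master
        subprocess.check_call([
            "rsync", "-az", "--delete",
            "{}/{}/".format(MASTER_ROOT, site),
            "deploy@{}:{}/{}/".format(host, DEPLOY_ROOT, site),
        ])

if __name__ == "__main__":
    deploy(sys.argv[1])  # e.g. ./deploy.py example.com

Having the front ends pull from the rsync daemon would work just as well; the point is that copies only happen on an explicit deploy, and logging each deploy would go some way towards addressing the "what's been deployed" confusion.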

NFS

An NFS file server with local caching; periodically rsync to a local backup, then fail over if needed by switching a bind mount from the NFS mount to the local backup.

Pros

  • Possibly supports writing (not that necessary)
  • NFS is in some ways simpler
  • Unless there's an outage they should always all be in sync

Cons

  • I'm not sure how well local NFS caching works, or whether it automatically invalidates cached copies of modified files; without a local cache, NFS is pretty slow
  • I'm pretty sure I'd need some kind of heartbeat to trigger the failover and to remount the master when it comes back online (rough sketch after this list)
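As a sketch of what that heartbeat/failover might look like on each front end (the server name, mount points and probe are all assumptions on my part):

#!/usr/bin/env python3
"""Rough heartbeat sketch for the NFS option, run on each front end (as root,
since it remounts things). If the NFS server stops answering, bind-mount the
local rsync'd backup over the web root; switch back once it recovers.
Hostnames, paths and the probe interval are placeholders.
"""
import subprocess
import time

NFS_SERVER = "fileserver.internal"   # placeholder name
NFS_MOUNT = "/mnt/nfs/sites"         # where the NFS export is mounted
LOCAL_BACKUP = "/srv/backup/sites"   # periodically rsync'd read-only copy
WEB_ROOT = "/var/www/sites"          # what the web server actually serves

def nfs_alive():
    # Cheap liveness probe; a timed read against the mount itself would be stronger.
    return subprocess.call(
        ["showmount", "-e", NFS_SERVER],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ) == 0

def rebind(source):
    # Lazily drop the current bind and point the web root at `source`.
    subprocess.call(["umount", "-l", WEB_ROOT])
    subprocess.check_call(["mount", "--bind", source, WEB_ROOT])

failed_over = False
while True:
    alive = nfs_alive()
    if not alive and not failed_over:
        rebind(LOCAL_BACKUP)
        failed_over = True
    elif alive and failed_over:
        rebind(NFS_MOUNT)
        failed_over = False
    time.sleep(10)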

Gluster, etc.

I'm not super familiar with this option, but I believe this is a common approach when you have a large number of servers. I looked at some of the documentation and it might be suitable, but I don't see many people recommending it on such a small scale.

Pros

  • Would allow read and writing
  • Supports caching, I think, so it should be faster than uncached NFS?
  • Automatic replication, recovery and fail over

Cons

  • I like the idea of having a single "master" volume which I can snapshot and back up; I don't think Gluster has an option to say "this node must have a complete copy"?
  • With such a small pool of servers it seems like you could easily accidentally terminate two servers which happen to have the only copy of some data

So, my questions:

  • Is NFS really the only option for me?
  • Are there other options I should be considering?
  • Are any of these options the best fit for my needs?

Edit:

Thanks for your responses, everyone. I'm starting to realise that I'm making things far too complicated considering the (relatively small) scale of my needs. I think the correct solution in my case is to introduce an explicit "deploy" event which triggers the local files to be updated, either from version control or some other source.

Although files are updated regularly, spread across ~200 sites most individual sites are unlikely to be updated more than once a month, so a mechanism that instantly syncs any arbitrary change to any file at any time seems unnecessary.

– thexacre

2 Answers


Complex thoughts... But you may be overcomplicating things.

Think about this. Under what circumstances would your NFS server "go down"? What specific failure mode are you trying to protect against? Isn't some of that mitigated by the cloud hosting provider's storage and virtualization platform?

I'm a fan of NFS for this purpose. Mind you, this is typically with bare-metal storage hardware that can have enough internal resiliency to sustain normal failures. But you're planning a cloud deployment.

So I'd envision clustered load balancers in front of your web server tier, backed by clustered NFS storage. The NFS storage would need to present a virtual IP (VIP) to the web hosts. A common approach is the use of DRBD and Pacemaker, but there are other options.

A standby NFS server that the primary periodically synchronizes to could also be an option.

You could use DNS names for your NFS mounts and leverage Route 53 (manually or automatically) to change the IP target if you lose primary storage.
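As a sketch, flipping that record with boto3 during a failover might look something like this (the zone ID, record name and standby IP are placeholders):

import boto3

route53 = boto3.client("route53")

# Point the NFS DNS name at the standby server (all values are placeholders).
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",
    ChangeBatch={
        "Comment": "Fail NFS mount target over to the standby server",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "nfs.internal.example.com.",
                "Type": "A",
                "TTL": 30,
                "ResourceRecords": [{"Value": "10.0.1.20"}],
            },
        }],
    },
)

Keep the TTL low so the web hosts pick up the change quickly; the NFS mounts themselves may still need to be remounted to notice the new address.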

A couple of ideas. I tend to avoid the full-on cluster filesystems for web solutions, but they can be options as well.

– ewwhite
  • "What specific failure mode are you trying to protect against?" - Hopefully only for maintenance. At the moment maintenance is a huge pain because we have no way to do a "dry run" and we can't afford to make any changes which _might_ impact availability, even if it actually wouldn't. I suppose being able to "swap out" NFS servers might be enough to achieve this. – thexacre Apr 21 '14 at 03:49
  • @mit Sure. That's fair. For a maintenance window, stop replication, change DNS (or disable NFS) and go from there. – ewwhite Apr 21 '14 at 03:50

In one similar install that I handle, I've got Gluster set up so that each web server has a complete copy of the data. That means reads never have to go across the network, and it's perfect for us: a fairly low volume of data with infrequent writes.

Volume Name: gv0
Type: Replicate
Volume ID: 1f70af9d-4caa-4d2d-8dbd-feedfacebeef
Status: Started
Number of Bricks: 1 x 6 = 6
Transport-type: tcp
Bricks:
Brick1: server1:/export/glusterfs
Brick2: server2:/export/glusterfs
Brick3: server3:/export/glusterfs
Brick4: server4:/export/glusterfs
Brick5: server5:/export/glusterfs
Brick6: server6:/export/glusterfs

Sounds just right!

– MikeyB
  • Thanks, I suspected you could do something like that with gluster, although I was turned off this idea after reading [this](http://joejulian.name/blog/glusterfs-replication-dos-and-donts/) which suggests having a copy on every server with gluster is a bad idea, although with only ~3 servers which are read heavy it would probably be fine. Do you know how automatically adding additional nodes would work? I suppose they could just be a client rather than having a full replica if they were only to satisfy temporary load requirements. – thexacre Apr 21 '14 at 23:29
  • It's not a perfect idea but it works. You can also disable the auto-heal on fopen I believe (not that I'm necessarily recommending that). – MikeyB Apr 22 '14 at 02:25