I'm setting up a large Drupal (Pressflow) site and this is my current plan. Have I gone and done anything blatantly stupid? Does anyone have any experience hosting a large, multi-server Drupal installation like this?
3 Answers
I'd be tempted to put a pair of Varnish nodes behind HAProxy to give you a highly available Varnish cluster.
You could easily run 2+ Varnish nodes on their own, without HAProxy, but then you can only load balance HTTP traffic. With HAProxy in front, you've got a TCP load balancer too.
What do you propose the edge of your network looks like? Do you plan to have a HA Pair of hardware firewalls? Do you need edge-routing, BGP and multiple transits?
Another thing to consider is how your file server works. You could probably benefit from having a pair of file servers running a distributed storage system such as GlusterFS or MogileFS. That way you can ensure redundancy all the way through the infrastructure.
Adding multiple Memcached nodes is also trivial, and gives you more redundancy and resilience against traffic spikes and hardware failure.
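How much a lost Memcached node hurts depends on how the client maps keys to nodes. Many memcached clients can use consistent hashing so that losing one node only invalidates the keys that lived on it. Below is a minimal sketch of that idea, not any particular client's implementation; the node names (`mc1`..`mc3`), the key format, and the 100 virtual points per node are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (MD5 is used for spread, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping cache keys to node names."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place the node's virtual points on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Take the node's virtual points off the ring (e.g. after a failure)."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_fraction(nodes, lost_node, keys):
    """Fraction of keys that map to a different node once lost_node is gone."""
    ring = HashRing(nodes)
    before = {key: ring.node_for(key) for key in keys}
    ring.remove(lost_node)
    moved = sum(1 for key in keys if ring.node_for(key) != before[key])
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical node names and cache keys.
    sample_keys = [f"page:{i}" for i in range(2000)]
    print(moved_fraction(["mc1", "mc2", "mc3"], "mc3", sample_keys))
```

With three nodes, roughly a third of the keys should move when one node disappears, and keys held by the surviving nodes stay where they were. A plain `hash(key) % node_count` scheme would instead remap most keys and empty the whole cache at once.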
Make sure that you take steps to optimize your front-end delivery of content, especially if you anticipate high traffic. Keep all media on a separate media domain, ideally a cookieless one, as Stack Overflow does with sstatic.net (see http://blog.stackoverflow.com/2009/08/a-few-speed-improvements/).
You might also want to consider a CDN to cache static content, such as CSS and non-changing JS. A large proportion of browser requests are for static content, which can be served effectively from whichever CDN PoP is nearest to the requester, so a multi-level cache like this evens out the Slashdot effect. The other advantage of caching on multiple layers (browser, CDN, Varnish, Memcache) is that after a while everything is cached multiple times, in multiple places, which gives you resilience against failures.
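As a rough back-of-the-envelope illustration of why stacking cache layers helps, the sketch below multiplies the miss ratios of each layer to estimate how much traffic still reaches the Apache/Drupal origin. The hit ratios are made-up numbers, and treating the layers as independent is a simplifying assumption; real figures have to be measured.

```python
def origin_fraction(hit_ratios):
    """Fraction of requests that miss every cache layer and reach the origin.

    Assumes each layer's hit ratio applies independently to the requests
    that got past the layers in front of it.
    """
    remaining = 1.0
    for ratio in hit_ratios:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("hit ratio must be between 0 and 1")
        remaining *= 1.0 - ratio
    return remaining


if __name__ == "__main__":
    # Hypothetical hit ratios, ordered from the visitor towards the origin.
    layers = {"browser": 0.5, "cdn": 0.6, "varnish": 0.8}
    share = origin_fraction(layers.values())
    print(f"about {share:.0%} of requests reach the origin")
```

With these assumed numbers only a few percent of requests get through. If one layer fails (set its ratio to 0), the others still absorb most of the load, which is the resilience argument in numeric form.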
A large Drupal site is really no different from any other large site. Just ensure you have multiple levels of redundancy on every layer of the network.
As for the specification of the actual servers, you probably want more than 8 GB of RAM on the Varnish nodes.
I'd recommend Intel server NICs on the load balancer boxes, and either Cisco or HP ProCurve switches for the core of your network.
Your database nodes should be fast multi-processor servers with 15k SAS disks for speed. For redundancy, put 4+ disks in a RAID 10 array.
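When sizing that array, keep in mind that RAID 10 mirrors every disk, so only half the raw capacity is usable. A small sketch of the arithmetic follows; the 300 GB disk size is just an assumed example, and the function name is made up for illustration.

```python
def raid10_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 10 array built from identical disks.

    RAID 10 stripes across mirrored pairs, so half of the raw space
    holds mirror copies.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return (disk_count // 2) * disk_size_gb


if __name__ == "__main__":
    # Assumed example: four 300 GB 15k SAS disks.
    print(raid10_usable_gb(4, 300), "GB usable")
```

Such an array always survives the loss of any single disk, and it survives further losses as long as no mirrored pair loses both of its disks.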
I wouldn't recommend doing this in a shared hosting environment. Dedicated servers might be OK, but for peace of mind, I'd be specifying a 1/4 rack in a carrier-neutral datacenter. This way, you get the most freedom for the actual configuration and management of the servers.
Added:
Do you absolutely need to run Apache?
For the servers hosting the media files on the cookieless domain, you'd probably be better off serving these from a lighter-weight HTTP server; Nginx is a fantastic solution for this. Apache is probably more suited to hosting Drupal itself, but there's no real reason you couldn't use Nginx and FastCGI, for example.
-
A note: This'll all be running in an existing VMware ESX farm. VMware's Fault Tolerance will be turned on for critical VMs. We currently use nginx as a reverse proxy for Apache on smaller Drupal sites than this one, but I'd figured the Varnish caches should remove the need for that. Am I mistaken in this? – ceejayoz Mar 10 '10 at 16:26
-
Depends whether you'd rather use nginx caching or Varnish caching. Varnish has the advantage of a more extensible DSL (Domain Specific Language) for tuning its performance and its handling of requests than you'd probably get with nginx. Personally, I'm a bit wary of doing all of this on virtualization. I'd sooner have it all in a rack with physical servers, especially for the database nodes. – Tom O'Connor Mar 10 '10 at 17:18
-
But I'm crazy like that. I think you'd get better database performance from local SAS disks in a fast server than from a VM (which can become I/O bound under certain conditions). Depends what works for you, I suppose. – Tom O'Connor Mar 10 '10 at 17:19
-
The main appeal for virtualisation here is that we have an existing infrastructure for it, which gives us a whole bunch of emergency capacity. If I need another half dozen Apache nodes, I can have them up in a few minutes instead of having to order servers. – ceejayoz Mar 12 '10 at 13:53
Something worth mentioning is that if you plan on using HTTPS, you need something in front of your load balancer to handle HTTPS connections. I am not sure if Varnish can handle that, but I'd recommend using either nginx or stunnel for that job.
-
Thanks, that's something I hadn't considered. We will indeed have a small amount of HTTPS traffic on this. – ceejayoz Mar 12 '10 at 13:51
-
I'd use HAProxy to load balance HTTPS traffic: present a single IP representing a bunch of HTTPS servers, then do layer 4 (TCP) load balancing with HAProxy, instead of HTTP load balancing with Varnish. – Tom O'Connor Mar 12 '10 at 17:33
Can I just ask how you plan to implement a separate file server? This is something I am really after, but standard Drupal does not seem to support this.
-
I heard that NFS is a really bad idea as it has reliability and performance issues...? – Jul 13 '10 at 14:17