
Currently we have a configuration that at the highest level looks like this:

[Traffic] -> Varnish (caching) -> HAProxy (load balancing) -> Apache (content and services)

There are (obviously?) multiple Apache servers, and in general they provide two types of services: one set serves the more traditional types of web content (navigable pages, for the most part) and the other set provides service endpoints (which in turn connect to a database and other backend functionality).

Service requests are filtered out early on in Varnish (specific domains, etc. are identified in VCL and passed directly to HAProxy -- there is no need to cache any of these calls).

"Content" requests do get cached by Varnish.

I need to add SSL support, initially because of the need to handle secure service requests (and responses), although I expect that eventually I will need HTTPS calls into the content server(s) as well.

At present I have been playing around with stunnel, and while it works, the model I'm using effectively just has stunnel decrypt incoming requests and then pass them through HAProxy as normal *:80 traffic (so not using mod_ssl, etc., in Apache). So effectively things now look like:

[Traffic] -> Varnish (caching) -> HAProxy (load balancing) -> Apache (content and services)
[Traffic] -> stunnel (SSL) -------^
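
The stunnel side of that is minimal; roughly the following (the cert path and ports are just placeholders, not my actual config):

    ; Sketch: terminate SSL on 443 and hand the decrypted traffic
    ; to HAProxy's plain-HTTP listener on port 80.
    cert = /etc/stunnel/example.pem

    [https]
    accept  = 443
    connect = 127.0.0.1:80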

So it works, but my gut is telling me this isn't really a long-term solution. One possibility is just separating the traffic entirely:

[Traffic] -> Varnish (caching) -> HAProxy (load balancing) -> Apache (content and services)
[Traffic] -> Pound (or something else?) ------------------------> Apache (SSL content & services)

The Apache servers would likely be shared (SSL traffic would just be handled differently), but the systems which route traffic to the content/service servers would be different ...
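
If I went the Pound route, I imagine the SSL front end would be something along these lines (a rough sketch only; the addresses and cert path are invented):

    # Sketch: Pound terminating HTTPS and forwarding to the shared Apache pool.
    ListenHTTPS
        Address 0.0.0.0
        Port    443
        Cert    "/etc/pound/example.pem"

        Service
            BackEnd
                Address 10.0.0.11
                Port    80
            End
            BackEnd
                Address 10.0.0.12
                Port    80
            End
        End
    End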

Rummaging around turns up a number of opinions / options (including nginx, etc.), but the first-order question is whether the architecture as a whole makes sense (diverting incoming traffic to separate subsystems) or whether there is a more unified (and likely simpler) model that I should be looking at. If the architecture makes sense, then the follow-up is what to use for the SSL support aspect of this beastie.

user90581
  • What are your concerns with the current setup? Splitting the traffic into a completely separate path seems like the more painful solution to me - and keep in mind you can have stunnel fit in wherever you'd like; decrypting the data and sending it to Varnish would work, too, if that fits your needs from a security perspective. – Shane Madden Aug 05 '11 at 03:05
  • Fundamentally, traffic ... right now it's relatively light (thousands to maybe 10K requests an hour), but I do expect it to rapidly increase by end of year and into next. I may just be being a bit paranoid, but I'd like to be able to provision more backend servers as needed for load (and independently, depending on what sort of traffic may be increasing ...). I expect, for instance, that non-SSL services traffic will increase first ... followed by SSL services traffic (which will have a different payload and type of data returned), followed by "standard" web traffic (both secure and non) – user90581 Aug 05 '11 at 16:28
  • Sorry ... not per hour, per minute ... – user90581 Aug 06 '11 at 04:34

3 Answers


Wow, that stack is getting deep and complicated. Complexity is the enemy of uptime, and also in general the enemy of performance. Every one of those pieces has to manage connections, parse HTTP headers, etc.

I suggest you simplify things. Use nginx for SSL, load balancing, and caching, as it supports all three with built-in modules. You can deploy it into your infrastructure incrementally as well, doing SSL only in front at first, then replacing HAProxy for load balancing the services, and so on. You could even potentially ditch Apache and have nginx do almost everything, if your services are written in a language that has a decent web or FastCGI server.
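
As a rough illustration only (the server names, addresses, certificate paths, and cache settings below are placeholders, and the snippet belongs inside the http block of nginx.conf), a single nginx tier doing SSL, caching, and load balancing might look like:

    # Sketch: one nginx front end terminating SSL, caching responses,
    # and balancing across the Apache backends.
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=content:10m;

    upstream apache_backends {
        server 10.0.0.11:80;
        server 10.0.0.12:80;
    }

    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/ssl/example.crt;
        ssl_certificate_key /etc/nginx/ssl/example.key;

        location / {
            proxy_cache content;
            proxy_pass  http://apache_backends;
        }
    }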

My small SaaS shop uses a flat nginx SSL/proxy/cache/static tier in front of Tomcat, IIS, and PHP-FastCGI backend services and static web servers. We see peaks of 2,000 requests per second, and nginx isn't even breaking a sweat under that load with just two single-core VMware virtual machines at the front end of everything.

rmalayter
  • It is ... and I may be pre-optimizing (for such things as scale), but I also would rather not have to think about rebuilding an airplane mid-flight (if you get the analogy), so I do like the idea of replaceable building blocks (your suggestion about using nginx works in that regard, as it would allow for incremental deployment / replacement as needed). I do anticipate (hope for?) a rapid explosion of traffic / use (mostly on the services side) and thus want to keep flexibility in terms of scaling/modularity, even if it's a bit more complex – user90581 Aug 05 '11 at 16:23

whether the architecture as a whole makes sense (diverting incoming traffic

The diagrams you've provided don't show any branches. I'm a bit confused as to why you've got Varnish in front of HAProxy instead of the other way around.

but my gut is telling me this isn't really a long-term solution

The SSL encapsulation should be in front of the HTTP caching (otherwise the content can't be cached).

Certainly it would be nicer to reduce the number of hops, but merging the SSL onto one of the existing layers wouldn't give that much of a performance benefit (assuming that at least one end of stunnel is connected via localhost). It's the architecture which Oracle, Cisco, F5, etc. tend to recommend (that is, with the SSL at the front end, although with the exception that they think you should be running their kit in there somewhere!).

If it were me, I'd split the cacheable/non-cacheable content onto different customer-facing URLs (even better, use a CDN for all the cacheable content!).

Important questions you've not answered include how many IP addresses you have, how many webservers you have, and the split between cacheable/non-cacheable and HTTP/HTTPS.

            +--->(cacheable:80)->--------------+
            |                                  |
            +--->(cacheable:443)---> stunnel->-+->Varnish ->-+
 HAProxy ->-+                                                |
            +-->(non-cacheable:443)--> stunnel->-----+-------+---->Apache
            |                                         |
            +--->(non-cacheable:80)->-----------------+

Obviously, if you drop Varnish (optionally using mod_proxy within Apache), this becomes a lot simpler...

            +--->(:80)->------------+
 HAProxy ->-+                       |
            +--->(:443)---> stunnel-+---->Apache

Given the price of hardware, I'm far from convinced that using a caching reverse proxy is a good trade-off between price and performance - unless you've got huge amounts of traffic and a large proportion of it is cacheable. OTOH, if you are implementing logic (such as ESI) then it's not very practical to do without the proxy, in which case the question becomes whether Varnish can provide the required load balancing rather than using HAProxy:

  (:443)-->stunnel--+
                    |
  (:80)-------------+-->varnish-->Apache
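
For completeness, Varnish can do simple load balancing itself via a director; a rough sketch (Varnish 2.x syntax, backend addresses invented) would be:

    # Sketch: round-robin balancing across the Apache backends in Varnish.
    backend web1 { .host = "10.0.0.11"; .port = "80"; }
    backend web2 { .host = "10.0.0.12"; .port = "80"; }

    director apaches round-robin {
        { .backend = web1; }
        { .backend = web2; }
    }

    sub vcl_recv {
        set req.backend = apaches;
    }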
symcbean
  • mod_proxy may be a better route (simply because at this point I'm not seeing heavy use of cacheable content ... I see a lot more on the services side: many, many requests with small pieces of data). I am curious about stunnel behind HAProxy (my understanding was that, done this way, all HAProxy would be able to do is essentially just pass those requests through without any processing, correct?). To answer the complexity / patching question ... a couple of reasons why I did it that way: a) future expansion/scaling, b) the ability to farm out cacheable content to a CDN (or elsewhere) – user90581 Aug 05 '11 at 16:19
  • HAProxy will still be able to do load balancing – symcbean Aug 09 '11 at 11:42

You may try "mode tcp" (the default) and "option ssl-hello-chk"; have a look at the HAProxy Configuration Manual. This way you lose the ability to have HAProxy inspect HTTP headers, but maybe you just don't need that.
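
A rough example (the section name, addresses, and ports are just placeholders):

    # Sketch: balance raw SSL connections at the TCP level and health-check
    # the backends with an SSL hello.
    listen https_pass
        bind *:443
        mode tcp
        option ssl-hello-chk
        balance roundrobin
        server web1 10.0.0.11:443 check
        server web2 10.0.0.12:443 check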

This way you give HAProxy all of its resources for load balancing, and SSL decryption, caching, and the serving of dynamic or static content are done in the backend, where you can scale easily.
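
On the backend that just means ordinary mod_ssl in Apache, for example (hostname and certificate paths are placeholders):

    # Sketch: SSL terminated on the backend Apache servers themselves.
    <VirtualHost *:443>
        ServerName api.example.com
        SSLEngine on
        SSLCertificateFile    /etc/apache2/ssl/example.crt
        SSLCertificateKeyFile /etc/apache2/ssl/example.key
    </VirtualHost>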

HAProxy can handle HTTP/1.1 and is able to keep many long-lasting connections open, so techniques like long polling or WebSockets may be used in your application. This is not possible with nginx at the moment, for example.

webwurst