
I want to replicate my virtual machine and put it behind a load balancer.

Apache1  Apache2 ....ApacheN
   |        |           |
-------------------------
        LoadBalancer

I'd like to use only ONE configuration file for virtual hosts (actually a directory of conf files pulled in via Include in each httpd.conf), ONE log file, and a common DocumentRoot directory for all instances. Is that possible by just sharing some directory between the virtual machines and configuring each Apache accordingly?

Or will there be some conflict with file opening and writing?

Is there maybe a better way of maintaining all the machines with the same configuration?

The only other thing I can think of is a script that copies a master configuration and restarts all the Apache instances, plus another script to merge all the logs...

Any suggestion welcome.

UPDATE: I thought that in my case performance was not an issue, but I stopped writing to the same log file from two instances when I started to notice log corruption:

    [04/Oct/2014:17:10:34 +0200] "GET /index.html HTTP/1.0" 200 15082 22633
    [04/Oct/2014:17:10:36 +0200] "GET /index.html HTTP/1.0" 200 15082 13[04/Oct/2014:17:10:38 +0200] "GET /index.html HTTP/1.0"[04/Oct/[04/Oct/2014:17:10:40 +0200] "GET /index.[04/Oct/2014:17:09:42[04/Oct/2014:17:10:42 +0200][04/Oct/2014:17:09:44 +0200] "GET /index.html HTTP/1.0" 200 15082
Glasnhost

3 Answers


Is there maybe a better way of maintaining all the machines with the same configuration?

Yes, a configuration management system (Puppet, Chef, ...) is the proper way of dealing with this.
They can automatically deploy updated configuration files and restart the services afterwards.
Logs should be sent via (r)syslog to a central logging server.
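As a minimal sketch of how the deploy-and-restart part might look in Puppet (the module name "web" and the file paths here are assumptions for illustration, not something from the question):

    # Deploy the vhost config from the Puppet server; "notify" makes
    # Puppet restart Apache whenever the file content changes.
    file { '/etc/httpd/conf.d/vhosts.conf':
      ensure => file,
      source => 'puppet:///modules/web/vhosts.conf',
      notify => Service['httpd'],
    }

    service { 'httpd':
      ensure => running,
      enable => true,
    }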

As for the content of the DocumentRoot: that depends on what is in there.
If it is static code, then packaging it and deploying it via the standard OS tools (yum, apt-get, ...) would be preferable.
It also makes it easier to roll out new versions slowly.

It is of course possible to use a file share, but it may then become a single point of failure.
And as you add more servers, it gets painful to check which machines have already restarted the service after a new config has been placed there.

faker

You may well share the Apache config and document root among many identical machines; it is no problem to use e.g. an NFS share for these purposes.
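As a rough sketch, each web VM could mount the shared pieces via /etc/fstab entries like these (the server name "nfsserver" and the export paths are placeholders):

    # shared vhost configs and docroot, mounted read-only on each VM
    nfsserver:/export/httpd-conf  /etc/httpd/conf.d  nfs  ro,defaults  0 0
    nfsserver:/export/www         /var/www/html      nfs  ro,defaults  0 0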

It would be wise not to share the Apache log directories, because under heavy load you will get many concurrent writes. In one setup I used remote syslogging to get common Apache logs. How to achieve that is a separate question in itself; see http://httpd.apache.org/docs/2.2/mod/core.html as a starting point for the sender side.

You will also have to configure each server's local syslog accordingly.
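A minimal sketch of the sender side, assuming rsyslog and the local1 facility (the facility, the tag and the central host name are placeholders, not from this answer):

    # httpd.conf - the error log can go to syslog natively; the access
    # log has no built-in syslog support, so pipe it through logger(1):
    ErrorLog syslog:local1
    CustomLog "|/usr/bin/logger -t apache -p local1.info" combined

    # /etc/rsyslog.conf on each web server - forward facility local1
    # to the central log host over TCP:
    local1.* @@loghost:514

On the central host, rsyslog would then write everything arriving on local1 into one combined log file.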

village
  • NFS mounts are trouble. If the server goes down, the NFS clients can block while waiting. I agree about not sharing log files. Not sure I agree about directories being a problem, but it would depend on your filesystem - e.g. probably true on GFS, maybe also on NFS? I would avoid syslog, as it's much more resource-intensive than writing logs to file. Better to merge the log files. – mc0e Oct 04 '14 at 17:45
  • 1
    There should not be *one* NFS server. There should be redundant ones. With redundant disk arrays. – Jenny D Oct 04 '14 at 18:26
  • Of course: using NFS always implies the suggestion to use an NFS cluster with sufficient redundancy (as some of the leading vendors offer). – village Jan 09 '20 at 09:32

I would recommend either using a configuration management system, such as Puppet or CFEngine, or at least centrally storing the configurations in a single repository and pulling them out to all the web servers.

For the configuration management solution, you can either specify whole files that must exist, along with where to get the canonical copy on the configuration management server, or you can specify the parameters for those files in the configuration management language, which provides an abstraction layer and simplifies the process of introducing new configurations in a correct way.

For simply maintaining and distributing files centrally, you would probably want to check the config files into a version control system, such as CVS or SVN. From there, there are two pretty straightforward ways to get these configurations onto all of your web servers.

  • You could instruct your web servers to pull directly from the version control tool (cvs co or svn checkout).
  • Alternatively, you could do a little more work to build a more robust, scalable and reusable solution:
    • script the building of an RPM containing all of the Apache configuration files (or the equivalent for your OS)
    • run a yum repo on the version control server (or the equivalent)
    • then simply instruct your web servers to perform a yum update my-apache-configs (a sketch of this workflow follows the list)
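A rough sketch of that workflow on the build host and the web servers (the spec file name and the repo path are made up for illustration):

    # on the build/repo host:
    rpmbuild -bb my-apache-configs.spec        # package the config files
    cp ~/rpmbuild/RPMS/noarch/my-apache-configs-*.rpm /srv/repo/
    createrepo /srv/repo                       # regenerate the repo metadata

    # on each web server:
    yum -y update my-apache-configs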

The VCS-only solution is the easiest to set up and will work across operating systems. The package repository solution is a little harder to set up, but it paves the way for you to package and distribute configurations, code and scripts of all sorts, and it aligns more closely with the OS vendor's methodology.

The other nice thing about the package repository solution is that you can define dependencies and groups of packages. This means you could make my-apache-configs depend on httpd and mod_ssl. You could then create an empty package called something like company_com-web_server that depends on my-apache-configs, my-ssl-certificates and any other packages specific to your company. To set up a new web server instance, put a freshly installed server (add your yum repo to the kickstart) behind the load balancer, issue yum -y install company_com-web_server, walk away for a coffee, and come back to a ready-to-roll web server instance.
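A minimal spec-file sketch for such an empty meta-package (the package names are the hypothetical ones used above):

    Name:           company_com-web_server
    Version:        1.0
    Release:        1
    Summary:        Meta-package for our standard web server stack
    License:        Proprietary
    BuildArch:      noarch
    Requires:       httpd mod_ssl my-apache-configs my-ssl-certificates

    %description
    Installing this package pulls in everything a web server instance needs.

    %files
    # intentionally empty - the package only carries dependencies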

===== EDIT =====

The value of this mechanism is that it creates a loosely coupled system. If the configuration management server or the yum repo goes offline, you lose the ability to reconfigure, but the web servers stay up. Even in that case, you could manually replicate changes to all machines and check the changes in by hand once the repo comes back up. Using shared storage (NFS, a clustered filesystem, etc.) would create a single point of failure.

DTK