2

I created a sub-domain for static content so that it can be served more efficiently by several load-balanced web servers. This static content is updated automatically, at a rate of ~1k files/day.

Right now I use rsync to update the servers in a master/slave fashion, but since the content has grown to 100k+ files and keeps growing, each run takes longer and puts an increasing I/O load on both the master and the slaves.

I cannot use the solution I proposed in the Improve rsync performance question, since I cannot know which files were modified without stat-ing them all, and that wouldn't solve the increasing I/O cost anyway. I also have to handle file deletions.

I thought about using something like a read-only NFS mount on the slaves, but that would somewhat defeat the load-balancing effect and introduce a gratuitous SPOF.

Btw, the servers are running AIX, but I'm also interested in solutions for a more generic context.

Steve Schnepp
  • It's worth asking: are you sure you need two machines to serve this data at all? If you are using Apache with the prefork MPM, you will find there's plenty of room left to grow on a single machine with something better (the worker MPM, or an entirely different httpd like nginx). If you can do this, then you can set up a standby server and run a low-priority rsync continually to ship the data across. – Alex J Jun 01 '09 at 07:49
  • @Alex: +1 since you do raise an often over-looked point. But yes, unfortunately, I do need multiple servers and I'm stuck with a prefork apache :-) – Steve Schnepp Jun 01 '09 at 08:02
  • You don't say how the files are put on the "master" or how they are organized. One big directory, or split into subdirectories? Seeing as some process must be creating or adding all these files, can it also produce (or use its log to produce) a list of recently added files? How are the files put on the master? Is there a log for that (samba, ftp xfer, or even inotify)? (See the sketch after these comments.) – David Aug 04 '09 at 01:30
  • It's organized in a somewhat mixed way: split into subdirectories, but with some huge directories as well. The files are produced in various ways, but mostly by applications & sftp. – Steve Schnepp Aug 04 '09 at 05:40
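
Following up on the file-list idea from the comments: below is a rough sketch of how change capture could work with inotify on Linux (AIX has no inotify, so there an event source such as AHAFS, or the producing application's own logs, would be needed). All paths and hostnames are placeholders.

    # Record every created/modified path, relative to the content root.
    inotifywait -m -r -e close_write,create,moved_to --format '%w%f' \
        /var/www/static | sed 's|^/var/www/static/||' >> /tmp/changed.list &

    # Periodically ship only the changed files instead of walking the whole
    # 100k+ file tree. Deletions still need separate handling (e.g.
    # --delete-missing-args with rsync >= 3.1).
    sort -u /tmp/changed.list > /tmp/batch.list && : > /tmp/changed.list
    rsync -t --files-from=/tmp/batch.list /var/www/static/ slave1:/var/www/static/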

3 Answers

1

You should maybe consider DRBD with OCFS2 so that you can have master/master nodes. This creates no SPOF, because each node holds a local copy of the data.

You can also set up two NFS server nodes (DRBD master/standby, or DRBD master/master with load balancing). If you have many nodes, this is the best option.
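
As a minimal sketch (roughly DRBD 8.4 syntax; the node names web1/web2, the spare partition /dev/sdb1 and the IPs are hypothetical, and DRBD/OCFS2 are Linux-side tools, so this addresses the generic case rather than AIX):

    # /etc/drbd.d/static.res -- one resource that both nodes may hold as primary
    cat > /etc/drbd.d/static.res <<'EOF'
    resource static {
      protocol C;
      device    /dev/drbd0;
      disk      /dev/sdb1;
      meta-disk internal;
      net { allow-two-primaries yes; }
      on web1 { address 10.0.0.1:7788; }
      on web2 { address 10.0.0.2:7788; }
    }
    EOF

    # On both nodes (the very first promotion may need --force on one node):
    drbdadm create-md static && drbdadm up static
    drbdadm primary static

    # A cluster-aware filesystem lets both primaries mount the device:
    mkfs.ocfs2 /dev/drbd0              # run once, on one node only
    mount -t ocfs2 /dev/drbd0 /var/www/static

Each web server then serves /var/www/static from its own local disk, and writes on either node replicate to the other.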

Antoine Benkemoun
1

Why don't you just use a reverse proxy, such as Squid?
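
In accelerator (reverse-proxy) mode, each load-balanced node would cache the static files locally and fetch only missing or expired ones from the master on demand, so nothing has to be pushed at all. A minimal squid.conf sketch, with a hypothetical hostname and origin IP:

    # Listen on port 80 as an accelerator for the static sub-domain.
    http_port 80 accel defaultsite=static.example.com
    # The master holding the content acts as the origin server.
    cache_peer 10.0.0.10 parent 80 0 no-query originserver name=master
    acl static_site dstdomain static.example.com
    http_access allow static_site
    cache_peer_access master allow static_site

One caveat: updated or deleted files only disappear from the caches when their objects expire, so sensible Expires/Cache-Control headers on the master matter here.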

vartec
0

If you are using rsync, make sure that the file contents do not get read on every sync (i.e. avoid the --checksum option). That way only the file lists and timestamps get compared, which should be reasonably fast even with millions of files.

rsync -t replicates the timestamps, so subsequent runs can compare timestamps alone. If that doesn't work, use the --size-only option (but be careful if your files may change without their size changing).
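
Concretely, something like the following (paths and hostname are placeholders); --delete also covers the file deletions mentioned in the question:

    # Timestamp-based: -t preserves mtimes, so later runs skip unchanged files
    # after comparing only the file list and timestamps.
    rsync -rt --delete /var/www/static/ slave1:/var/www/static/

    # Size-only comparison: ignores timestamps entirely (unsafe if a file
    # can change without its size changing).
    rsync -r --size-only --delete /var/www/static/ slave1:/var/www/static/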

Alex Lehmann