
I have an application that needs to scale horizontally across web and service nodes (at the moment everything runs on one node) while still working with the same set of databases and source files (both application code and custom assets). The database is no problem; it's already handled with replication in MongoDB.

Also, the servers are configured identically (100% Linux). This question is literally about sharing a filesystem between machines so that its content is always correct, regardless of which node accesses it.

So far I've considered NFS and a SAN: a SAN is prohibitively expensive, and NFS shows performance problems on the second node when PHP calls glob().

Does anyone have recommended strategies or other techniques that don't involve sharding data across nodes, or know of any gotchas in NFS that may cause slow disk seek times?

To give you an idea of the scale, the main node initialises its application modules in ~0.01 seconds; the secondary takes ~2.2 seconds. They're VMs inside a local virtual network in ESXi, and ping time between them is ~0.3 ms.
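One way to check whether the gap comes from the filesystem rather than from PHP is to time the same glob against the local copy and against the NFS mount. The sketch below uses Python's `glob` module as a stand-in for PHP's `glob()`, so it only approximates filesystem cost; the function name and the paths in the usage note are assumptions.

```python
import glob
import time


def time_glob(pattern: str, repeats: int = 5) -> tuple[int, float, float]:
    """Run glob.glob(pattern) `repeats` times.

    Returns (number of matches, first-run seconds, best-run seconds). The first
    run is likely closest to a cold cache; later runs show what caching buys.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: list[float] = []
    matches: list[str] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # recursive=True lets "**" match any number of nested directories.
        matches = glob.glob(pattern, recursive=True)
        timings.append(time.perf_counter() - start)
    return len(matches), timings[0], min(timings)
```

For example, compare `time_glob("/var/www/app/**/*.php")` on the main node with `time_glob("/mnt/shared/app/**/*.php")` on the secondary (both paths hypothetical). A large first-run time that shrinks on later runs would suggest metadata round trips over the network rather than raw read throughput.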

  • Whilst doing some more digging around, I found that part of the problem is that PHP calls `lstat` a lot. One massive performance boost here was raising `realpath_cache_size` in `php.ini` from the default of 16k to 1M. – Andrew Waters Jul 07 '12 at 10:59
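The `realpath_cache_size` change helps because PHP can reuse resolved paths instead of calling `lstat` on path components again for every include. A minimal Python analogue of that idea, assuming a simple per-path TTL and no size limit (PHP's cache is also bounded by `realpath_cache_size`, which this sketch does not model); the class name is hypothetical, and the 120-second default mirrors PHP's `realpath_cache_ttl` default:

```python
import os
import time
from typing import Callable


class RealpathCache:
    """Memoize path resolution for `ttl` seconds, loosely mimicking PHP's realpath cache.

    A miss calls the resolver (os.path.realpath by default), which may lstat()
    each path component; a hit within the TTL skips those syscalls entirely.
    """

    def __init__(self, ttl: float = 120.0,
                 resolver: Callable[[str], str] = os.path.realpath) -> None:
        self.ttl = ttl
        self.resolver = resolver
        self._entries: dict[str, tuple[str, float]] = {}
        self.misses = 0  # number of times the resolver actually ran

    def resolve(self, path: str) -> str:
        now = time.monotonic()
        hit = self._entries.get(path)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # fresh cached answer: no filesystem access
        self.misses += 1
        resolved = self.resolver(path)
        self._entries[path] = (resolved, now)
        return resolved
```

The trade-off is the same as in PHP: a longer TTL means fewer metadata lookups over NFS, but a renamed or replaced path can stay stale until its entry expires.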

2 Answers


Sounds like you're doing something pathologically wrong with NFS -- like putting tens of thousands of files in one directory or something. NFS performs fine, even on large (TB+) data sets, so it can be done.
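If a directory does hold tens of thousands of files, a common remedy is to fan them out into hash-named subdirectories so that no single directory lookup or listing has to walk a huge entry list over NFS. A sketch, assuming two levels of two hex characters each; the function name, the root path and the choice of MD5 are illustrative, and MD5 is used only to spread names evenly, not for security:

```python
import hashlib
import os


def sharded_path(root: str, filename: str, levels: int = 2, width: int = 2) -> str:
    """Place `filename` under hash-derived subdirectories, e.g. root/3f/a2/filename."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters from the digest.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

With `levels=2` and `width=2` there are 65,536 leaf directories, so a million files average about 15 entries each. The mapping is deterministic, so every node computes the same path without any shared index.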

Do you, however, need a filesystem? I generally find that you can get much better performance and encapsulation by exposing a more limited set of primitives to your data storage, and operate using those. Rather than go through the whole thing again, I'll just point you to a previous answer I've written that has all the fine detail.
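Without presuming what the previous answer recommends, one reading of "a more limited set of primitives" is an interface offering only put/get/exists by key, so application code never globs or stats a shared tree. A minimal sketch under that assumption; the class and method names are hypothetical, and the directory backend is only a reference implementation that a different store could replace behind the same interface:

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Deliberately small storage API: keys in, bytes out; no directories, globbing or stat()."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class DirectoryBlobStore(BlobStore):
    """Reference backend: one file per key inside a single root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Reject empty keys, path separators and dotfiles so a key cannot escape
        # the root or collide with the temporary files written by put().
        if not key or "/" in key or key.startswith("."):
            raise ValueError(f"invalid key: {key!r}")
        return os.path.join(self.root, key)

    def put(self, key: str, data: bytes) -> None:
        target = self._path(key)
        # Write to a temporary file, then rename it over the target, so readers
        # see either the old blob or the new one, never a partial write.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def exists(self, key: str) -> bool:
        return os.path.exists(self._path(key))
```

Because callers only see `put`, `get` and `exists`, the backend can later move off a shared filesystem without touching application code.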

womble
  • There's nothing unusual about the directories. It's very low volume compared to the applications you seem to have been working on. Thanks for the link to the previous post. It's probably a bit convoluted for what we need, but interesting to read all the same. – Andrew Waters Jul 07 '12 at 10:55

Check out from SVN/git to the individual nodes. Rsync across the nodes. Mount a Samba share on all the nodes. Basically, anything but NFS.
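The rsync option can be sketched as a push from one node to the others after each deploy or upload. This assumes SSH access between nodes and uses hypothetical function names, hostnames and paths; `-a` (archive), `-z` (compress), `--delete` and `--dry-run` are standard rsync options:

```python
import subprocess
from typing import Sequence


def rsync_command(src_dir: str, host: str, dest_dir: str,
                  delete: bool = True, dry_run: bool = False) -> list[str]:
    """Build an rsync invocation that mirrors src_dir onto host:dest_dir.

    The trailing slash on the source copies the directory's contents rather
    than the directory itself.
    """
    cmd = ["rsync", "-az"]
    if delete:
        cmd.append("--delete")  # remove files on the node that no longer exist at the source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    cmd += [src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]
    return cmd


def push_to_nodes(src_dir: str, hosts: Sequence[str], dest_dir: str) -> None:
    """Run rsync once per node; raises CalledProcessError if any transfer fails."""
    for host in hosts:
        subprocess.run(rsync_command(src_dir, host, dest_dir), check=True)
```

For example, `push_to_nodes("/var/www/app", ["web2", "web3"], "/var/www/app")`. Because this is a push, the other nodes lag behind until the next run, which is the delay concern raised in the comments.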

dmourati
  • Samba over NFS? – 3molo Jul 06 '12 at 16:23
  • We don't store the implementation (uploaded assets etc.) in git, just the application, so that's not possible. There'd be too much delay rsyncing because of the volume of checks required. I'm not convinced that Samba would be better than NFS... sounds like you have something against NFS? :) – Andrew Waters Jul 07 '12 at 10:53