
I have two servers, placed in datacenters in Holland and France, both running Debian Wheezy. I need to share /home between them, with good performance. There are 300-odd users on the servers; around 30 of them may have active processes on a given server at a given time, each doing roughly 50 kbit/s of reads and 20 kbit/s of writes, with short read peaks around 2000 kbit/s (measured with iotop on local storage). I have a lot of small files, around 500,000 in total, and need latency as low as possible. Ping between the servers is 17 ms, and the connection reaches around 20-30 MB/s with scp and wget. It seems there should be plenty of bandwidth available for this to work, but...

What I've tried so far:

sshfs: Seemed to have better performance than nfs, but it randomly changed ownership of files to root, making the application crash.

nfs: Way too slow. I tried noatime and a bunch of other options, but it keeps acting sluggish, even when only a few processes are active.

drbd: 5 hours of dead-end work, when I realized I couldn't actually mount the filesystem on both systems :-(

glusterfs: Having a local copy of all data sounded really promising, but random file access is really slow, and after running for a while it becomes unbelievably slow and almost hangs. noatime doesn't help.

nfs again: Still sluggish.

Crying into the keyboard: No improvement at all.

What should I try next? Each of the failed attempts has taken an evening or more during the last week, and I'd really like the next method to work. And yes, it's crucial that the filesystem is shared between both servers.

Thanks for any new ideas on this problem.

user3850506
    "Crying into the keyboard: No improvement at all." OK, that gets a +1 from me. – ceejayoz Dec 18 '14 at 22:24
  • You'll probably want either glusterfs or ceph, i.e. a distributed filesystem. Also, you can mount drbd multiple times, but only one mount read-write, and it's a scary bad idea anyhow. – Sirex Dec 18 '14 at 22:27
  • I tried glusterfs, and while it works great with large files, it becomes really slow when reading or writing small files. It seems to be a common problem with glusterfs, and I haven't been able to find a fix for it. I'll look into Ceph. Have you tried it yourself? – user3850506 Dec 18 '14 at 22:34
  • 3
    Mounting the same block device & filesystem, even RO on a different system is bad juju unless the filesystem driver understands the backing block device can change arbitrarily at any time. The block device could change and completely invalidate the inode cache and the VFS would happily read data that is no longer where you thought. Shared-disk aware filesystems like GFS2 and veritas can do that on DRBD or any SAN-like disk. I can't say for sure your small file performance will be acceptable though. – Andrew Domaszek Dec 19 '14 at 00:17
  • 1
    How about MARS: http://schoebel.github.io/mars/ https://www.youtube.com/watch?v=WwXOQJ6XiVI – Andrew Dec 19 '14 at 01:36
  • Both GFS2 on top of DRBD and MARS sound interesting. MARS requires a custom-built kernel, which involves some extra work as both servers are headless, and while the arguments for MARS are quite compelling, I'm not just going to slap some patches on a kernel and hope it boots. So I'll probably try GFS2 first. – user3850506 Dec 19 '14 at 09:12
  • With gfs2 you need to configure a cluster consisting of the two machines, and with them being on separate sites there are some issues involved with fencing. I think you will for sure have performance issues with this solution, especially with lots of small files and concurrent users. In my experience, this will be slower than NFS exporting a non-clustered fs like ext4. – Petter H Dec 20 '14 at 19:37
  • Have you tried 'lookupcache=all' for NFS? Performance should increase, but you might run into inconsistencies: ls might not show a file that is actually there until you touch something in the directory. – Jens Timmerman Jan 06 '15 at 13:37
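
For reference, a hedged sketch of how those client-side NFS options might be combined in /etc/fstab. The server name "nfsserver", the export path, and the actimeo value are placeholders, not from the thread:

    # client-side /etc/fstab entry; server name, path and actimeo value are assumptions
    # noatime avoids atime updates, actimeo=60 caches attributes for 60 s,
    # lookupcache=all also caches negative lookups (the stale-ls effect described above)
    nfsserver:/home  /home  nfs  noatime,actimeo=60,lookupcache=all,hard  0  0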

1 Answer


There are some possible solutions for this:

  1. You can go for replicated block storage like DRBD (or MARS, as mentioned above), but you need to set up a cluster file system on top of the block storage. Such file systems could be GFS2 or OCFS2, which are both available in the Debian kernel afaik. DRBD can handle primary/primary, so you can mount it on both servers at the same time. But if you do this with a standard file system, one server does not know about the other and you would destroy your file system in a few seconds. A cluster file system on top handles the communication and locking so that both nodes can write to the same block device. (See the DRBD/GFS2 sketch after this list.)

  2. Use a distributed file system for /home. You will find a list of such file systems at http://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems. But beware and choose wisely: none of them can do magic, and they all have their drawbacks. Gluster is such a file system. For some systems, you might need more than just two nodes. (A minimal Gluster example follows below.)

  3. If it does not have to be replicated in real time and a near-realtime file sync would be sufficient, have a look at BitTorrent Sync (http://www.getsync.com/), Dropbox or alternatives. Each server has its own /home, but changes get replicated on a per-file basis to the other server. (A crude stand-in is sketched below.)
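
As a sketch of option 1, a dual-primary DRBD resource might look roughly like this. The hostnames "alpha"/"bravo", disks and IPs are placeholders, and a working corosync/dlm cluster with fencing is still needed before GFS2 will mount:

    # /etc/drbd.d/home.res -- sketch only; hostnames, disks and IPs are assumptions
    resource home {
        protocol C;                # synchronous replication; required for dual-primary
        net {
            allow-two-primaries;   # lets both nodes be primary at once
        }
        on alpha {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.1:7789;
            meta-disk internal;
        }
        on bravo {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7789;
            meta-disk internal;
        }
    }

    # then, roughly: create the cluster file system and mount it on both nodes
    # mkfs.gfs2 -p lock_dlm -t mycluster:home -j 2 /dev/drbd0

Note that protocol C makes every write wait for the peer, so with your 17 ms round trip each small synchronous write costs at least that much; worth benchmarking before committing to it.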
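
For option 2, a minimal two-node replicated Gluster volume looks roughly like this (hostnames and brick paths are again placeholders); as the comments note, small-file performance may still disappoint:

    # run on alpha after glusterd is running on both nodes; names are assumptions
    gluster peer probe bravo
    gluster volume create home replica 2 alpha:/bricks/home bravo:/bricks/home
    gluster volume start home

    # mount through the FUSE client on each server
    mount -t glusterfs localhost:/home /home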
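
And for option 3, if BitTorrent Sync or Dropbox don't fit, the idea can be illustrated with a crude one-way rsync loop. This is not the tools named above, just a stand-in; the peer name and interval are assumptions, and it does nothing about concurrent edits on both sides (a bidirectional tool such as unison would handle that case better):

    #!/bin/sh
    # naive near-realtime push of /home to the peer; "bravo" and the interval are assumptions
    while true; do
        # -a preserves owners/perms/times, --delete propagates removals
        rsync -a --delete /home/ bravo:/home/
        sleep 30
    done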

mgabriel