
Newbie question. I need to build this:

  • A /shared folder with ~500 GB of files, ~1 MB each.
  • Two boxes (server1 and server2) connected by a 1 Gbps LAN.
  • Every box needs r/w access to the files, so they are both clients.
  • I want the files replicated on both boxes: every time a file is written on one server, the same file should be present on the other one.

My questions regarding GlusterFS:

  • Will it duplicate the files on the same box? For example, if the files are in /shared and the mount is at /mnt/shared, will it take twice the space (1 TB instead of 500 GB) on every server?
  • Or should I use the filesystem directly, writing locally to /shared? Does replication work this way without mounting a client?

Also, if anyone knows another way to accomplish this setup, I'd be very grateful. Thanks in advance.

k7k0

3 Answers


I've finally managed to solve this using GlusterFS on both boxes. Some things I learned in the process:

  • First I tried a generic RAID 1 (cluster/replicate) setup. The main problem with this is that the client always uses TCP to contact both servers, even when one of them is on the same machine. So I had to change the client configuration to replace the TCP "local" volume with a direct-access (storage/posix) volume.
  • To avoid stressing the network link, every client reads from local storage via the read-subvolume option. Of course, to keep RAID 1 integrity GlusterFS still checks the other volumes as well, but the actual file data is read directly from the local disk.
  • Performance is good, but the client process seems to be a memory hog. I think it's related to the quick-read volume; I need to investigate further.
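To make the read-subvolume idea concrete, here is a toy in-memory model in Python. It is not GlusterFS code: the `Replica` and `Mirror` classes and the integer version counter are hypothetical stand-ins for the replicate volume's own bookkeeping. Writes go to every subvolume; reads consult every replica's version (the cheap check across volumes) but fetch the data from the preferred local subvolume unless that copy is stale.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One subvolume of the mirror (hypothetical toy, not a GlusterFS API)."""
    name: str
    files: dict = field(default_factory=dict)  # path -> (version, data)
    reads: int = 0                             # how many reads were served here


class Mirror:
    """Toy model of replicate + read-subvolume."""

    def __init__(self, replicas, read_subvolume):
        self.replicas = replicas
        self.preferred = next(r for r in replicas if r.name == read_subvolume)

    def write(self, path, data):
        # Every write goes to every subvolume, with a bumped version number.
        version = max(r.files.get(path, (0, b""))[0] for r in self.replicas) + 1
        for r in self.replicas:
            r.files[path] = (version, data)

    def read(self, path):
        # Check the version on every replica (metadata only, cheap)...
        versions = {r.name: r.files.get(path, (0, None))[0] for r in self.replicas}
        newest = max(versions.values())
        if newest == 0:
            raise FileNotFoundError(path)
        # ...but serve the data from the local subvolume if it is up to date.
        if versions[self.preferred.name] == newest:
            source = self.preferred
        else:
            source = next(r for r in self.replicas if versions[r.name] == newest)
        source.reads += 1
        return source.files[path][1]
```

With `Mirror([Replica("brick-local"), Replica("server2-tcp")], "brick-local")`, a write lands on both replicas and subsequent reads are counted against `brick-local` only, so the data never crosses the network unless the local copy falls behind.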

Modified client configuration:

# Server1 configuration (RAID 1)
volume server2-tcp
    type protocol/client
    option transport-type tcp
    option remote-host server2
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick1
end-volume

volume posix-local
    type storage/posix
    option directory /shared
end-volume

volume locks-local
    type features/posix-locks
    subvolumes posix-local
end-volume

volume brick-local
    type performance/io-threads
    option thread-count 8
    subvolumes locks-local
end-volume

volume mirror-0
    type cluster/replicate
    option read-subvolume brick-local
    subvolumes brick-local server2-tcp
end-volume

.....
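Because the two boxes mirror each other, server2 needs the same stack with the peer swapped. A minimal sketch, assuming server1 exports a volume named brick1 on port 6996 just as server2 does: the hypothetical `client_volfile` helper below reproduces only the volumes shown above (not whatever the "....." elides), parameterized by the peer's hostname.

```python
def client_volfile(peer, local_dir="/shared", remote_subvolume="brick1",
                   port=6996, threads=8):
    """Render the replicate client stack shown above for a given peer.

    Hypothetical helper: the defaults mirror the server1 example, and it
    assumes the peer exports `remote_subvolume` on `port`.
    """
    remote = f"{peer}-tcp"
    return f"""\
volume {remote}
    type protocol/client
    option transport-type tcp
    option remote-host {peer}
    option transport.socket.nodelay on
    option transport.remote-port {port}
    option remote-subvolume {remote_subvolume}
end-volume

volume posix-local
    type storage/posix
    option directory {local_dir}
end-volume

volume locks-local
    type features/posix-locks
    subvolumes posix-local
end-volume

volume brick-local
    type performance/io-threads
    option thread-count {threads}
    subvolumes locks-local
end-volume

volume mirror-0
    type cluster/replicate
    option read-subvolume brick-local
    subvolumes brick-local {remote}
end-volume
"""
```

On server1 you would write `client_volfile("server2")` to the client volfile, and on server2 `client_volfile("server1")`, so each box reads locally and replicates to the other over TCP.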

Answering both of my questions:

Will it duplicate the files on the same box?

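No. The filesystem is mounted using FUSE, so /mnt/shared is a view of the replicated volume whose local copy lives in /shared; the data is not stored twice on the same box. My current /etc/fstab line: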

/etc/glusterfs/client.vol /mnt/shared glusterfs defaults 0 0

Or should I use the filesystem directly, writing locally to /shared? Does replication work this way without mounting a client?

No. Always use the mounted volume for reads and writes; using the backend filesystem directly may lead to inconsistencies.
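One way to guard against accidentally writing to the backend is to check which mount a path lives on before writing. A sketch that parses the contents of /proc/mounts; it assumes the FUSE mount is reported with filesystem type `fuse.glusterfs` (verify with `mount` on your own system) and ignores the octal escaping /proc/mounts uses for spaces in paths. The function names are hypothetical.

```python
import os


def mount_for(path, mounts_text):
    """Return (mount_point, fstype) of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mnt, fstype = fields[1], fields[2]
        # "/mnt/shared" must contain "/mnt/shared/x" but not "/mnt/sharedx".
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) > len(best[0]):
            best = (mnt, fstype)
    return best


def is_gluster_mount(path, mounts_text, expected="fuse.glusterfs"):
    """True if path is served by the GlusterFS client mount (assumed fstype)."""
    return mount_for(path, mounts_text)[1] == expected
```

Usage on a real box: `with open("/proc/mounts") as f: ok = is_gluster_mount("/mnt/shared/file.dat", f.read())`, and refuse to write when `ok` is false.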

k7k0

Actually, Gluster is perfect for this scenario. You get bi-directional replication and the ability to mount the filesystem from either machine, giving you (theoretically) twice the effective I/O capacity of NFS and active failover should one of the boxes fail.

The problem with doing active rsync mirroring this way is blocking I/O due to file locks. Depending on your application and how your data changes, this could be irrelevant or disastrous! Distributed filesystems have well-defined locking semantics that prevent this from happening. Even if inotify-based syncing handles locking better these days (when I last tried it, it didn't), your file accesses may still block, depending on whether your network can keep up with the changes. These are all theoretical caveats, but worth looking into depending on what your app does.
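The locking caveat comes down to advisory locks only protecting programs that ask for them. A Linux/Unix-only Python sketch: a writer holds an exclusive `fcntl.flock` lock mid-write; a lock-aware reader is refused (without `LOCK_NB` it would block until the writer finishes), while a plain reader that never takes the lock, as a generic copy tool need not, sees the half-written file. The `demo` function is hypothetical.

```python
import fcntl
import os
import tempfile


def demo():
    """Return (what a lock-ignoring reader saw, whether a lock-aware reader was refused)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        writer = open(path, "wb")
        fcntl.flock(writer, fcntl.LOCK_EX)  # writer holds an exclusive lock
        writer.write(b"half")
        writer.flush()                      # partial content is now on disk

        # A reader that ignores advisory locks copies the partial file.
        with open(path, "rb") as f:
            naive = f.read()

        # A lock-aware reader asks for a shared lock and is turned away.
        with open(path, "rb") as f:
            try:
                fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
                refused = False
            except BlockingIOError:
                refused = True

        writer.write(b"-done")
        fcntl.flock(writer, fcntl.LOCK_UN)
        writer.close()
        return naive, refused
```

`demo()` should return `(b"half", True)`: the lock only helps readers that participate in it.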


It'd be much easier to set up rsync to do active mirroring, or to just set up an NFS share and have both boxes use the same physical drive.
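For comparison, the rsync approach can be as simple as re-running a one-way sync on a timer. A minimal sketch, assuming passwordless SSH from server1 to server2 and rsync installed on both; `rsync_command` and `mirror_forever` are hypothetical names. It only pushes server1's changes, and because of `--delete` anything written only on server2 would be removed, so it does not give the two-way replication the question asks for.

```python
import subprocess
import time


def rsync_command(src, dest_host, dest_dir):
    """Build a one-way rsync mirror command (hypothetical helper)."""
    return [
        "rsync",
        "-a",        # archive mode: recursive, keeps permissions, times, links
        "--delete",  # remove files on the mirror that were deleted locally
        src.rstrip("/") + "/",                     # trailing slash: copy contents
        f"{dest_host}:{dest_dir.rstrip('/')}/",
    ]


def mirror_forever(src, dest_host, dest_dir, interval=60):
    """Re-run the sync every `interval` seconds; failures are retried next pass."""
    cmd = rsync_command(src, dest_host, dest_dir)
    while True:
        subprocess.run(cmd, check=False)
        time.sleep(interval)
```

On server1, `mirror_forever("/shared", "server2", "/shared")` keeps server2 at most about one interval behind.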

Chris S
  • I think GlusterFS is overkill for 2 boxes. GlusterFS is good for large HPC clusters. It's great if you want to do it (plenty of useful resources out there to show you how), but you'd likely be better off going with NFS. – churnd May 30 '10 at 16:45
  • I disagree. Gluster is better than rsync, any day: better management, and less general messing about with permissions, locking and so on. Gluster is also closer to instantaneous than rsync. That said, with 2 nodes, you could use DRBD. – Tom O'Connor Jun 21 '10 at 20:23
  • While you have valid points, I still think rsync is "easier" for people not familiar with Gluster. – Chris S Jun 21 '10 at 20:53