We have a GlusterFS cluster we use for our processing function. We want to get Windows integrated into it, but are having some trouble figuring out how to avoid the single-point-of-failure that is a Samba server serving a GlusterFS volume.

Our file-flow works like this:

[Figure: GlusterFS Document Flow]

  1. Files are read by a Linux processing node.
  2. The files are processed.
  3. Results (can be small, can be quite large) are written back to the GlusterFS volume as they're done.
    • Results can be written to a database instead, or may include several files of various sizes.
  4. The processing node picks up another job off of the queue and GOTO 1.

Gluster is great since it provides a distributed volume, as well as instant replication. Disaster resilience is nice! We like it.

However, as Windows doesn't have a native GlusterFS client we need some way for our Windows-based processing nodes to interact with the file store in a similarly resilient way. The GlusterFS documentation states that the way to provide Windows access is to set up a Samba server on top of a mounted GlusterFS volume. That would lead to a file flow like this:

[Figure: GlusterFS doc-flow via Winders]

That looks like a single-point-of-failure to me.

One option is to cluster Samba, but that appears to be based on unstable code right now and thus out of the running.

So I'm looking for another method.

Some key details about the kinds of data we throw around:

  • Original file-sizes can be anywhere from a few KB to tens of GB.
  • Processed file-sizes can be anywhere from a few KB to a GB or two.
  • Certain processes, such as digging in an archive file like .zip or .tar can cause a LOT of further writes as the contained files are imported into the file-store.
  • File-counts can get into the 10's of millions.

This workload does not work with a "static workunit size" Hadoop setup. Similarly, we've evaluated S3-style object-stores, but found them lacking.

Our application is custom written in Ruby, and we do have a Cygwin environment on the Windows nodes. This may help us.

One option I'm considering is a simple HTTP service on a cluster of servers that have the GlusterFS volume mounted. Since all we're doing with Gluster is essentially GET/PUT operations, that seems easily transferable to an HTTP-based file-transfer method. Put them behind a loadbalancer pair and the Windows nodes can HTTP PUT to their little blue heart's content.
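A minimal sketch of such a gateway, written in Python for brevity rather than the Ruby the application uses. It assumes each gateway host has the GlusterFS volume mounted at a path handed to the hypothetical `make_server` helper; `resolve`, `StoreHandler`, and `make_server` are illustrative names, not part of GlusterFS. Each PUT is streamed to a temporary file in the target directory, fsynced, and renamed into place before the 201 is returned, so a client is never told "done" for a half-written file. Whether that is sufficient for other GlusterFS clients to see the file immediately is not something this sketch establishes.

```python
import os
import shutil
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit

CHUNK = 1024 * 1024  # stream in 1 MiB pieces so multi-GB files never sit in RAM


def resolve(root, url_path):
    """Map a URL path to a file below root; raise ValueError on escapes."""
    root = os.path.realpath(root)
    relative = unquote(urlsplit(url_path).path).lstrip("/")
    target = os.path.realpath(os.path.join(root, relative))
    if not target.startswith(root + os.sep):
        raise ValueError("path is outside the store root")
    return target


class StoreHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        try:
            target = resolve(self.server.store_root, self.path)
            remaining = int(self.headers["Content-Length"])
        except (ValueError, TypeError):
            self.send_error(400)
            return
        try:
            folder = os.path.dirname(target)
            os.makedirs(folder, exist_ok=True)
            # Write next to the final name so the rename stays on one volume.
            fd, tmp = tempfile.mkstemp(prefix=".upload-", dir=folder)
            with os.fdopen(fd, "wb") as out:
                while remaining > 0:
                    chunk = self.rfile.read(min(CHUNK, remaining))
                    if not chunk:
                        break
                    out.write(chunk)
                    remaining -= len(chunk)
                out.flush()
                os.fsync(out.fileno())
            if remaining:
                os.unlink(tmp)
                self.send_error(400, "body shorter than Content-Length")
                return
            os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; widen for readers
            os.replace(tmp, target)  # publish under the final name in one step
        except OSError:
            self.send_error(500)
            return
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        try:
            src = open(resolve(self.server.store_root, self.path), "rb")
        except (ValueError, OSError):
            self.send_error(404)
            return
        with src:
            self.send_response(200)
            self.send_header("Content-Length", str(os.fstat(src.fileno()).st_size))
            self.end_headers()
            shutil.copyfileobj(src, self.wfile, CHUNK)


def make_server(root, host="127.0.0.1", port=8080):
    """Build a server whose handlers store files under root."""
    server = ThreadingHTTPServer((host, port), StoreHandler)
    server.store_root = root
    return server
```

On each gateway node this could be started with something like `make_server("/mnt/gluster", host="0.0.0.0").serve_forever()`, where `/mnt/gluster` is a placeholder for the real mount point.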

What I don't know is how GlusterFS coherency would be maintained. The HTTP-proxy layer introduces latency between when the processing node reports that it is done with the write and when the file is actually visible on the GlusterFS volume, so I'm worried that later processing stages attempting to pick up the file won't find it. I'm pretty sure that using the direct-io-mode=enable mount-option will help, but I'm not sure if that is enough. What else should I be doing to improve coherency?
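One application-level mitigation, independent of any mount option, is to have later stages tolerate a short visibility delay instead of failing on the first miss. The sketch below is an assumption about how that could look, not a GlusterFS feature: `wait_for_file` is a hypothetical helper that polls with exponential backoff until the file exists and, optionally, has the size recorded by the writer. Whether each `stat` is answered fresh or from a client-side cache depends on the mount's caching settings, which this sketch does not address.

```python
import os
import time


def wait_for_file(path, expected_size=None, timeout=30.0,
                  first_delay=0.05, max_delay=2.0):
    """Poll until path exists (and has expected_size bytes, if given).

    Returns True once the file is visible, False if timeout seconds pass first.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        try:
            size = os.stat(path).st_size
            if expected_size is None or size == expected_size:
                return True
        except FileNotFoundError:
            pass  # not visible yet; fall through to the backoff
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # back off, but cap the wait
```

A later stage would call this with the size that the writer stored alongside the job in the queue, and requeue or alert if it returns False.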

Or should I be pursuing another method entirely?


As Tom pointed out below, NFS is another option. So I ran a test. Since the files mentioned above have client-supplied names that can come in any language, we need to preserve the file-names. So I built a directory with these files:

[Figure: NFS directory with good names, on the server]

When I mount it from a Server 2008 R2 system with the NFS Client installed, I get a directory listing like this:

[Figure: NFS directory with bad names, on the client]

Clearly, Unicode is not being preserved. So NFS isn't going to work for me.
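The same check can be scripted so it is repeatable across clients and Windows versions. This is a rough sketch under assumptions: `create_samples` and `missing_names` are hypothetical helpers, and apart from "会計2012.xls" the sample names are made up for illustration. The idea is to create the files on one side of the mount and list them from the other, comparing after NFC normalization so that a filesystem which merely stores names in a different normalization form is not reported as a failure.

```python
import os
import unicodedata

# Illustrative names; swap in real client-supplied ones.
SAMPLE_NAMES = ["会計2012.xls", "résumé.doc", "отчет.txt"]


def nfc(name):
    """Normalize so composed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", name)


def create_samples(directory, names=SAMPLE_NAMES):
    """Create one small file per name; run this on the server side."""
    for name in names:
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(name)


def missing_names(directory, names=SAMPLE_NAMES):
    """Return the names that do not appear in a listing of directory."""
    seen = {nfc(entry) for entry in os.listdir(directory)}
    return [name for name in names if nfc(name) not in seen]
```

Running `create_samples` on the Linux server and `missing_names` against the mapped drive on the Windows client should return an empty list if names survive the trip.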

sysadmin1138
  • I believe the Samba team considers `ctdb` stable and ready for production use and the first sentence in the link you gave makes the second invalid because it was never updated. I was planning on establishing this, but before I got around to it I switched jobs to a nearly Windows-free environment. – Sven Apr 10 '12 at 21:30
  • What version of Windows are you looking at using? – Tom O'Connor Apr 11 '12 at 00:31
  • @TomO'Connor As the tag says, Windows 7. Though, Server 2008 R2 will be in there at some point. – sysadmin1138 Apr 11 '12 at 01:29
  • I suppose Cygwin is out of the question? – Tom O'Connor Apr 11 '12 at 20:12

2 Answers

7

I like GlusterFS. Actually, I adore GlusterFS. As long as you can give it some dedicated bandwidth everything's fine.

One of the best things about GlusterFS is using it with NFS. One of the surprising things I've been working with lately is NFS on Windows 7 and 2k8R2.

Here's what I'd do.

  1. Set up 2 GlusterFS servers that can export NFS.
  2. Set up a heartbeat link between them.
  3. Deploy something like Heartbeat/Pacemaker perhaps?
  4. Set up a virtual IP (VIP) between your Gluster Nodes.
  5. Connect the Windows boxen's mapped network drives using the IP address of the VIP.
  6. Test everything you can possibly imagine.
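For the testing step, one thing worth measuring is how long the VIP stops answering during a failover. The sketch below is only an illustration with hypothetical helper names (`port_open`, `measure_outage`): it repeatedly opens a TCP connection to the VIP and reports the longest stretch of failed probes. Port 2049 is the standard NFS port, but which port the Gluster NFS export actually listens on in a given setup is an assumption to verify, and a successful TCP connect does not prove that mounted drives recovered cleanly.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(host, port, duration=60.0, interval=0.5, probe=port_open):
    """Probe for duration seconds; return the longest failing stretch in seconds.

    The result is approximate: its resolution is limited by interval and by
    the probe's own timeout.
    """
    longest = 0.0
    down_since = None
    end = time.monotonic() + duration
    while time.monotonic() < end:
        started = time.monotonic()
        if probe(host, port):
            down_since = None  # service answered; reset the failure streak
        else:
            if down_since is None:
                down_since = started
            longest = max(longest, time.monotonic() - down_since)
        time.sleep(interval)
    return longest
```

Start `measure_outage("192.0.2.10", 2049, duration=120)` from a client (the address is a placeholder for the VIP), then power off the active node and see how long the gap is.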

Clustering Samba sounds scary, and even if you do that, Samba still lacks the ability to behave reliably in some Windows networks (all that NT4 domain compatibility; it never seems to be able to get past that).

I think that because each Gluster node is in distributed, replicated mode, you should theoretically be able to connect to either one and let it worry about moving your data around. As a result, Heartbeat should be the thing that does the redirection and controls which one you're talking to.

As for your

  • File-counts can get into the 10's of millions.

I suggest that you investigate using XFS as the underlying file system, as it's pretty good with big filesystems and is supported under GlusterFS.

Tom O'Connor
  • I'm currently using XFS! We looked at NFS3 a while back to handle the initial ingest function but it proved unworkable due to lack of Unicode support. This was with the NFS server on Windows. "会計2012.xls" would not render correctly, and that's very important. But... I did not know that about 7/R2, and it is worth investigating! – sysadmin1138 Apr 11 '12 at 11:14
  • So I ran a test. Unfortunately, it didn't return good results (see update on question). The Unicode problem is bi-directional it seems. – sysadmin1138 Apr 11 '12 at 19:14
  • Bugger. I'm out of ideas, then. I wonder if you could put Samba behind a VIP. – Tom O'Connor Apr 11 '12 at 19:36
  • Workgroup yes, Domain (which we're using) no. Thus, my problem. – sysadmin1138 Apr 11 '12 at 20:33
  • On the other hand, after conversing with the developers, keeping the file-names is not as critical as I expected. Apparently, so long as we can get them in the very first stage (ingest), the database will keep track of the names. So NFS is a valid option here (once we get the right Windows versions). – sysadmin1138 Apr 11 '12 at 22:46
  • FYI this will currently not work with disk-encryption feature of glusterfs. – Artur Bodera Nov 06 '14 at 17:12
1

Maybe you could consider an HA solution: use LDAP for authentication (it can be replicated across as many LDAP servers as you want) and assign a floating IP for the SMB services to listen on.

This IP will float on the main server. When that server goes down, Heartbeat can start the services on the second server.

Both servers will have a mountpoint for the GlusterFS volume, so all the data will be there.

It's a possible solution, and it's easy to manage.

Chetan Bhargava
Saxa