Experience with CVS on a clustered filesystem

Question

I would be interested in any experiences using CVS on a clustered file system with multiple servers accessing it. I guess this is similar to what providers like SourceForge do.

Currently we use a RHEL based CVS server with an ext3 repository filesystem on a SAN.

The idea is to use several machines to handle CVS connections from clients, all working on the same file system on a fast SAN. This redundancy could serve both load-balancing and failover purposes (e.g. using round-robin DNS that could be reconfigured if one of the servers were to fail).
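Round-robin DNS by itself does not notice that a server has died, so the "reconfigure if one fails" part needs some kind of health check. Below is a minimal Python sketch of that logic. It assumes the health check is simply a TCP connection attempt to the CVS pserver port (2401); if clients connect via :ext: over SSH, probe port 22 instead. The function names failover_round_robin and pserver_alive are hypothetical.

```python
import itertools
import socket
from typing import Callable, Iterator, Sequence


def pserver_alive(host: str, port: int = 2401, timeout: float = 1.0) -> bool:
    """Hypothetical health check: can we open a TCP connection to the CVS port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed, ...
        return False


def failover_round_robin(servers: Sequence[str],
                         is_alive: Callable[[str], bool]) -> Iterator[str]:
    """Yield servers in round-robin order, skipping any that fail is_alive.

    Raises RuntimeError if a full pass over the list finds no live server.
    """
    if not servers:
        raise ValueError("need at least one server")
    ring = itertools.cycle(servers)
    while True:
        for _ in range(len(servers)):
            candidate = next(ring)
            if is_alive(candidate):
                yield candidate
                break
        else:
            # Every server failed the check during this pass.
            raise RuntimeError("no CVS server is reachable")
```

In use, something like failover_round_robin(["cvs1.example.com", "cvs2.example.com"], pserver_alive) (hypothetical host names) hands out live front ends in turn. The same check could drive the DNS reconfiguration instead of running on the client.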

SVN is not an alternative for various reasons; please do not start a CVS/SVN discussion.

score 3 · Accepted Answer · answered May 29 '09 at 22:43

The best answer to your VCS scaling issues is the one you gave in your question: don't use CVS. I do agree with you, though, that SVN is the solution to no one's problems. There are plenty of highly scalable version control systems out there (Perforce and Rational are examples).

In general, though, I think you are going to find that clustered filesystems won't provide the performance you are looking for; their main goal is availability. If you must pick a clustered FS, I think you should look into something like http://oss.oracle.com/projects/ocfs/, which is built for high-performance database clustering. High-performance databases, though, don't rely on filesystem-based locking the way CVS does (CVS locks a repository directory by creating #cvs.lock, #cvs.rfl.* and #cvs.wfl.* entries inside it), and that approach just doesn't scale. You would need to add some sort of transactional distributed lock manager. CVS and high performance just don't fit in the same ballpark.
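To make the locking point concrete: CVS's master lock is a directory named #cvs.lock created inside the repository directory, and CVS relies on directory creation being atomic. Below is a rough Python imitation of that idea, not CVS's actual code; the function names, retry count and delay are arbitrary assumptions. On a single host this is safe. On a clustered or networked filesystem it is only as safe as that filesystem's guarantee that two nodes can never both succeed in creating the same directory, which is exactly the property you would have to verify on OCFS, GFS or NFS.

```python
import os
import time

# Name of CVS's master lock directory inside each repository directory.
MASTER_LOCK = "#cvs.lock"


def acquire_master_lock(repo_dir: str, retries: int = 5, delay: float = 0.1) -> str:
    """Take the master lock by creating MASTER_LOCK, retrying while it is held.

    Correctness depends entirely on os.mkdir being atomic on the filesystem
    holding repo_dir (across all nodes, if that filesystem is shared).
    """
    lock_path = os.path.join(repo_dir, MASTER_LOCK)
    for attempt in range(retries):
        try:
            os.mkdir(lock_path)  # succeeds for exactly one caller, or raises
            return lock_path
        except FileExistsError:
            if attempt + 1 < retries:
                time.sleep(delay)  # held by someone else: wait, then retry
    raise TimeoutError(f"lock still held after {retries} attempts: {lock_path}")


def release_master_lock(lock_path: str) -> None:
    """Drop the master lock by removing its directory."""
    os.rmdir(lock_path)
```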

I do have a feeling, though, that you aren't trying to scale your source control system but are trying to use CVS for something application-specific. In that case I would suggest coding directly to RCS and rolling your own lock manager. I would avoid the complication and expense of distributed or clustered filesystems and concentrate on building a smarter app using some sort of distributed hash-bucket approach.
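One possible reading of the "distributed hash bucket" suggestion: hash each RCS ,v file path to one of N lock-manager nodes, so every lock request for a given file goes to the same owner and no shared-filesystem locking is needed. Below is a minimal single-process Python sketch under that assumption. The class and function names are hypothetical, and a real version would run each node's lock table as a separate network service.

```python
import hashlib
from typing import Dict, Iterable, List


def lock_owner(rcs_path: str, nodes: List[str]) -> str:
    """Map an RCS file path to the node that owns its lock (hash mod N)."""
    if not nodes:
        raise ValueError("need at least one lock node")
    # hashlib rather than hash(): Python's built-in str hash is randomized per
    # process, so different front ends would disagree about the owner.
    digest = hashlib.sha1(rcs_path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class BucketedLockManager:
    """Hypothetical lock manager: one lock table per node, simulated in-process."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = list(nodes)
        self.tables: Dict[str, Dict[str, str]] = {n: {} for n in self.nodes}

    def _table(self, rcs_path: str) -> Dict[str, str]:
        return self.tables[lock_owner(rcs_path, self.nodes)]

    def try_lock(self, rcs_path: str, holder: str) -> bool:
        """Grant the lock unless a different holder has it (re-locking is allowed)."""
        table = self._table(rcs_path)
        current = table.get(rcs_path)
        if current is not None and current != holder:
            return False
        table[rcs_path] = holder
        return True

    def unlock(self, rcs_path: str, holder: str) -> None:
        """Release the lock if holder owns it; otherwise do nothing."""
        table = self._table(rcs_path)
        if table.get(rcs_path) == holder:
            del table[rcs_path]
```

With plain hash mod N, adding or removing a node reassigns most files to new owners, so any locks held at that moment would have to be drained or migrated; consistent hashing is the usual way to limit that churn.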

score 0 · Answer 2 · answered May 28 '09 at 00:59

Between your SAN and the machines running CVS you're going to need some form of networked filesystem (at least, I can't think of any filesystem that copes with concurrent access to the same device, and I'm assuming that by SAN you mean storage presented to the server/OS as a block device). A few years back there was a discussion of CVS over NFS, and you can potentially run into the same or similar kinds of problems with any networked filesystem.

You want a networked filesystem that handles locks well.
Ideally, you also want a networked filesystem that keeps filesystem caches coherent between your CVS front ends.

Now, I don't know exactly how SourceForge is structured for CVS; however, my guess would be something along these lines:

A small number of boxes that allow CVS commits, possibly partitioned so that each project is associated with one box/filesystem where its commits happen.
The state from the CVS commit boxes is then replicated to a large number of boxes/filesystems, which are load-balanced and handle failover for anonymous CVS reads, CVS->HTML browsing, etc.

(The reasoning behind my guesses is that anonymous CVS has occasionally served a repository state that was several hours old, and I vaguely recall the SF CVS commit boxes occasionally crawling very slowly.)
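The guessed split (one commit box per project, many read-only replicas behind a load balancer) amounts to a simple routing rule. Below is a hypothetical Python sketch, assuming a static project-to-commit-box table and a list of read mirrors. None of these names come from SourceForge.

```python
import itertools
from typing import Dict, Sequence


class CvsRouter:
    """Hypothetical front-end router for a read/write-split CVS service.

    Writes (commits, tags) for a project always go to that project's single
    commit box; reads (anonymous checkouts, web browsing) rotate over replicas,
    which may lag behind the commit box.
    """

    def __init__(self, commit_box_for: Dict[str, str], mirrors: Sequence[str]) -> None:
        if not mirrors:
            raise ValueError("need at least one read mirror")
        self.commit_box_for = dict(commit_box_for)
        self._mirrors = itertools.cycle(list(mirrors))

    def route(self, project: str, is_write: bool) -> str:
        if is_write:
            # Raises KeyError for an unknown project: it has no commit box.
            return self.commit_box_for[project]
        return next(self._mirrors)
```

Because the replicas are updated asynchronously, a read routed this way can return an older repository state than the commit box holds, which would match the stale anonymous-CVS reads mentioned above.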

There are cluster file systems that allow such things (see OCFS, GFS and the like). I know this works in general, but I am looking for experience reports. — Daniel Schneller, May 28 '09 at 15:19

score -2 · Answer 3 · answered May 23 '09 at 00:02


I don't really have an answer, but for the sake of furthering the discussion...

I assume that CVS uses some kind of transactional database as a backing store (I know this is how SVN does it). If that's the case, it seems to me that multiple writers on those file structures wouldn't really be safe. Wouldn't the better approach be to create the abstraction layer at the database interface? For example, use a SQL service instead of the local BDB/LDBM or whatever it may be (assuming CVS supports such a thing).

Adam D'Amico

CVS is a wrapper over a bunch of RCS ,v files; there is nothing transactional about it. Of course, if you're recovering a repository from a crashed system, that's not necessarily bad news, but it means Daniel can't count on help from CVS for his repo's integrity. – Mike G., May 29 '09 at 19:06
