I work in a lab that has to support some fairly processor-intensive user applications but basically has no need for local disk storage because we don't guarantee any kind of data persistence. However, being a Mac shop, we still buy Mac Pros with standard storage configurations. Given that management policy is to continue buying way more disk capacity than we use, is there any way to build some kind of distributed file store on those disks?
Ideally, it would store the user home directories themselves, but since we currently have upwards of 15 TB going entirely to waste, we'd be happy to settle for a more latency-tolerant application: say, storing tarballs of home directories to be downloaded and extracted by a login hook (a rough sketch of what I mean follows), or even archival of server backups.
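To make the tarball idea concrete, here's roughly what I picture the login hook doing, as a Python sketch. Everything specific in it is an assumption on my part: the HTTP front end at `tarballs.example.lab`, the one-tarball-per-user layout, and the lack of error handling are all placeholders, not a real design.

```python
#!/usr/bin/env python3
"""Sketch of a login hook: fetch the logging-in user's home-directory
tarball from a (hypothetical) HTTP front end on the distributed store
and unpack it. The store URL and tarball layout are assumptions."""

import sys
import tarfile
import tempfile
import urllib.request
from pathlib import Path

STORE_URL = "http://tarballs.example.lab"  # hypothetical front end


def restore_home(username: str) -> None:
    home = Path("/Users") / username
    url = f"{STORE_URL}/{username}.tar.gz"  # assumed one tarball per user
    # Download to a temp file first, so a storage node rebooting
    # mid-transfer never leaves us extracting a truncated archive.
    with tempfile.NamedTemporaryFile(suffix=".tar.gz") as tmp:
        with urllib.request.urlopen(url) as resp:
            while chunk := resp.read(1 << 20):
                tmp.write(chunk)
        tmp.flush()
        with tarfile.open(tmp.name, "r:gz") as tar:
            tar.extractall(path=home)  # our own archives, so trusted


if __name__ == "__main__":
    # Mac OS X login hooks receive the user's short name as $1.
    restore_home(sys.argv[1])
```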
Requirements:
- client (data user), node (data keeper), and any possible server (coordinator?) software must all run on Mac OS X 10.5 or later
- highly fault-tolerant: the "nodes" are also user workstations that might get rebooted at any moment; staff, of course, would take any necessary steps before taking a machine down for longer maintenance or retirement
- runs on commodity hardware: fairly high-end commodity hardware, but still commodity; no Fibre Channel or SCSI
Bonus:
- POSIX-compliant: it'd be great if, unlike Hadoop, it presented itself as a regular NFS mount or something similar
At the moment, MogileFS looks like the best candidate, though Hadoop beats it on prospects for long-term support. I've also read about GlusterFS, but I don't know what sets it apart from the competition. Any advice would be appreciated. I realize that running user workstations as storage nodes while users are working on them is a very tall order.
I would also appreciate it if anyone could tell me what this kind of application is properly called, since Wikipedia claims that "distributed filesystem" actually refers to things like NFS and BitTorrent (?!).
Related:
- Distributed, Parallel, Fault-tolerant File System
- Which Distributed File System as a backend for Cloud Computing?