4

At my university department we are about to upgrade the computers in our student lab (about 25-30 machines). The machines will run Linux.

One thing about the new machines is that they have huge (1 TB) hard disks (we did not ask for them, but anyway, these days you cannot find considerably cheaper disks!).

Currently the users' home directories are stored on a central file server and mounted via NFS.
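
For reference, each client just has an NFS mount along these lines (the server name and export path here are made up):

    # /etc/fstab on a lab machine -- illustrative names only
    fileserver.example.edu:/export/home  /home  nfs  rw,hard,intr  0 0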

So the question is: is there any way we could use all this disk capacity? Two options I can think of are:

  • expanding our central file store, or
  • replicating the home directories for faster access (a rough sketch of what I mean follows below).
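
The crudest form of the second option would be a periodic per-user sync, something like this (host name and user are made up, and this glosses over all the consistency problems):

    # hypothetical cron job: refresh a local replica of one user's home directory
    rsync -a --delete fileserver.example.edu:/export/home/alice/ /local/home/alice/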

The main issue would be that the lab machines are not guaranteed to be up all the time.

Browsing around this site, I read about GlusterFS and AFS.

GlusterFS seems to have many fans and to be a nice general-purpose solution.
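
From what I have read, setting up a replicated GlusterFS volume looks fairly simple, roughly like this (host names and brick paths are invented; I have not tried it):

    # on one storage host: create and start a 2-way replicated volume
    gluster volume create homes replica 2 transport tcp \
        host1:/export/brick1 host2:/export/brick1
    gluster volume start homes

    # on a client: mount the volume
    mount -t glusterfs host1:/homes /mnt/homes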

What about AFS? I've read that it has performance problems; does anyone have experience with it?

nplatis
  • Are the machines going to be on all the time and are they server-class, i.e. any form of RAID, dual PSUs, ECC memory etc.? – Chopper3 Mar 18 '13 at 12:09
  • Probably not a good idea. I mean, sure, you could put a distributed file system on them, but given that they're desktops and may or may not be powered on, and you don't want to impact the users when they're using them... what are you going to store on your shiny new distributed file system? – HopelessN00b Mar 18 '13 at 12:39
  • Wow, what a waste... I really wonder if anyone has a smart idea to use them as such. Swap space, SQL server slaves, backup, a distributed FTP server (rsync-ed; at least one host should be up), distributed sniffers (tcpdump/NIDS), etc. Whatever you are currently doing, use that as the basis for scaling projects. – Aki Mar 18 '13 at 12:59
  • @Chopper3: The machines are good-quality desktop PCs (Alienware Aurora, don't ask!). Being in the student lab, they will be on most of the time, but of course that's not guaranteed. What I was mostly thinking of is a way to cache students' home directories locally, for faster access, but I don't know whether the resulting network traffic and performance would be better or worse than simply using NFS. – nplatis Mar 18 '13 at 17:18
  • 1
    @nplatis - there's no good way to do this, they may be 'good quality' but they're still desktops (i.e. they'll break about 10-100 times more frequently than proper servers), and the lack of availability will kill any attempt to use a clusters/distributed file system, I'd give up on the idea, nice that you thought of it but your situation isn't working in your favour sorry. – Chopper3 Mar 18 '13 at 18:33

2 Answers

6

I've been there, not wanting to "waste" what appears to be good storage. But it's not "good" storage; it's a fool's errand to try to use it as anything but local disk. The system would have to keep a full copy of everything on every machine, since it could never know which machines are about to be turned on or off. The replication traffic alone would have a noticeable impact on your network.

If you really want to use those disks, pull them out of the workstations (PXE boot the workstations; a sketch follows below) and use the disks in a SAN (though there are many reasons against using consumer-grade disks in a SAN too!).
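
For the PXE part, here is roughly what the server side looks like with dnsmasq acting as the DHCP/TFTP server (addresses and paths are illustrative, not a drop-in config):

    # /etc/dnsmasq.conf -- minimal PXE-boot sketch
    enable-tftp
    tftp-root=/srv/tftp                        # must contain pxelinux.0 and its config
    dhcp-range=192.168.1.100,192.168.1.200,12h
    dhcp-boot=pxelinux.0                       # boot file handed to the clients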

Chris S
0

Did you look at the Ceph filesystem? http://ceph.com/ceph-storage/
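
For reference, mounting CephFS from a client is a one-liner with the kernel client, something like this (the monitor address and secret file are illustrative):

    # mount CephFS via the kernel client
    mount -t ceph 192.168.1.10:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret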

Also, about caching: if you really want this, you can try CacheFS; here is a nice article about it: http://www.c0t0d0s0.org/archives/4727-Less-known-Solaris-Features-CacheFS.html
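
(That article describes Solaris CacheFS; on Linux the rough equivalent is FS-Cache with the cachefilesd daemon. A minimal sketch, with illustrative thresholds and server name:)

    # /etc/cachefilesd.conf -- cache NFS reads on the local disk
    dir /var/cache/fscache        # where cached data lives
    tag mycache
    brun  10%                     # culling thresholds (free-space percentages)
    bcull  7%
    bstop  3%

    # /etc/fstab -- add the fsc option so this NFS mount uses the cache
    fileserver.example.edu:/export/home  /home  nfs  rw,hard,fsc  0 0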

BVA