10

Like many other places, we ask our users not to save files to their local machines. Instead, we encourage them to put files on a file server so that others (with appropriate permissions) can use them and so that the files are backed up properly.

The result of this is that most users have large hard drives that are sitting mainly empty. It's 2010 now. Surely there is a system out there that lets you turn that empty space into a virtual SAN or document library?

What I envision is a client program that is pushed out to users' PCs that coordinates with a central server. The server looks to users just like a normal file server, but instead of keeping entire file contents it merely keeps a record of where those files can be found among various user PCs. It then coordinates with the right clients to serve up file requests. The client software would be able to respond to such requests directly, as well as be smart enough to cache recent files locally. For redundancy the server could make sure files are copied to multiple PCs, perhaps allowing you to define groups in different locations so that an instance of the entire repository lives in each group to protect against a disaster in one building taking down everything else.
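To make the idea concrete, here is a minimal sketch of the bookkeeping such a central server might do. Everything in it is hypothetical: `PlacementIndex`, `add_client`, `place`, and `locate` are names invented for illustration, not part of any existing product, and the sketch ignores file transfer, caching, and failure handling entirely. It only shows the "record of where files live" and "a copy in every group" parts.

```python
from collections import defaultdict


class PlacementIndex:
    """Hypothetical central record of which client PCs hold each file."""

    def __init__(self, copies_per_group=1):
        self.copies_per_group = copies_per_group
        self.groups = defaultdict(list)  # group name -> client ids
        self.used = defaultdict(int)     # client id -> bytes stored
        self.locations = {}              # file path -> client ids

    def add_client(self, client_id, group):
        """Register a user PC as a storage node in a location group."""
        self.groups[group].append(client_id)

    def place(self, path, size):
        """Choose the least-loaded clients in every group to hold `path`."""
        chosen = []
        for clients in self.groups.values():
            # Sort by bytes already stored; break ties by name.
            ranked = sorted(clients, key=lambda c: (self.used[c], c))
            chosen.extend(ranked[:self.copies_per_group])
        for client in chosen:
            self.used[client] += size
        self.locations[path] = chosen
        return chosen

    def locate(self, path, online):
        """Return the holders of `path` that are currently reachable."""
        return [c for c in self.locations.get(path, []) if c in online]


index = PlacementIndex(copies_per_group=1)
index.add_client("lab-01", "lab")
index.add_client("lab-02", "lab")
index.add_client("office-01", "office")
first = index.place("reports/a.doc", 100)
second = index.place("reports/b.doc", 50)
```

With one copy per group, each file ends up on one lab machine and one office machine, so losing either building still leaves a full copy in the other.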

Obviously you wouldn't point your database server here, but for simpler things I see several advantages:

  • Files can often be transferred from a nearer (or the local) machine.
  • Network load is distributed, rather than crowding all file transfers onto a single connection.
  • Disk space grows automatically as your company does.
  • Should ultimately be cheaper, as you don't need to keep a separate set of disks.

I can see a few downsides as well:

  • Occasional degradation of user PC performance, if the machine has to serve or accept a large file transfer during a busy period.
  • Writes have to be propagated around the network several times (though I suspect this isn't really much of a problem, as reading happens in most places more than writing)
  • Still need a way to send a complete copy of the data offsite occasionally, and this would make it very hard to do differentials

Think of this like a cloud storage system that lives entirely within your corporate LAN and makes use of your existing user equipment.

Our old main file server is due for retirement in about 2 years, and I'm looking into replacing it with a small SAN. Our current file server is using about 400GB of a 1TB share. We've only kept it down that small because our backup space was limited. I'm looking to expand to at least 4TB of usable space when it's replaced, maybe much more if prices come down as much as I expect. I'm thinking something like this would be a better fit. As a school, we have a couple of computer labs I can leave running that would be perfect for adding a little extra redundancy to such a system.

With very few exceptions, our users are filling less than 40GB of their 120GB hard drives, meaning I could easily reserve 65GB per machine. And that's only going to increase, as newer machines are coming in with 250GB drives and even those could easily be larger soon. By the time the file server is replaced, given our desktop replacement schedule I'd expect such a system to allow for 5TB of usable storage, even allowing for redundancy and history.
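As a rough sanity check on that estimate, the arithmetic can be sketched as below. The 65GB reservation and the 5TB target come from the figures above; the machine count of 250 and the three copies of each file are assumptions made purely for illustration, and the helper names are hypothetical.

```python
import math

GB_PER_TB = 1000  # decimal units, as drive vendors count them


def usable_tb(machines, reserved_gb, copies):
    """Usable capacity when every file is stored `copies` times."""
    return machines * reserved_gb / copies / GB_PER_TB


def machines_needed(target_tb, reserved_gb, copies):
    """Smallest number of PCs that reaches `target_tb` of usable space."""
    return math.ceil(target_tb * GB_PER_TB * copies / reserved_gb)


# Assumed: 250 PCs and 3 copies of every file (not figures from my site).
estimate = usable_tb(machines=250, reserved_gb=65, copies=3)
required = machines_needed(target_tb=5, reserved_gb=65, copies=3)
print(f"{estimate:.2f} TB usable; {required} PCs needed for 5 TB")
```

Under those assumptions the raw pool is 16,250GB, which shrinks to roughly 5.4TB once every file is stored three times, and about 231 machines would be needed to reach 5TB. Any space set aside for file history would come out of that figure.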

Unfortunately, the closest thing I can find is Dienst, and it's just a paper that dates back to 1994. Am I just using the wrong buzzwords in my searches, or does this really not exist? If not, is there a big downside that I'm missing?

Joel Coel
  • Instead of buying large magnetic drives, you should consider smaller solid state drives for your end user systems. – jftuga Apr 22 '11 at 13:21
  • You might want to look into [CCNx](http://www.ccnx.org/), developed by a team around Van Jacobson at PARC. Related: the [SWIFT](http://libswift.org/) project – the-wabbit Jul 04 '13 at 17:53

5 Answers

4

It sounds to me like you're describing AFS, the most common implementation of which is OpenAFS. The key OpenAFS concepts are described here: http://docs.openafs.org/UserGuide/ch01.html#HDRWQ3.

AFS is:

  • Distributed. The filesystem spans multiple machines, but still uses a unified namespace so the distributed nature is transparent to the client machine.
  • Redundant. Files can exist on multiple server nodes at once so the loss of several server nodes does not result in inaccessibility of any data.
  • Scalable. Apparently some "Enterprise" implementations span as many as 25,000 nodes.
Insyte
  • +1 probably wouldn't fit his specification though. – Warner Apr 07 '10 at 19:26
  • What seems to be missing? I thought it would be a perfect fit. Or at least a 90% fit. – Insyte Apr 07 '10 at 19:36
  • Well, the fact that running an AFS Server on Windows is unsupported might have something to do with it. http://docs.openafs.org/ReleaseNotesWindows/ch03s11.html – mfinni Apr 07 '10 at 20:55
  • Huh. That doesn't jibe with what they say here: http://www.openafs.org/windows.html. – Insyte Apr 07 '10 at 22:02
  • That's for the *client* software. You want a Windows box to connect to an existing AFS share? Go for it. You want to host an AFS share on Windows? Unsupported. – mfinni Apr 08 '10 at 03:38
2

Yeah, the large disks in end-user desktop systems are tragically going unused when you're properly using centralized storage. Oh well. Some downsides of using a hypothetical desktop-network-distributed NAS:

  1. It would have to handle degradation caused by user machines going off-line. Someone didn't come in today and their machine is off? Better hope that the documents on there are distributed onto machine(s) that are turned on. Someone is working late tonight and their machine is the only one that's on? Tough luck, sorry. Unless you also have everything copied to a real fileserver - and then, what did you gain?

  2. Everything would have to have good encryption - otherwise, the boss's documents that contain his plan to cash out, or the HR doc that shows everyone's salary, are replicated to Jimmy the mail-clerk's machine. On which he runs LimeWire. See where this is going?
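To put a rough number on point 1, here is a back-of-the-envelope sketch. It assumes each machine is off independently with the same probability, which is optimistic: in practice machines tend to be switched off together (evenings, holidays), so real availability would be worse. The function name and the probabilities are made up for illustration.

```python
def availability(p_offline, copies):
    """Chance that at least one holder of a file is switched on.

    Assumes every holder is offline independently with probability
    `p_offline`; correlated shutdowns make the true figure lower.
    """
    return 1 - p_offline ** copies


# Illustrative guesses: 30% of PCs off during the day, 90% off at night.
daytime = availability(p_offline=0.3, copies=3)
evening = availability(p_offline=0.9, copies=3)
print(f"daytime: {daytime:.3f}, evening: {evening:.3f}")
```

Even with three copies of every file, the evening case leaves most files unreachable under these guesses, which is why you end up wanting an always-on copy somewhere.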

mfinni
  • In addition: their performance sucks, the network will not be top speed, and the SAN goes down in the evening hours, which sucks for maintenance runs. Rather get rid of the disks and boot from SAN ;) – TomTom Oct 29 '10 at 12:31
1

Something like CleverSafe (has both open source and commercial versions) can mostly do what you want, but managing very unreliable nodes might be a problem. CleverSafe handles multiple node outages, but perhaps not quickly enough for the sort of "constant churn" of nodes you would see using desktops as the storage nodes.

I think there are similar solutions from academic papers I've read in the past, but CleverSafe seems to be a real working product and not just a prototype. The company has been around since 2004.

rmalayter
1

SANsymphony 7.0 Storage Virtualization Software

Everything below is quoted from their website:

Main Features

Device-independent virtual disk pooling, synchronous mirroring (HA), high-speed caching, asynchronous remote replication, thin provisioning, auto-tiering, online snapshots, non-disruptive disk migration, continuous data protection (CDP)

Access Type

Block disk I/O over a physical or virtual SAN. File system access is provided via NFS/CIFS protocols from the underlying Windows Server operating system. The two access methods may be combined to meet high availability, unified storage (SAN/NAS) requirements.

Host Environments Supported

Computer systems running standard Windows operating systems (including Windows Server 2000, 2003, 2008, Hyper-V, Windows XP, Windows 7), UNIX, HP-UX, Sun Solaris, IBM AIX, Red Hat Linux, SUSE Linux, Apple Mac OS, VMware ESX / vSphere, Citrix XenServer

Disks Supported (back-end)

Any internal drives, external drives, external disk arrays, JBODs, Solid State Disks (SSD), and intelligent storage system supported on Windows Server 2008 may be attached to the DataCore node(s). They may be direct-attached or SAN-connected.


It's what you're after, yes?

Mark Lawrence
  • Not exactly. This software still takes over each machine - you have to have machines 100% dedicated to storage. I'd like to see something that runs in the background on PCs that are still deployed with end users, and takes advantage of the spare disk that's already out there. – Joel Coel Nov 11 '11 at 20:48
  • Re-reading the storage/metadata abstraction, I saw this recently, but it may be a conceptual fit rather than a practical one: http://www.xtreemfs.org/feature_replication.php – Mark Lawrence Nov 11 '11 at 21:45
  • Of course, the other thing worth looking at if you haven't already is Windows DFS-R. With this, you set up a namespace of folders which is accessed and replicated transparently through AD replication. The namespace folders have targets on physical shares, and these can be replicated using the topology of your choice (e.g. hub and spoke, or free-for-all). Uh oh, there's a catch: I think the targets have to be running Windows Server. – Mark Lawrence Nov 11 '11 at 21:50
0

The closest thing that comes to my mind is Google's MapReduce or the free Hadoop alternative, but this is designed to scale into the petabyte range for really big web apps.

Generally, your scenario isn't something I would really like to try out, as I guess that the administrative overhead of managing the machines for this added service will by far exceed any benefit you might get from it.

Also, I see a certain risk that undiscovered problems in Windows (or any other OS) might open a vulnerability for a fast-spreading worm that could take out your whole network in a matter of minutes, regardless of how well and securely your systems are configured, and take all of your data storage pools with it.

Besides that, I am not really sure that free disk capacity on client systems will continue to grow, as I guess that many more applications will be ported to the web in the future, including stuff like office applications and even Photoshop, which will lead to a big push for thin clients (again).

Sven