How can I utilise unused space on a large network in a decentralised yet resilient manner?

I administer a large Windows network and would like to make use of the unused space on the workstations. Is there a free way to do this?

Let's say we have:

  • 200 Windows workstations, each with a 500 GB hard drive; the space used on each drive is always under 50 GB.
  • The users of this network store their files on dedicated file servers; no user data is stored locally on the workstations.
  • 200 hard drives × 450 GB free each = 90,000 GB (about 87.89 TB, counting 1 TB as 1024 GB) of unused disk space distributed across the network.
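The arithmetic behind that last figure, and what it shrinks to once redundancy is added, can be sketched in Python. This assumes a k-of-N erasure-coded store such as Tahoe-LAFS (discussed in the answer below) with its default 3-of-10 encoding, where each file becomes 10 shares and any 3 of them can rebuild it; both numbers are configurable, and the helper names are illustrative only.

```python
def raw_capacity_gb(workstations: int, free_gb_each: float) -> float:
    """Total free space contributed by all workstations, in GB."""
    return workstations * free_gb_each


def usable_capacity_gb(raw_gb: float, k: int, n: int) -> float:
    """With k-of-n erasure coding each file is stored as n shares, any k of
    which can rebuild it, so every file takes n/k times its size on disk."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return raw_gb * k / n


raw = raw_capacity_gb(200, 450)
print(raw)                                  # 90000 GB in total
print(raw / 1024)                           # about 87.89 "TB" at 1024 GB per TB
print(usable_capacity_gb(raw, k=3, n=10))   # room for about 27,000 GB of files
```

In other words, the price of surviving several machines being offline is that only around 30% of the raw space holds actual file data under these assumptions.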

I am looking for a way to make use of this idle disk space reliably, prioritising data integrity over speed of access.

I have heard of distributed storage on the Internet using P2P-like networks, where users delegate a certain percentage of their disk space to the network in exchange for storing their own data on other drives around the world. The data is distributed and duplicated so that it remains accessible when some clients go offline.

Obviously, doing that anonymously on the Internet raises other issues, but I am looking to do this kind of thing on a local network in a controlled environment, not for everyday use by users but for archival, long-term storage.

It would be almost like a distributed file system: self-managing, encrypted, and with data replication for redundancy when workstations go offline.

A Windows-based service would probably suit my needs best: running silently in the background and able to be set to a low priority in terms of load on the workstation. The data store should obviously be encrypted, and perhaps even P2P in nature, so that clients work together to stream data between themselves for replication.

If anyone knows of software that can achieve this, please enlighten me; if it's free, all the better! Thanks for your time and help.

user281618

Posted 2013-12-13T18:44:48.287

Reputation: 31

Answers

Tahoe-LAFS. It's written in Python and is cross-platform. There's no Windows installer yet, but it does work on Windows. You need to build it first (run c:\python27\python.exe c:\your_tahoe_unzip_path\tahoe build before first use), but after that you can copy the built files anywhere.

On Windows, you'll want it running as a service; I've used nssm for this.
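A minimal sketch of the nssm step, written as a Python helper that only builds the command lines. It assumes the node is started in the foreground with `tahoe run NODEDIR` and that nssm's `AppPriority` setting (present in newer nssm releases) is available to lower CPU priority, as the question asks; the service name and all paths are placeholders.

```python
import subprocess


def nssm_commands(service, python_exe, tahoe_script, node_dir):
    """Build the nssm command lines (as argument lists) that register a
    Tahoe-LAFS node as a Windows service. Names and paths are placeholders."""
    # nssm install <service> <program> [<arguments>]: the program is the Python
    # interpreter, and the arguments run the tahoe script in the foreground.
    install = ["nssm", "install", service, python_exe, tahoe_script, "run", node_dir]
    # Assumption: nssm's AppPriority setting, so the node does not compete
    # with the workstation's interactive user for CPU time.
    priority = ["nssm", "set", service, "AppPriority", "BELOW_NORMAL_PRIORITY_CLASS"]
    return [install, priority]


if __name__ == "__main__":
    for cmd in nssm_commands("TahoeNode", r"C:\Python27\python.exe",
                             r"C:\tahoe\bin\tahoe", r"C:\tahoe-node"):
        # Print for review; to actually register the service (admin rights and
        # nssm on PATH required) use subprocess.run(cmd, check=True) instead.
        print(subprocess.list2cmdline(cmd))
```

Printing rather than executing keeps the helper safe to try on any machine before rolling it out.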

You need an introducer node running that every machine can reach.

You then need to set up each machine you want to participate in the "grid" as a storage node, using the introducer.furl from the introducer node. This is where you specify the redundancy parameters: how many pieces ("shares") each file is encoded into, how many of them are needed to reconstruct it, and how many machines they should be spread across.
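Those parameters live in each node's tahoe.cfg. A sketch that renders such a file with Python's configparser, assuming the `[client]` keys `introducer.furl`, `shares.needed`, `shares.happy` and `shares.total` and the `[storage]` keys `enabled` and `reserved_space`; the nickname, fURL and reserved-space value are placeholders, and in practice you may prefer to edit the file Tahoe's own create command writes.

```python
import configparser
import io


def make_tahoe_cfg(nickname: str, introducer_furl: str, needed: int = 3,
                   happy: int = 7, total: int = 10,
                   reserved_space: str = "50G") -> str:
    """Render a minimal tahoe.cfg for a combined client + storage node."""
    # Basic sanity check on the encoding parameters.
    if not (1 <= needed <= total and 1 <= happy <= total):
        raise ValueError("need 1 <= needed <= total and 1 <= happy <= total")
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["node"] = {"nickname": nickname}
    cfg["client"] = {
        "introducer.furl": introducer_furl,
        "shares.needed": str(needed),  # k: shares required to rebuild a file
        "shares.happy": str(happy),    # upload fails unless shares reach this many servers
        "shares.total": str(total),    # N: shares produced for each file
    }
    cfg["storage"] = {
        "enabled": "true",                 # offer this workstation's disk to the grid
        "reserved_space": reserved_space,  # free space to leave untouched
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(make_tahoe_cfg("ws-001", "pb://EXAMPLE@introducer-host:12345/introducer"))
```

Raising `shares.total` relative to `shares.needed` buys more tolerance for offline workstations at the cost of more disk per file.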

You can then use the web interface on any machine running Tahoe to upload files to, and download them from, the entire grid. Because uploads are stored redundantly, Tahoe can still retrieve a file from the remaining machines if some go down or become unavailable. You need to check and refresh files periodically (for example with tahoe deep-check --repair --add-lease) to make sure they are still "OK on the grid."
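To get a feel for how much redundancy is enough, the chance that a file is still retrievable can be estimated: with k-of-N encoding the file survives as long as any k of the N machines holding its shares are up. A Python sketch, assuming each share sits on a different workstation and that workstations are online independently with the same probability (a simplification; real outages such as power cuts or patch nights tend to be correlated).

```python
from math import comb


def retrieval_probability(k: int, n: int, p_online: float) -> float:
    """Probability that at least k of the n workstations holding a file's
    shares are online, each being up independently with probability p_online."""
    return sum(comb(n, i) * p_online**i * (1 - p_online)**(n - i)
               for i in range(k, n + 1))


# Compare a few encodings, assuming workstations are up 70% of the time.
for k, n in [(1, 2), (3, 10), (5, 10)]:
    print(f"{k}-of-{n}: {retrieval_probability(k, n, 0.7):.6f}")
```

This is a binomial tail sum; it ignores repair, so periodic deep-check runs improve on these figures in practice.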

Tahoe has a built-in "capability" security model: the URL is the access key for the file and also determines what you are allowed to do with it (when you upload a file, you can hand out a "read-only" URL, basically). You can also disable the web interface and use the SFTP (SSH) frontend instead, with WinSCP as a client to get and store files.
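The web interface is backed by an HTTP API, so uploads and downloads can also be scripted. A minimal Python sketch, assuming a local node on Tahoe's default web port 3456 and the web API's `PUT /uri` (upload an immutable file; the response body is its read capability) and `GET /uri/$FILECAP` endpoints. The returned capability is the only handle on the file, so store it somewhere safe.

```python
import urllib.request

GATEWAY = "http://127.0.0.1:3456"  # assumption: default local web port


def build_upload_request(data: bytes, gateway: str = GATEWAY) -> urllib.request.Request:
    """PUT /uri stores `data` as an immutable file on the grid."""
    return urllib.request.Request(gateway + "/uri", data=data, method="PUT")


def upload(data: bytes, gateway: str = GATEWAY) -> str:
    """Upload and return the file's read capability."""
    with urllib.request.urlopen(build_upload_request(data, gateway)) as resp:
        return resp.read().decode("ascii").strip()


def download(filecap: str, gateway: str = GATEWAY) -> bytes:
    """Fetch a file by its capability; the node decrypts it for you."""
    with urllib.request.urlopen(gateway + "/uri/" + filecap) as resp:
        return resp.read()
```

Against a running node, `cap = upload(b"archive me")` followed by `download(cap)` should round-trip the bytes; anyone holding `cap` can do the same, which is the capability model in action.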

It's involved and a bit weird, and it takes some time to get your head around some of the concepts, but it works great.

LawrenceC

Posted 2013-12-13T18:44:48.287

Reputation: 63 487

Oh, and it encrypts what it stores automatically. Only with the correct URL can you decrypt the data. – LawrenceC – 2013-12-13T19:20:05.297

Thank you! This sounds great and is just the sort of thing I was looking for, but unfortunately I followed your Python build instructions on Win7 and it didn't work. I've also followed Tahoe's quick and advanced Windows build instructions to the letter, but it still fails to produce a setup executable. – user281618 – 2013-12-13T20:23:43.317

It won't produce a setup executable but a Python script called tahoe (in the bin directory of wherever you unzipped tahoe) that you then invoke with Python (from command line). – LawrenceC – 2013-12-13T23:06:26.650

Got it working! I have an introducer, client and storage server up and running on one PC (just as a test). I'm now going to try deploying to 3 or 4 PCs. Thanks a lot, this is looking great. – user281618 – 2013-12-13T23:52:43.420

Any idea how you might go about mass-deploying this? I know how to distribute the Tahoe client files, but is it OK to just copy the tahoe directory? Do you think the client directory would be fairly generic, or does it have its own unique client codes? I'll soon find out, but any advice on mass client deployment on my own LAN would be gratefully appreciated. – user281618 – 2013-12-13T23:54:43.417

When you run python.exe {tahoe-folder}\tahoe create XXX, it basically "sets up" the client. You then need to copy the introducer.furl to the right location. Once made, the tahoe.cfg can be copied: there is node-dependent data, but it's all in separate files generated by that create command. So you can probably write a script that runs the Python installer from a network location, puts all the files in the right place, executes tahoe create XXX (you probably want XXX to include part of the hostname), and does whatever else is needed. It's something I'm still working on myself, to be honest. – LawrenceC – 2013-12-14T00:09:08.783
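A rough Python sketch of such a per-workstation script. It assumes the Tahoe files are already copied to each machine, that `tahoe create-node DIR` creates a combined client and storage node, and that your Tahoe version reads the introducer's fURL from the `[client]` section of tahoe.cfg; the network share holding the fURL, the directory layout and the `tahoe-<hostname>` naming scheme are all hypothetical.

```python
import configparser
import os
import socket
import subprocess


def node_dir_for_host(base_dir, hostname=None):
    """Hypothetical naming scheme: one node directory per workstation,
    named after its short, lower-cased hostname."""
    host = (hostname or socket.gethostname()).split(".")[0].lower()
    return os.path.join(base_dir, "tahoe-" + host)


def set_introducer(cfg_path, introducer_furl):
    """Point an existing tahoe.cfg at the shared introducer.
    Note: configparser drops any comments when it rewrites the file."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path)
    if not cfg.has_section("client"):
        cfg.add_section("client")
    cfg.set("client", "introducer.furl", introducer_furl)
    with open(cfg_path, "w") as f:
        cfg.write(f)


def deploy(python_exe, tahoe_script, base_dir, furl_path):
    """Create this machine's node, then wire it to the introducer."""
    node_dir = node_dir_for_host(base_dir)
    subprocess.run([python_exe, tahoe_script, "create-node", node_dir], check=True)
    with open(furl_path) as f:  # e.g. a copy of introducer.furl on a file share
        furl = f.read().strip()
    set_introducer(os.path.join(node_dir, "tahoe.cfg"), furl)
    return node_dir
```

Deriving the directory from the hostname keeps each node's generated, machine-specific files separate, while the shared settings (introducer fURL, share counts) stay identical across the fleet.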