
So, basically, I'm looking for Linux software to monitor a folder (and its subfolders) for any changes and apply some form of versioning (that is, keep a "database" or similar from which I can restore files).

The reason this can't be done with traditional SCMs (Git, SVN, Hg, whatever) is twofold:

  • the monitoring and versioning must be automatic (with time as the only ordering criterion)
  • the software I need should do one specific thing, whereas SCMs do a lot more (and are, understandably, more error-prone)

The server this will run on is an unmanaged VPS, so I have considerable control, but I'm afraid not enough to set up a partition with a versioning filesystem.

While I'm at it: I've already checked out Wayback, but I'm not impressed, and I'm hesitant to use software last updated seven years ago (2004).

Sorry for passing on the headache to other fellow server-faulters, but I can't help it ;)

Edit: By the way, though I would prefer this to be CLI-based, any alternatives are very welcome as well!

Edit 2: Not to bash Linux or anything, but given Linux's file-change notification mechanisms (inotify), this shouldn't be too difficult to write (by a dedicated team, of course).

In fact, I already use a system that does this (Dropbox), but it serves a different purpose: versioning is limited to 30 days, and the versions are kept in online storage. Still, it shows that the concept is entirely possible.

HopelessN00b
Christian
  • How much data, and how much churn do you expect? – Tobu Mar 01 '11 at 22:21
  • I don't fully understand your question, but I do have certain limits, such as a total of 2 GB of live files, and since these are code/text files, expect thousands of files. Edit: make that a million or two :) As for churn, there shouldn't be many changes at once (maybe 5,000 at most), but those changes do happen simultaneously. – Christian Mar 01 '11 at 22:22

3 Answers


Does it really have to version each change at the file level, or would you accept periodic snapshots?

If you are willing to accept periodic snapshots, you could simply use something like dirvish, or rsync directly. Basically, you build one complete copy of your filesystem; each later copy hardlinks files that are unchanged since the previous snapshot and stores only new or changed files separately.

Dirvish is basically a front end for rsync that uses its --link-dest option.

How are users accessing the filesystem? Is access via WebDAV an option? If so, you could set up SVN with Apache (mod_dav_svn) and use its autoversioning feature.
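A minimal sketch of the rsync-only approach, assuming snapshots live in timestamped directories (e.g. `2011-03-01T220000`) under one root so that sorting by name gives chronological order. The function name and layout are illustrative, not dirvish's. The newest existing snapshot is passed as an absolute path, because rsync interprets a relative `--link-dest` relative to the destination directory.

```python
from __future__ import annotations

import subprocess
import sys
from datetime import datetime
from pathlib import Path


def snapshot_command(source: str, snapshot_root: str, stamp: str) -> list[str]:
    """Build an rsync command that copies `source` into snapshot_root/stamp,
    hardlinking files that are unchanged since the newest existing snapshot."""
    root = Path(snapshot_root)
    # Assumption: every subdirectory of root is a snapshot named so it sorts by time.
    previous = sorted(p for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    cmd = ["rsync", "-a"]
    if previous:
        # Unchanged files are hardlinked against the previous snapshot instead of copied.
        cmd.append(f"--link-dest={previous[-1].resolve()}")
    # Trailing slash: copy the *contents* of source into the new snapshot directory.
    cmd += [source.rstrip("/") + "/", str(root / stamp)]
    return cmd


if __name__ == "__main__":
    # Hypothetical usage: python snapshot.py /srv/files /backup/snapshots
    src, dest = sys.argv[1], sys.argv[2]
    Path(dest).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%dT%H%M%S")
    subprocess.run(snapshot_command(src, dest, stamp), check=True)
```

Run from cron, each snapshot directory looks like a full copy, but only changed files take up new disk space.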

Zoredache
  • I would prefer monitoring changes. Periodic snapshots sound like having to check each file, which wastes a lot of resources. – Christian Mar 01 '11 at 22:18
  • As for users accessing the FS, the files aren't accessed directly from outside; a web interface will be provided instead. That interface could be made aware of when a snapshot is being taken (and make users wait a little). – Christian Mar 01 '11 at 22:26
  • A snapshot with rsync should be pretty fast. With the standard options, it doesn't compare file contents unless a file's modification time or size differs. – Zoredache Mar 01 '11 at 22:32
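To illustrate why this is cheap, here is a rough Python approximation of rsync's default "quick check": only `stat()` metadata is compared, so unchanged files are never opened. This is a sketch, not rsync's actual code; comparing mtimes in whole seconds is an assumption (rsync's `--modify-window` option tunes this tolerance), and the function names are hypothetical.

```python
import os
from typing import Iterator


def quick_check_differs(src: str, dst: str) -> bool:
    """True if dst is missing or its size or mtime (whole seconds) differs from src."""
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True
    s = os.stat(src)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def changed_files(src_root: str, dst_root: str) -> Iterator[str]:
    """Yield relative paths under src_root that fail the quick check against dst_root."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            if quick_check_differs(src, os.path.join(dst_root, rel)):
                yield rel
```

Walking a million files still costs a million `stat()` calls, but no file contents are read unless the metadata has changed.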

If you don't want to use rsync as Zoredache suggested, my next suggestion would be to write a script which uses inotify to monitor for changes. It wouldn't be very difficult.
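One way to sketch such a script, assuming the `inotifywait` tool from the inotify-tools package is installed (the Python function names are hypothetical). `-m -r` keeps watching the whole tree, and `--format '%e %w%f'` prints the event names followed by the full path. Two caveats: paths containing newlines would break this line-based parsing, and inotify watches are per directory, so a large tree may require raising the `fs.inotify.max_user_watches` sysctl.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Iterator


def inotifywait_command(watch_dir: str) -> list[str]:
    return [
        "inotifywait",
        "-m",  # monitor: keep running instead of exiting after one event
        "-r",  # recurse into subdirectories
        "-q",  # don't print the "Setting up watches" banner
        "-e", "close_write", "-e", "moved_to", "-e", "delete",
        "--format", "%e %w%f",  # e.g. "CLOSE_WRITE,CLOSE /srv/files/a.txt"
        watch_dir,
    ]


def parse_event(line: str) -> tuple[set[str], str]:
    """Split one output line into its event names and the affected path."""
    events, path = line.rstrip("\n").split(" ", 1)
    return set(events.split(",")), path


def watch(watch_dir: str) -> Iterator[tuple[set[str], str]]:
    proc = subprocess.Popen(inotifywait_command(watch_dir), stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:
        yield parse_event(line)


if __name__ == "__main__":
    for events, path in watch(sys.argv[1]):
        print(sorted(events), path)  # hand the path to your versioning step here
```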

Whether your script then automatically commits the changed file to a traditional version control system (SVN, Git, etc.) or just keeps the last X versions of the file elsewhere is up to you.
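For the "keep the last X versions elsewhere" option, a minimal sketch, assuming a separate store directory that mirrors the watched tree: each saved copy is named by a zero-padded nanosecond timestamp, so name order is time order, and older copies beyond `keep` are deleted. The function name and layout are hypothetical; it could be called from an inotify loop for each file that was written or moved in.

```python
import shutil
import time
from pathlib import Path


def save_version(changed: str, watch_root: str, store_root: str, keep: int = 10) -> Path:
    """Copy `changed` to store_root/<relative path>/<timestamp>, keeping only
    the `keep` newest copies of that file."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    rel = Path(changed).resolve().relative_to(Path(watch_root).resolve())
    version_dir = Path(store_root) / rel
    version_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded nanosecond timestamps sort chronologically as plain names.
    stamp = time.time_ns()
    dest = version_dir / f"{stamp:020d}"
    while dest.exists():  # extremely fast successive saves: bump the name
        stamp += 1
        dest = version_dir / f"{stamp:020d}"
    shutil.copy2(changed, dest)  # copy2 also preserves the original mtime
    versions = sorted(p for p in version_dir.iterdir() if p.is_file())
    for old in versions[: max(len(versions) - keep, 0)]:
        old.unlink()
    return dest
```

Restoring a file is then just copying the desired timestamped version back into place.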

Steven
  • That's actually quite doable. I just have to learn to write such a script and make use of inotify :). But hey, a lead is what I was asking for, and I'm happy with this. Thanks. I'd give +1 if you gave me additional (nontraditional) info on achieving this, such as what kind of script (I presume a shell script?) and any drawbacks I might encounter. – Christian Mar 01 '11 at 23:06
  • Apparently it works well with other scripting languages, such as PHP. Better and better! – Christian Mar 01 '11 at 23:09
  • Yes, whatever language you're most fluent in is best: bash, perl, php, python, etc. I can't think of any drawbacks offhand other than bugs in your script :-) – Steven Mar 02 '11 at 01:04

I know of one way of doing this. It's a proprietary filesystem that comes bundled with a lot of other software, which makes it very expensive, but it does do most of what you're looking for. It's called NSS and ships with Novell's Open Enterprise Server 2. Unlike Wayback, it's actually still supported. It keeps what it calls a 'salvage' tree, which holds as much 'deleted' data as there is free space on the volume minus 20% (this can be configured).

The one caveat is that it doesn't track revisions to individual files, only files that are deleted and recreated with new contents. So Excel files will be tracked (Excel saves by writing a new file and deleting the old one), but Access databases won't be (Access modifies its database file in place).

NSS Salvage only covers a period of time that depends on how much free space is left on the volume. Pair that with an automated system that pulls files out of Salvage and into a more traditional revision-control system, and you have a pretty powerful solution. It does mean dealing with Novell, though.

sysadmin1138