11

I've long thought about versioning file systems. This is a killer feature and I've looked at Wayback, ext3cow, zfs, fuse solutions, or just cvs/svn/git overlays.

I consider ext3cow the model for my requirements: transparent and efficient. I can do without the extra ls abc@timestamp feature, as long as I somehow get automated, transparent versioning of my files.

It could be instantaneous, or it could be based on snapshots taken at intervals of 10s, 30s, 1m, 5m, 15m, etc. I just need something that will efficiently deal with thousands of files in a given directory, all of various sizes: most are small, but some range from 100 MB up to 1 GB.

ZFS isn't really an option as I'm on Linux (and I would prefer not to use it through FUSE, as I already have an ext3 setup I want to version, not something new).

What solutions are out there?

Dale Forester

6 Answers

7

If you wrap your file systems using LVM, then you can create a snapshot volume using the underlying logical volume layer. It's a pretty simple process and surprisingly effective for standard "snapshotty" things, such as backup and undoing rm -fr oopsies.
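
To make the LVM suggestion a bit more concrete, here is a rough sketch of the snapshot lifecycle. It only builds and prints the commands (a dry run), since running them needs root and a real volume group. The names vg0, home and home_snap, the 1G copy-on-write size and the /mnt/snap mount point are all made up for illustration; the volume group must have enough free space to hold the blocks that change while the snapshot exists.

```python
import shlex


def snapshot_commands(vg, lv, snap, cow_size="1G", mountpoint="/mnt/snap"):
    """Build (but do not run) the commands for an LVM snapshot lifecycle.

    All names passed in are hypothetical; adjust them to your own layout.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    create = [
        # Create a snapshot of the origin volume with cow_size of change space.
        ["lvcreate", "--snapshot", "--size", cow_size, "--name", snap, origin],
        # Mount it read-only so old file contents can be copied back out.
        ["mount", "-o", "ro", snap_dev, mountpoint],
    ]
    remove = [
        ["umount", mountpoint],
        # Drop the snapshot once it is no longer needed.
        ["lvremove", "--yes", snap_dev],
    ]
    return create, remove


create, remove = snapshot_commands("vg0", "home", "home_snap")
for cmd in create + remove:
    print(shlex.join(cmd))
```

Run from cron at whatever interval you like, something along these lines would give periodic snapshots rather than per-write versioning.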

McJeff
6

After 8 years of searching I found SVNFS by Marco R. Gazzetta (not to be confused with the older project of the same name by John Madden, which does different things). This SVNFS uses svn transparently for read/write operations:

Instead of creating a file system that does its own versioning, I used an existing versioning tool, subversion, and made its use transparent. The advantage is that this file system doesn't require you to learn a new tool, if you know subversion

It's written in Python and uses FUSE:

Now you start the versioning file system by invoking the script attached:

python svnfs.py -o svnroot=/home/marco/svnfiles /home/marco/myfiles

Once everything is fine, you should be able to get a listing of both directories and see that the contents are the same.

Now, if you create (almost) any file in either directory, it will show up on the other side of the fence, as well. The big difference is that if you create a file in the myfiles directory, it will automatically be placed under version control (the opposite is not true).

In the example, SVNFS uses a separate directory for the repository, although I haven't tested it. For my needs I'd like to have the repository right in my working directory.


Four years ago I also found a reference to Reiser4's versioning capabilities:

See Reiser 4. Files are directories.

eg: diff -u main.C main.C/r/123

Or to access properties

cat main.C/p/svn-eolstyle

echo "foobar" > main.C/p/my-property 

It seems that it would be best to follow that model, since a major filesystem is already going that route.

-Paul Querna

But I haven't checked that either.


Two years ago I searched further, found the FiST project for generating stackable file systems, and contacted Prof. Erez Zadok of Stony Brook University, who long ago was the adviser for a project called versionfs. Quoting:

http://www.fsl.cs.sunysb.edu/docs/versionfs-fast04/

http://www.fsl.cs.sunysb.edu/docs/versionfs-msthesis/versionfs.pdf

allows users to manage their own versions easily and efficiently. Versionfs provides this functionality with no more than 4% overhead for typical user-like workloads. Versionfs allows users to select both what versions are kept and how they are stored through retention policies and storage policies, respectively. Users can select the trade-off between space and performance that best meets their individual needs: full copies, compressed copies, or block deltas. Although users can control their versions, the administrator can enforce minimum and maximum values, and provide users sensible defaults.

Additionally, through the use of libversionfs, unmodified applications can examine, manipulate, and recover versions. Users can simply run familiar tools to access previous file versions, rather than requiring users to learn separate commands, or ask the system administrator to remount a file system. Without libversionfs, previous versions are completely hidden from users.

Finally, Versionfs goes beyond the simple copy-on-write employed by past systems: we implement copy-on-change. Though at first we expected that the comparison between old and new pages would be too expensive, we found that the increase in system time is more than offset by the reduced I/O and CPU time associated with writing unchanged blocks. When more expensive storage policies are used (e.g., compression), copy-on-change is even more useful.
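
To illustrate the difference between copy-on-write and copy-on-change described in that quote, here is a toy sketch. It is not Versionfs code (which, as noted below, could not be located); the function name, the in-memory byte strings and the tiny block size are my own assumptions. It assumes an application rewrites the whole file: copy-on-write preserves every overwritten block, while copy-on-change first compares old and new blocks and preserves only those that actually differ.

```python
def blocks_to_preserve(old, new, block_size=4096, copy_on_change=True):
    """Return {block_index: old_block} for the blocks a versioning layer
    would have to save before `new` replaces `old`.

    Toy model: the whole file is rewritten, so every block is "written".
    """
    preserved = {}
    for offset in range(0, len(old), block_size):
        old_block = old[offset:offset + block_size]
        new_block = new[offset:offset + block_size]
        # Copy-on-change: skip blocks whose content is identical.
        if copy_on_change and old_block == new_block:
            continue
        preserved[offset // block_size] = old_block
    return preserved


old = b"aaaa" + b"bbbb" + b"cccc"
new = b"aaaa" + b"XXXX" + b"cccc"   # only the middle block changes

cow = blocks_to_preserve(old, new, block_size=4, copy_on_change=False)
coc = blocks_to_preserve(old, new, block_size=4, copy_on_change=True)
print(sorted(cow), sorted(coc))
```

The comparison costs some CPU time, but in this toy case it cuts the saved data from three blocks to one, which is the trade-off the authors describe.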

It seemed very interesting to me, but contacting the people who worked on the project revealed that there is no known location for its source code. The professor himself stated by email:

Versionfs's code is very old now, and it only worked in kernel 2.4. If you still want a stackable versioning f/s, then one would have to write it from scratch — possibly based on wrapfs (see wrapfs.filesystems.org/).

So there is no working project here, though the concept of stackable file systems seems very nice to me. If anyone would like to start a project based on wrapfs, please notify me :)

saulius2
4

You can check gitfs. It's a FUSE filesystem based on git, pretty stable and super easy to use.

Basically, it's an overlay over git. Whenever you update a file or directory, it creates a commit with that change (it knows to batch the commits, so you don't end up with 100 commits when you unzip an archive). It also knows how to sync with your remote and merge conflicts using an 'always accept mine' strategy.
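
The batching idea can be sketched roughly like this. It is not gitfs's actual implementation; the CommitBatcher class, the 5-second quiet period and the explicit `now` arguments are assumptions of mine to keep the example deterministic. Changes are collected, and a single commit is made only once no new change has arrived for the quiet period.

```python
class CommitBatcher:
    """Collect changed paths and emit one commit after a quiet period."""

    def __init__(self, quiet_seconds=5.0):
        self.quiet_seconds = quiet_seconds
        self.pending = set()      # paths changed since the last commit
        self.last_change = None   # time of the most recent change
        self.commits = []         # each commit is a sorted list of paths

    def record_change(self, path, now):
        self.pending.add(path)
        self.last_change = now

    def maybe_commit(self, now):
        """Commit pending paths if nothing changed for quiet_seconds."""
        if not self.pending:
            return False
        if now - self.last_change < self.quiet_seconds:
            return False
        self.commits.append(sorted(self.pending))
        self.pending.clear()
        return True


batcher = CommitBatcher(quiet_seconds=5.0)
# Simulate unzipping an archive: 100 files written within one second.
for i in range(100):
    batcher.record_change(f"archive/file{i:03d}.txt", now=i * 0.01)

early = batcher.maybe_commit(now=1.0)    # still within the quiet period
late = batcher.maybe_commit(now=10.0)    # quiet long enough, commit once
print(early, late, len(batcher.commits), len(batcher.commits[0]))
```

A real implementation would call something like `git add` and `git commit` where this sketch appends to a list.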

When you mount it, it brings you two directories: current and history.

├── current
│   ├── test1.md
│   ├── test2.md
│   ├── test3.md -> current/test2.md
│   ├── test4.md
│   └── test_directory
└── history
    ├── 2014-11-23
    │   ├── 20-00-21-d71d1579a7
    │   │   └── testing.md
    │   └── 20-42-32-7d09611d83
    │       ├── test2.md
    │       └── testing.md
    ├── 2014-12-08
    │   ├── 16-38-30-6d6e71fe47
    │   │   ├── test2.md
    │   │   └── test1.md

More information can be found on this page.

vtemian
2

bup looks promising.

Older discussion of it here: http://lwn.net/Articles/380983/

  • There's one caveat with using something git-based: modifications in git aren't treated as deltas from the origin. Every commit is the full file even if you just change one byte. – synthesizer Mar 10 '14 at 18:00
0

Try rsnapshot. I have not used it myself, but I stumbled upon it while looking at file-level deduplication systems.

Jason
  • That's interesting. I will definitely look into it. My worry is that its I/O load will cause stuttering on my system (I did something similar with rsync a while ago and stopped using it because of hitches and stuttering in other consoles when it ran). – Dale Forester Mar 21 '10 at 00:54
  • I took a look at rsnapshot and I like its idea but it's very, very unfortunate that it requires a duplicate copy of whatever it's snapshotting. Unfortunately, and of necessity, I'm working with drives at their limit and I want to snapshot contents that are quite a bit larger than the free space left. – Dale Forester Mar 21 '10 at 05:03
  • 1
    The difficulty is in your requirements. Besides something *like* rsnapshot or LVM, ext2/ext3 doesn't have a snapshotting facility built in. You point out ext3cow, but you'd have to change the underlying filesystem. Note that it looks like you can use rsnapshot and store your data on *ANOTHER* machine. I don't know what kind of space you're talking about, but it may make sense to keep your snapshots on another machine. Also keep in mind that snapshots of any kind will require disk space. If your drives are near capacity, how much space do you have left for snapshots? – Jason Mar 21 '10 at 12:07
0

Take a look at Hot Copy from R1Soft.

http://www.r1soft.com/tools/linux-hot-copy/

This is a kernel module that provides copy-on-write snapshots for standard systems without using LVM. It's worked fairly well for me and I can install it without a reboot.

Also see: http://www.r1soft.com/tools/linux-hot-copy/hcp-tips/

ewwhite