Automatic versioning upon file change (modify/create/delete)

16

3

I am looking for an implementation (on Linux) of a mechanism which would automatically and transparently version any change in a directory (recursively): modify, create, delete. This is intended to be an addition to (possibly a replacement for, if all the requested features are available) standard versioning (SVN, git, ...).

A product on MS Windows which does this is AutoVer (mentioned to give a better idea of the requirements). I would love to have something like that, but aimed at Linux and usable in a non-graphical environment.

I saw that there are some attempts to get this functionality on Linux; the closest one I found is auto-versioning on Subversion, but it is not obvious to implement in existing environments (servers where, for instance, configuration files are local).

Maybe something working with inotify?

Thank you in advance for any pointers!

WoJ

Posted 2011-12-15T09:29:53.063

Reputation: 1 580

related: flashbake

– Dan D. – 2011-12-16T10:06:31.870

Is there a special requirement about which software you use? Because if you're only looking to track changes you do manually (by editing files), Eclipse has this feature built-in, it's called "local history". – Stefan Seidel – 2013-02-06T10:48:00.433

@StefanSeidel I'm not the topic-starter, but I would prefer no-IDE solution. – Michael Pankov – 2013-02-06T12:57:00.393

Answers

6

1. General purpose method using Bazaar & inotify

This is untested by me, but I found this write-up that uses bzr (Bazaar) and inotifywait to monitor a directory and keep the files in it under version control.

This script does all the work of watching the directory for changes:

#!/bin/bash

# go to the checked-out repository folder you want to watch
cd path/to/www/parent/www
# start watching the directory for changes recursively, ignoring the .bzr dir
# the commit comment is made out of dir/filename
# no output is shown from this, but writing a filename instead of /dev/null
# would allow logging
inotifywait --exclude '\.bzr' -r -q -m -e CLOSE_WRITE \
    --format="bzr commit -m 'autocommit for %w/%f'" ./ | \
    sh 2>/dev/null 1>&2 &
# disown the pid, so the inotify thread is freed from the parent process
# and will not be terminated with it
PID=$(ps aux | grep inotify | grep CLOSE_WRITE | grep -v grep | awk '{print $2}')
disown $PID

# this is for new files, not modifications, optional
inotifywait --exclude '\.bzr' -r -q -m -e CREATE \
    --format="bzr add *; bzr commit -m 'new file added %w/%f'" ./ | \
    sh 2>/dev/null 1>&2 &
PID=$(ps aux | grep inotify | grep CREATE | grep -v grep | awk '{print $2}')
disown $PID

exit 0;
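Note that the directory has to be a Bazaar branch before the script is started. If it is not one yet, something along these lines would be needed first (untested here, like the script itself):

cd path/to/www/parent/www
bzr init
bzr add
bzr commit -m "initial import"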

2. Managing /etc

For the special case of managing your system's /etc directory, you can use the app etckeeper.

etckeeper is a collection of tools to let /etc be stored in a git, mercurial, darcs, or bzr repository. It hooks into apt (and other package managers including yum and pacman-g2) to automatically commit changes made to /etc during package upgrades. It tracks file metadata that revision control systems do not normally support, but that is important for /etc, such as the permissions of /etc/shadow. It's quite modular and configurable, while also being simple to use if you understand the basics of working with revision control.

Here's a good tutorial to get you started with it.
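As a minimal quick-start, assuming a Debian-like system and the default git backend (adjust the install command for your distribution):

sudo apt-get install etckeeper       # pulls in git on Debian/Ubuntu
sudo etckeeper init                  # puts /etc under version control
sudo etckeeper commit "initial import of /etc"

# from here on, package installs and upgrades commit automatically, and a
# daily cron job picks up manual edits; the history is plain git history
cd /etc && sudo git log --oneline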

3. Using git and incron

This technique makes use of git and incron. For this method you need to do the following:

A. Make a repo

% mkdir $HOME/git
% cd $HOME/git
% git init

B. Create a $HOME/bin/git-autocommit script

#!/bin/bash

REP_DIR="$HOME/git"       # repository directory
NOTIFY_DIR="$HOME/srv"    # directory to version

cd $REP_DIR
GIT_WORK_TREE=$NOTIFY_DIR /usr/bin/git add .
GIT_WORK_TREE=$NOTIFY_DIR /usr/bin/git commit -a -m "auto"

C. Add an entry to incrontab

% incrontab -e

incrontab -e opens your user's incron table in an editor; add the following entry on a single line (use absolute paths if your incron version does not expand $HOME):

$HOME/srv IN_MODIFY,IN_CREATE,IN_MOVED_FROM,IN_MOVED_TO $HOME/bin/git-autocommit
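The script needs to be executable before incron can run it. A quick sanity check afterwards, using the same hypothetical paths as above, is to touch a file in the watched directory and look at the repository's log:

% chmod +x $HOME/bin/git-autocommit
% touch $HOME/srv/testfile
% cd $HOME/git && GIT_WORK_TREE=$HOME/srv git log --oneline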

4. Using Flashbake

Another option is to use a tool like Flashbake. Flashbake is the version control system that Cory Doctorow (of BoingBoing fame) uses to write his books.

Flashbake uses git under the hood to track changes but is somewhere between doing automated backups and using a plain version control system yourself.

Cory wanted the version to carry prompts, snapshots of where he was at the time an automated commit occurred and what he was thinking. I quickly sketched out a Python script to pull the contextual information he wanted and started hacking together a shell script to drive git, using the Python script’s output for the commit comment when a cron job invoked the shell wrapper.
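As a rough sketch of how it is typically driven, assuming the flashbake command is installed and ~/writing is the project directory (both placeholders; check flashbake --help, as the exact arguments may differ between versions):

# the files to track are listed in a .flashbake control file inside the
# project directory (plugin lines for the contextual prompts go there too)
printf 'novel.txt\nnotes.txt\n' > ~/writing/.flashbake

# then drive it from cron: every 15 minutes, committing files that have
# been quiet for at least 5 minutes (add via crontab -e)
*/15 * * * * flashbake ~/writing 5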


slm

Posted 2011-12-15T09:29:53.063

Reputation: 7 449

3

inotifywait + a local git repository = gitwatch.sh; have a look here: https://github.com/nevik/gitwatch/blob/master/gitwatch.sh

– diyism – 2015-08-12T04:57:35.553

4

ZFS immediately comes to mind. It can create snapshots, and there are projects that create snapshots automatically.
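For illustration, manual snapshots of a dataset look roughly like this (the dataset name tank/home is a placeholder); projects such as zfs-auto-snapshot essentially run the same commands from cron or a systemd timer:

# take and list snapshots of a dataset
zfs snapshot tank/home@before-upgrade
zfs list -t snapshot

# old file versions are visible in the hidden .zfs directory, or the
# whole dataset can be rolled back
ls /tank/home/.zfs/snapshot/before-upgrade/
zfs rollback tank/home@before-upgrade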

bdecaf

Posted 2011-12-15T09:29:53.063

Reputation: 458

I read about ZFS but it looks like it is not a stable solution for basic filesystems (at least on Linux). – WoJ – 2011-12-15T20:38:52.363

I would really like a solution to snap onto existing FS. – Michael Pankov – 2013-02-06T12:52:23.827

Perhaps this? http://www.ext3cow.com

– Zac B – 2013-02-06T14:36:14.667

3

I think you're on the right track with inotify. This article details its basic usage in a case similar to yours. I'd suggest using it either directly, or compiling a kernel-level utility like fschange. This is something of a hassle, but you could then bind the detection of changes to a git commit or similar.
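A minimal sketch of that binding in shell, assuming inotify-tools is installed and the watched directory is already a git repository (paths are placeholders); it is the same idea as the Bazaar script shown earlier, just with git:

#!/bin/sh
# commit on every completed write, creation, deletion or move in the tree
cd /path/to/watched/dir || exit 1
while inotifywait -r -q -e close_write,create,delete,move --exclude '\.git' .; do
    git add -A
    git commit -m "autocommit: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
done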

Both of those approaches have the issue of relying on somewhat imperfect third-party tools. If you don't mind getting your hands dirty, NodeJS provides an excellent, cross-platform facility (fs.watch) for this exact purpose. A basic tutorial on watching files for changes in NodeJS can be found here. In a few dozen lines or fewer, you could write something that watches a directory for changes, then shells out (via child_process) and runs a git commit or similar (or even manually increments a version-file index, if you like the roll-your-own approach).

fs.watch is backed by inotify on Linux, but it is a lot more intuitive to use. There are other NodeJS projects that wrap that file-watching functionality in various levels of convenience, like this one or this one.

Zac B

Posted 2011-12-15T09:29:53.063

Reputation: 2 653

Still not a ready solution, and, well, I'd probably go with Python's inotify. But thanks. – Michael Pankov – 2013-02-06T12:55:34.887

3

inotify(2) on Linux won't be able to watch a large tree, but a FUSE filesystem (mounted at a separate location) could probably handle it by translating filesystem requests into svn or git calls, or by changing svn/git metadata directly.

This is a very interesting idea, but I haven't heard of any existing implementations.

Mikhail Kupchik

Posted 2011-12-15T09:29:53.063

Reputation: 2 381

Let's say I have only a couple of files. – Michael Pankov – 2013-02-06T12:52:45.513

0

There is also a "poor man's" way of doing this using only rsync and a cron job. You basically rely on rsync's backup facility and use two separate paths plus a date suffix to keep track of your files.

It more or less looks like this:

/usr/bin/rsync -a -A -X --backup --suffix="$(date +'.%Y-%m-%d_%H-%M-%S')" "$source_path" "$backup_path"

End result: Changing a file called test_rsync in the source path after the initial execution will result in a file called test_rsync.2017-02-09_11-00-01 being created in the backup path.

There are a bunch of issues with this: it only works well with a moderate number of files, and it will miss changes that happen between two consecutive runs of rsync (1 minute in my case). Still, it may be enough for your needs.
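A per-minute crontab entry for it would look something like this (paths are placeholders; note that % has to be escaped in crontab lines):

# add via crontab -e
* * * * * /usr/bin/rsync -a -A -X --backup --suffix=".$(date +\%Y-\%m-\%d_\%H-\%M-\%S)" /path/to/source/ /path/to/backup/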

If we're talking about Samba shares here, an exclusion list might be in order; I haven't got around to that yet, I'm afraid.

Let me know if you improve this.

Florin COJOCARU

Posted 2011-12-15T09:29:53.063

Reputation: 1

0

Here is a Python 3 script that does VMS-like automatic file versioning, appending a time stamp to the original file name whenever a file is saved.

I've put a bunch of commentary into the script. I run half a dozen such scripts on my Ubuntu machine, with only the directories differing between copies, so that I am versioning multiple directories simultaneously. There is no real penalty to the machine's performance.

#!/usr/bin/env python3

print ("PROJECT FILES VERSIONING STARTED")
print ("version_creation.py")    # place all this code into a script of this name
print ("run as..  'python3 version_creation.py'  from command line")
print ("ctrl 'c' to stop")
print (" ")
print ("To run program in background type below to command line and then close the window. ")
print ("nohup python3 version_creation.py")
print ("....to stop process go menu/administration/system monitor... and kill python3")
print (" ")
print ("Always save files to the 'ProjectFiles' directory and the version files ")
print ("   will also be created in that directory.")
print (" ")

import shutil
import os
import time
import glob

#--- set the time interval to check for new files (in seconds) below
#-   this interval should be smaller than the interval new files appear!
t = 10

#--- set the source directory (dr1) and target directory (dr2)
#dr1 = "/path/to/source_directory"
#dr2 = "/path/to/target_directory"

dr1 = "/home/michael/ProjectFiles"         # both originals and versions are saved to this directory
dr2 = "/home/michael/ProjectFileVersions"  # not used by this version of the script

while True:

    if os.listdir(dr1) == []:
        print ("Empty")
        n = 100
    else:
        list_of_files = glob.glob(dr1+'/*')   # * means all; use e.g. *.csv for a specific format
        latest_file_path = max(list_of_files, key=os.path.getctime)
        print ("1 Latest_file_path = ", latest_file_path)

        originalname = latest_file_path.split('/')[-1]
        print ("2 originalname = ", originalname)

        filecreation = os.path.getmtime(latest_file_path)
        print ("filecreation = ", filecreation)

        now = time.time()
        fivesec_ago = now - 5                 # number of seconds
        print ("fivesec_ago = ", fivesec_ago)

        timedif = fivesec_ago - filecreation  # time since the file was last written
        print ("timedif = ", timedif)

        if timedif <= 5:   # i.e. the file was written within the last ~10 seconds

            nameroot = originalname.split(".")[0]
            print ("3 nameroot = ", nameroot)

            extension = os.path.splitext(originalname)[1][1:]
            print ("4 extension = ", extension)

            curdatetime = time.strftime('%Y%m%d-%H%M%S')
            print ("5 curdatetime = ", curdatetime)

            newassembledname = nameroot + "_" + curdatetime + "." + extension
            print ("6 newassembledname = ", newassembledname)

            source = dr1 + "/" + originalname
            print ("7 source = ", source)

            target = dr1 + "/" + newassembledname
            print ("8 target = ", target)

            shutil.copy(source, target)

    time.sleep(t)   # sleep at loop level so an empty directory does not busy-loop


The script below was put in earlier and works, but I like the above Python script much better... (I've been using Python for about 3 hours.)

#!/usr/bin/env python3

print ("PROJECT FILES VERSIONING STARTED")
print ("projectfileversioning.py")
print ("run as..  'python3 projectfileversioning.py'       from command line")
print ("ctrl 'c'      to stop")
print (" ")
print ("To run program in background type below to command line and then close the window. ")
print ("nohup python3 projectfileversioning.py")
print ("....to stop process go menu/administration/system monitor... and kill python")
print (" ")
print ("Always save files to the 'ProjectFiles' directory and the file ")
print ("   will be redirected to the ProjectFileVersions where")
print ("   time stamped versions will also be created.")
print (" ")
print ("If you like you may then copy/move the versioned and original file from 'ProjectFileVersions' to ")
print ("any other directory you like.")

import shutil
import os
import time

#--- set the time interval to check for new files (in seconds) below 
#-   this interval should be smaller than the interval new files appear!
t = 10

#--- set the source directory (dr1) and target directory (dr2)
#dr1 = "/path/to/source_directory"
#dr2 = "/path/to/target_directory"

import glob
import os

dr1 = "/home/michael/ProjectFiles"
dr2 = "/home/michael/ProjectFileVersions"


while True:

    if os.listdir(dr1) == []:
        n = 100
    else:
        list_of_files = glob.glob(dr1+'/*')   # * means all if need specific format then *.csv
        latest_file_path = max(list_of_files, key=os.path.getctime)
        print ("1 Latest_file_path = ", latest_file_path)

        originalname = latest_file_path.split('/')[-1]
        print ("2 originalname = ", originalname)

        nameroot = originalname.split(".")[-0]
        print ("3 nameroot= ", nameroot)

        extension = os.path.splitext(originalname)[1][1:]
        print ("4 extension = ", extension)

        curdatetime = time.strftime('%Y%m%d-%H%M%S')
        print ("5 curdatetime = ", curdatetime)

        newassembledname = (nameroot + "_" + curdatetime + "." + extension)
        print ("6 newassembledname = ", newassembledname)




        source = dr1+"/"+originalname
        print ("7 source = ", source)

        target = dr2+"/"+originalname
        print ("8 target = ", target)

        shutil.copy(source, target)



        source = dr1+"/"+originalname
        print ("9 source = ", source)

        target = dr2+"/"+newassembledname
        print ("10 target = ", target)

        shutil.move(source, target)
        time.sleep(t)



michael

Posted 2011-12-15T09:29:53.063

Reputation: 1

0

Such a script is not hard to write.

My favourite version control is git.

The following script should do it:

#!/bin/sh
git add .
git commit -am "my automatic commit"

Either have that run periodically on your directory (e.g. from cron), or, if your editor is scriptable, call it after you save.
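For the periodic variant, a crontab entry along these lines would do (paths are placeholders):

# add via crontab -e: auto-commit every 5 minutes
*/5 * * * * cd /path/to/watched/dir && /path/to/autocommit.sh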

But if you do it like this, it might make sense to exclude large files and maybe some "useless" ones like autosaves.

bdecaf

Posted 2011-12-15T09:29:53.063

Reputation: 458

Yes, I know that a cron-based solution is simple to implement. I am however looking for something which would version on save, no matter the save mechanism. This is also why I mentioned auto-versioning on SVN as well as inotify in my question. – WoJ – 2011-12-16T13:03:54.807

0

SparkleShare (http://sparkleshare.org) is based on git and implements Dropbox-like functionality with version control, but you have to set up an SSH server (which can be localhost).

FSMaxB

Posted 2011-12-15T09:29:53.063

Reputation: 1 528

This thing is clumsy and requires a lot of setup. Besides, Dropbox functionality is unneeded. – Michael Pankov – 2013-02-06T12:54:08.460

0

I'd recommend you try NILFS. Refer to the about page and you will quickly be able to decide whether this is what you are looking for.
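For a rough idea of what working with it looks like, assuming nilfs-utils is installed and /dev/sdb1 is a spare partition (both placeholders):

# create and mount a NILFS2 filesystem; it then takes continuous checkpoints
mkfs -t nilfs2 /dev/sdb1
mount -t nilfs2 /dev/sdb1 /mnt/versioned

# list checkpoints, promote one (e.g. number 42) to a persistent snapshot,
# and mount that snapshot read-only to retrieve old file versions
lscp /dev/sdb1
chcp ss /dev/sdb1 42
mount -t nilfs2 -r -o cp=42 /dev/sdb1 /mnt/snapshot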

HTH

Nehal Dattani

Posted 2011-12-15T09:29:53.063

Reputation: 101