3

I'm trying to set up a backup script on Ubuntu. Every day I want to copy my local source directory to a backup directory on a remote server, uniquely named with the date (e.g., backup-jan1/, backup-jan2/, etc.). It should store a mirror of the earliest state and use difference files to recreate the newer backup points.

This is pretty simple with rsync. I've already set up a script that will make the backup, name the backup directory with the current day, and make a symlink to the most recent backup (IP has been edited):

date=`date "+%m%d"`
rsync -ave ssh /srv root@150.69.32.8:/backup/backup-$date/
ssh root@150.69.32.8 rm -rf /backup/current
ssh root@150.69.32.8 ln -s backup-$date/ /backup/current

However, here's the tricky part: I don't want it to copy files that have not changed. If any files have changed since the last daily backup, it will copy them as normal. Otherwise, it will symlink the unchanged, previously backed-up files from their first backup directory into the new backup (kind of like git).

So, for example, let's say I start the backup on Jan 1. The backup-jan1/ directory will contain all the original backup files. The next day, the Jan 2 backup should copy just the files changed in those 24 hours; for all other files, it will make symlinks to the Jan 1 backup files. On Jan 3, I add a file and delete another. If a file is removed, it should not continue to be symlinked.

Example directory/file structure:

backup-jan1/ (initial backup)
    file_a
    file_b

backup-jan2/ (no changes)
    file_a (symlink to ../backup-jan1/file_a)
    file_b (symlink to ../backup-jan1/file_b)

backup-jan3/ (removed file_a symlink and added file_c)
    file_b (symlink to ../backup-jan1/file_b)
    file_c

...

I've tried to look for this version-control type functionality in rsync and rsnapshot, but I haven't found it yet. Can anyone suggest a backup strategy like this?

mr_schlomo
  • 201
  • 1
  • 4
  • 8

2 Answers

3

What you're looking for is the --link-dest functionality that is part of rsync, and what you describe is exactly how dirvish operates.

The --link-dest option hard-links unchanged files in the new destination to the matching files in an earlier copy of the tree.

With dirvish you perform an initial backup, which just uses rsync.

After that, each additional backup is hard-linked to the previous successful backup, meaning there is no duplication of files. You can directly access any single backup from within the vault, and each backup is a complete, full backup. You can remove previous backups at any time.
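Adapting this to the dated-directory layout from the question, each daily run only needs the previous snapshot as its --link-dest target. Below is a minimal local sketch of that rotation; all paths, dates, and file contents under /tmp/ldtest are made up for illustration:

```shell
#!/bin/sh
# Sketch: dated snapshot dirs plus a "current" symlink, with --link-dest
# pointing at the last snapshot. Demo paths only.
rm -rf /tmp/ldtest
src=/tmp/ldtest/source
dst=/tmp/ldtest/backups
mkdir -p "$src" "$dst"
echo "unchanged" > "$src/file_a"
echo "v1"        > "$src/file_b"

# day 1: plain full rsync, then point "current" at the new snapshot
rsync -a "$src/" "$dst/backup-0101/"
ln -sfn backup-0101 "$dst/current"

# day 2: file_b changes; unchanged files become hard links, not copies
echo "v2-longer" > "$src/file_b"
rsync -a --delete --link-dest="$dst/current/" "$src/" "$dst/backup-0102/"
ln -sfn backup-0102 "$dst/current"

# file_a now shares one inode between both snapshots
stat -c '%h' "$dst/backup-0102/file_a"   # prints 2
```

With a remote destination the same pattern works with a relative path such as --link-dest=../current, since relative --link-dest paths are resolved on the receiving side relative to the destination directory.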

Here is a script that you can use to demonstrate.

# create test area
mkdir -p /tmp/backuptest/{source,dest1,dest2,dest3}
for a in `seq 10` ; do dd if=/dev/urandom of=/tmp/backuptest/source/file$a bs=1M count=1; done

# look
find /tmp/backuptest/ -ls ; du -s /tmp/backuptest/

# initial backup
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -s /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file[2-4]
cat /tmp/backuptest/source/file[6-7] >/tmp/backuptest/source/file11

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest2/ --link-dest=/tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -s /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file5
cat /tmp/backuptest/source/file[5-7] >/tmp/backuptest/source/file12

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest3/ --link-dest=/tmp/backuptest/dest2/

# look
find /tmp/backuptest/ -ls ; du -s /tmp/backuptest/

# remove dest1
rm -r /tmp/backuptest/dest1/

# see that dest2 and dest3 are still complete backups of the state at those times
find /tmp/backuptest/ -ls ; du -s /tmp/backuptest/
Zoredache
  • 128,755
  • 40
  • 271
  • 413
  • Dirvish is exactly what I was looking for. I also found Duplicity (http://duplicity.nongnu.org/)... it also provides encryption. Any thoughts on it? – mr_schlomo Jan 15 '12 at 00:05
  • @mr_schlomo There is also [rdiff-backup](http://www.nongnu.org/rdiff-backup/) which does not use linkage but diff files. Nice too, especially if you use large files with small periodic changes. – the-wabbit Jan 15 '12 at 00:32
1

You can use "cp" with the "-l" option to copy as hard links. If the old and new backup directories are on the same filesystem, this will be very fast.

So, your original directory is "backup-jan1" and your next one is "backup-jan2". In this case, do:

cp -al backup-jan1 backup-jan2

Then run your rsync against backup-jan2. When rsync encounters a changed file, it will break the hard link (leaving the original file in backup-jan1 untouched) and create a new file with the same name in backup-jan2.

The next day, you will do:

cp -al backup-jan2 backup-jan3

and then run your rsync against backup-jan3. Again, changed files will be unlinked and rewritten in backup-jan3, and so on.

In this case, suppose you have 3 files in backup-jan1: file1 stays the same across all three days, file2 changes only on jan2, and file3 changes every day. You will then have file1 as one file hard-linked across all three directories; file2 as two files (one in backup-jan1, and one hard-linked between backup-jan2 and backup-jan3); and file3 as three separate files, one in each directory.

cjc
  • 24,533
  • 2
  • 49
  • 69