I have a filesystem that is changed on two servers and also needs to be replicated to Amazon S3.
Until recently, syncing the filesystem between the two servers with Unison and then copying it to S3 with s3sync.rb worked fine.
Now that the filesystem is nearly 50GB, s3sync.rb has become the bottleneck, as it needs to check each file for freshness (we use the --no-md5 flag).
So I now have a script that expects a list of files and updates these, and only these, using s3cmd.rb.
I had expected to use the unison.log file to get a canonical list of files to pass to it, but the log format varies depending on the operation that was applied to a file (new file, copy from local alternative, rename, etc.).
Is Unison able to generate a log or list of changed files other than the one it leaves in unison.log?
At the moment this is how I'm extracting the list of files from unison.log (I'm deliberately ignoring deletes):
# Ignore deletes and get the list of new & changed files
grep -v '\[END\] Deleting ' /tmp/unison.log | grep '\[END\]' | sed -re 's/\[END\] (Copying|Updating file) //' > /tmp/changed-files.log
# Files that unison lists as shortcuts are harder as it doesn't always prefix them with their full path
# so before adding them to the log, find the files in the relevant directory
grep 'Shortcut: copying ' /tmp/unison.log | sed -re 's/Shortcut: copying (.*) from local file.*/\1/' | while IFS= read -r file
do
    echo "Having to look for $file in source directory"
    find /ebs/src -wholename "*$file" >> /tmp/changed-files.log
done
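For what it's worth, the two extraction steps above could probably be folded into a pair of small functions. This is only a sketch: it assumes the log lines look exactly like the ones matched in the script above ("[END] Copying ...", "[END] Updating file ...", "Shortcut: copying ... from local file ..."), and the function names extract_changed and extract_shortcuts are made up for illustration. Using sed -n with the p flag prints only lines where the substitution matched, so deletes and any other [END] lines drop out without a separate grep -v.

```shell
#!/bin/sh
# Sketch only: assumes the unison.log line shapes quoted in the script above.
# Function names are hypothetical, not part of Unison.

# extract_changed LOG
# Print the path from every "[END] Copying" / "[END] Updating file" line.
# Lines that do not match (including "[END] Deleting") are not printed.
extract_changed() {
    sed -nE 's/\[END\] (Copying|Updating file) //p' "$1"
}

# extract_shortcuts LOG
# Print the (possibly partial) file name from every "Shortcut: copying" line.
extract_shortcuts() {
    sed -nE 's/Shortcut: copying (.*) from local file.*/\1/p' "$1"
}
```

It might then be used like this, with the same paths as above and sort -u to drop any duplicates:

    {
        extract_changed /tmp/unison.log
        extract_shortcuts /tmp/unison.log | while IFS= read -r file
        do
            find /ebs/src -wholename "*$file"
        done
    } | sort -u > /tmp/changed-files.log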