Mirror with backup of changed files

We have some large-ish storage requirements (genomic data) for which we need to buy some archival space (write once, read rarely, files remain easily accessible, and each user should have access to their own "archive" folder). An "easy" and relatively cheap solution is to get a big NAS box that fits sixty 6TB disks = 360TB raw space (e.g. dnuk, 45drives...).

But what if there's a fire/flood/theft? Easy solution: get a second identical box, set it up in a different building/campus & ensure a fast connection for daily master-to-slave mirroring. Easy software exists for this.

This protects against catastrophe. But it doesn't protect against naive users who accidentally delete their files & want them back a month later.

Is there an easy software solution that would detect when files are changed or deleted and move/copy the old files to a different place? (ideally this would occur on the "slave"; we could buy an additional NAS box for this).

Any ideas? Thanks! Yannick

Yannick Wurm

Posted 2015-06-29T21:36:09.867

Reputation: 111

We use rsync incremental backups with hard links for this, but it doesn't monitor for or detect changes; it just runs on a schedule. Windows has "Previous Versions" via shadow copies of shared folders. – ssnobody – 2015-06-29T21:53:46.613

Hmm - thanks @ssnobody, that seems relevant. Have you used the rsync incremental backups with links approach with data including some TB-scale files and e.g. 50 million small files?

– Yannick Wurm – 2015-06-30T10:38:23.070

I have not tried it personally at that scale; we use it on several file servers and a web server where maximum storage is ~6TB and files larger than 4GB are rare. I don't think it'll have problems coping, though: hard links are built into the OS, and if your system is already handling TB-sized files, I wouldn't expect this to cause you additional trouble. – ssnobody – 2015-06-30T17:52:40.717

Answers

If you are on Windows, Bvckup 2 is pretty much exactly what you need.

Incremental propagation of modified files is not a big issue; any backup/sync software can do it. The trick is to have rename detection supported. There are two ways to do it: the first is to parse the file-system change journal (NTFS has one, for example) to see the actual changes; the second is to scan both locations and then run some sort of comparative analysis to work out whether any of the deleted files match any of the newly created ones. I don't know of a single backup program that works with the journals, so the first option is not really an option. For the second option you will need software that generates backup plans, i.e. it first scans the trees, then analyzes them, then spits out the list of steps to be performed (rather than reconciling the changes immediately during the scan phase, as robocopy /mir does, for example).

The issue here is the size of your backup. Very few backup apps can cope with a few million files, let alone 50 million. I've used the above app successfully with a 6-million-file backup and I suspect it should be able to chomp through your case as well.
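The second option (scan both trees, then match deleted files against newly created ones by content) can be sketched in shell. This is an illustrative toy under my own assumptions — the directories and file names are made up, it matches by md5sum rather than whatever Bvckup 2 actually does, and a real tool would also compare sizes and timestamps before hashing millions of files:

```shell
#!/bin/sh
set -e

OLD=$(mktemp -d)   # previous scan of the tree
NEW=$(mktemp -d)   # current scan of the tree
echo "reads" > "$OLD/run1.fastq"
cp -a "$OLD/." "$NEW/"
mv "$NEW/run1.fastq" "$NEW/run1.renamed.fastq"   # simulate a user rename

# Build "checksum  path" indexes of both trees.
IDX_OLD=$(mktemp); IDX_NEW=$(mktemp)
(cd "$OLD" && find . -type f -exec md5sum {} +) | sort > "$IDX_OLD"
(cd "$NEW" && find . -type f -exec md5sum {} +) | sort > "$IDX_NEW"

# A checksum that appears in both indexes under different paths is a
# candidate rename rather than a delete + create.
RENAMES=$(awk 'NR==FNR { old[$1] = $2; next }
               ($1 in old) && old[$1] != $2 { print old[$1], "->", $2 }' \
          "$IDX_OLD" "$IDX_NEW")
echo "$RENAMES"   # ./run1.fastq -> ./run1.renamed.fastq
```

Spotting the rename lets the backup tool issue a cheap rename on the target instead of deleting one multi-GB file and re-copying it under a new name.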

Angstrom

Posted 2015-06-29T21:36:09.867

Reputation: 610

thanks @angstrom - unfortunately not on windows... all our stuff is on linux servers – Yannick Wurm – 2015-07-06T12:19:22.443