Digest-based filesystem?

If a cloud backup provider (for example) is storing laptop backups, some files will be identical (Windows or Mac OS system libraries, for example).

The best way to store these would be to keep a single copy shared by all of them. So, when a file is uploaded, a digest of its contents could be computed and looked up among the digests of files already stored; on a match, the upload only needs to reference the existing copy.
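A minimal sketch of that idea in Python. It assumes SHA-256 as the digest (the question does not name one) and uses in-memory dicts in place of real storage. `DigestStore`, `upload` and `download` are hypothetical names, not any product's API:

```python
import hashlib


class DigestStore:
    """Toy single-instance store: one stored copy per unique content digest."""

    def __init__(self):
        self.blobs = {}  # digest -> file bytes, each unique content stored once
        self.files = {}  # (user, path) -> digest of that file's content

    def upload(self, user, path, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if no identical content is already stored.
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.files[(user, path)] = digest
        return digest

    def download(self, user, path):
        return self.blobs[self.files[(user, path)]]


store = DigestStore()
lib = b"identical system library bytes"
store.upload("alice", "system/lib.dll", lib)
store.upload("bob", "system/lib.dll", lib)
print(len(store.files), len(store.blobs))  # 2 1
```

A real service would typically let the client send the digest before the bytes, so a file that is already stored never has to be transferred, and it would rely on the digest being collision-resistant (or compare bytes on a match).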

Is there any product out there that does specifically this? (I know key-value stores can achieve it, but they were not designed or optimized for it.)

Thanks for any feedback!

Bruno Antunes

Posted 2011-10-17T23:01:29.970

Reputation: 254

Many products do this internally, including Dropbox. They don't usually advertise this because it's invisible to the user. – David Schwartz – 2011-10-17T23:04:22.057

Answers

What you're looking for is a deduplicating file system, or single-instance storage. Apparently some server versions of Windows support it.

Alternatively, you can look at SDFS/Opendedup, which does file-level deduplication, and ZFS, which does block-level deduplication.
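To illustrate the difference between file-level and block-level deduplication, here is a sketch (not how SDFS or ZFS are actually implemented) that splits each file into fixed-size blocks and stores every unique block once. The 4 KiB block size and the `BlockStore` name are assumptions for illustration:

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


class BlockStore:
    def __init__(self):
        self.blocks = {}  # digest -> block bytes, each unique block stored once

    def put_file(self, data):
        """Store data's unique blocks; return the list of block digests (its 'recipe')."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def get_file(self, recipe):
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore()
original = b"A" * 8192 + b"B" * 4096
edited = b"A" * 8192 + b"C" * 4096  # first two blocks unchanged, last one differs
r1 = store.put_file(original)
r2 = store.put_file(edited)
print(len(r1) + len(r2), len(store.blocks))  # 6 3
```

File-level deduplication would keep both files in full because they differ; block-level deduplication shares the unchanged blocks. With fixed-size blocks, inserting bytes near the start of a file shifts every later boundary and loses that sharing, which is why some systems use content-defined chunking instead.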

Journeyman Geek

Posted 2011-10-17T23:01:29.970

Reputation: 119,122

btrfs also has deduplication patches that will be merged into the core in the future. – Paul – 2011-10-17T23:16:56.603

Plan 9's Venti/Fossil implemented this (long ago); here's some more information.
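Venti addresses each block by its SHA-1 hash (its "score") and never overwrites a stored block, so identical data ends up stored once automatically. The toy sketch below mimics that idea with a single level of pointer blocks. It is not Venti's actual on-disk format or API, and `WriteOnceStore`, `write_file` and `read_file` are hypothetical names:

```python
import hashlib

SCORE_LEN = 20  # SHA-1 digests are 20 bytes


class WriteOnceStore:
    """Toy Venti-like store: blocks are addressed by their SHA-1 score, never overwritten."""

    def __init__(self):
        self.blocks = {}

    def write(self, block: bytes) -> bytes:
        score = hashlib.sha1(block).digest()
        self.blocks.setdefault(score, block)  # writing an existing block is a no-op
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


def write_file(store, data, block_size=8192):
    """Write data blocks, then one pointer block listing their scores; return the root score."""
    scores = [store.write(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    return store.write(b"".join(scores))


def read_file(store, root):
    pointer = store.read(root)
    scores = [pointer[i:i + SCORE_LEN] for i in range(0, len(pointer), SCORE_LEN)]
    return b"".join(store.read(s) for s in scores)


store = WriteOnceStore()
data = bytes(range(256)) * 100  # three identical 8 KiB blocks plus a 1 KiB tail
root1 = write_file(store, data)
root2 = write_file(store, data)
print(root1 == root2, len(store.blocks))  # True 3
```

Because the root score depends only on the contents, identical files get identical roots, so whole-file deduplication falls out of block addressing. A real implementation would need more levels of pointer blocks for large files.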

nos

Posted 2011-10-17T23:01:29.970

Reputation: 3,699