NetApp provides block-level deduplication (ASIS). Do you know of any filesystem (even FUSE-based) on Linux (or OpenSolaris, *BSD) that provides the same functionality?
(I'm not interested in false deduplication like hardlinks.)
Deduplication is coming to ZFS on OpenSolaris, but that functionality is not currently available.
It was prototyped by Jeff Bonwick and Bill Moore this past winter, and they are working on integrating it this summer. So it should be available in the next release of OpenSolaris, or sooner if you want to play around with the development branch.
Check out lessfs, a data-deduplication filesystem for Linux. It is still in beta, but you can try it out.
For people who may be unfamiliar with data deduplication: it is a technique whereby data is analyzed at the file (or block) level, and identical files/blocks throughout the filesystem are replaced with a small token referencing a single stored copy. This can greatly shrink the effective size on disk. It could be considered a form of copy-on-write. Read the Wikipedia page on it.
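To make the idea concrete, here is a minimal sketch of block-level dedup using nothing but standard shell tools. The blocks/ directory, the 4 KiB block size, and the .recipe file are all made up for illustration; a real filesystem keeps this mapping in its metadata, not in loose files.

    # Split a file into fixed-size blocks, store each unique block once
    # (keyed by its SHA-256 hash), and record the ordered hash list as a "recipe".
    BS=4096
    mkdir -p blocks
    split -b "$BS" -d bigfile chunk.
    for c in chunk.*; do
        h=$(sha256sum "$c" | awk '{print $1}')
        [ -e "blocks/$h" ] || cp "$c" "blocks/$h"   # store each unique block only once
        echo "$h" >> bigfile.recipe                 # the file becomes an ordered list of hashes
    done
    rm chunk.*
    # Reconstruct the file from its recipe:
    # while read h; do cat "blocks/$h"; done < bigfile.recipe > bigfile.restored

If two files (or two regions of one file) share identical 4 KiB blocks, those blocks are stored only once under blocks/, which is exactly the space saving dedup is after.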
I haven't heard of any Linux filesystem that does dedup, at either the file or block level. Such a beast would be handy, although pretty processor-intensive.
Deduplication is now available with ZFS on OpenSolaris (build 128a and newer).
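Enabling it is a one-liner; a sketch, assuming a pool named tank (substitute your own pool or dataset name):

    # Turn on deduplication for a pool or dataset:
    zfs set dedup=on tank
    # Verify the setting and watch the achieved ratio:
    zfs get dedup tank
    zpool list tank    # the DEDUP column shows the current dedup ratio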
I just posted a project that I have been working on that does inline deduplication. You can take a look at it here if you are interested. It is based on FUSE and runs on Linux.
So... still no news about deduplication on Linux? Opendedup might be a choice, but given the Java platform it runs on, I don't want the headaches. I have tried it, but the JVM and the rest do not sit well with my requirements for storage response times and safety.
I don't know of any free implementations of dedup for Linux. I have seen some storage vendors recommend using an HSM (hierarchical storage management) system with a VTL (virtual tape library) that does dedup.
You could also consider an Ocarina-like system, which is not transparent but can provide better results than dedup.
Deduplication is available under Linux on both BTRFS and ZFS.
BTRFS is developed natively on Linux and has an offline deduplication tool. By 'offline' I don't mean you must unmount the filesystem; offline means that data is not deduplicated as it is written. Instead, you later run the tool to deduplicate what is already stored. The tool is probably still in beta.
The other way is ZFS, available both as FUSE and natively: http://zfsonlinux.org/ . ZFS does online deduplication, which unfortunately slows down writes because everything must be hashed on the fly. You can turn this behavior on and off at any time: after you turn deduplication off, already-deduplicated data stays deduplicated, while new writes are stored without deduplication. If you want to deduplicate that data in the future, you must turn deduplication back on and rewrite the 'duplicated' files.
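As a rough sketch of that ZFS workflow (the dataset name tank/data and the file path are made up for illustration):

    # Online dedup can be toggled per dataset:
    zfs set dedup=on tank/data     # new writes are deduplicated (slower writes)
    zfs set dedup=off tank/data    # new writes stored as-is; old dedup'd blocks stay dedup'd
    # To deduplicate data written while dedup was off, re-enable it
    # and rewrite the files so their blocks pass through the dedup table again:
    zfs set dedup=on tank/data
    cp /tank/data/file /tank/data/file.tmp && mv /tank/data/file.tmp /tank/data/file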
See the documentation available on that page. To speed up writes and reads, you can add faster devices to the storage pool (especially SSD drives, or perhaps a fast USB flash drive; pay attention to device reliability).
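In ZFS terms that means adding cache (L2ARC) and log (ZIL) devices. A sketch, assuming a pool named tank and an SSD at /dev/sdb split into two partitions:

    # Use one SSD partition as a read cache (L2ARC) and one as a write log (ZIL):
    zpool add tank cache /dev/sdb1
    zpool add tank log /dev/sdb2
    zpool status tank    # the cache and log devices show up in the pool layout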
DRBD does just that, and does it really well too! It can do Master/Slave or Master/Master :-)