11

NetApp provides block-level deduplication (ASIS). Do you know of any filesystem (even FUSE-based) on Linux (or OpenSolaris, *BSD) that provides the same functionality?

(I'm not interested in false deduplication like hardlinks).

Benoît

10 Answers

7

Deduplication is coming to ZFS on OpenSolaris but that functionality is not currently available.

It was prototyped by Jeff Bonwick and Bill Moore this past winter and they are working on integrating it this summer. So it should be available in the next release of OpenSolaris or sooner if you want to play around with the development branch.

3dinfluence
6

Check out lessFS, a data-deduplication filesystem for Linux. It is still in beta, but you can try it out:

http://www.lessfs.com/

Regards,

MV

MV.
4

For people who may be unfamiliar with data deduplication: it is a technique whereby data is analyzed at the file (or block) level, and identical files/blocks throughout the file system are replaced with a smaller token that references a single stored copy. This can greatly shrink the space used on disk. It could be considered a form of copy-on-write. Read the Wikipedia page on it.
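To make the "replace identical blocks with a token" idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how any particular filesystem implements it; the `DedupStore` class, the 4 KiB block size, and the use of SHA-256 digests as tokens are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, in bytes


class DedupStore:
    """Toy in-memory block store that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, stored only once
        self.files = {}   # file name -> list of digests (the "tokens")

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Rebuild the file by following its tokens back to the stored blocks.
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_size(self):
        # Size the files appear to have, counting every block reference.
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        # Size actually stored, counting each distinct block once.
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write("first.img", payload)
store.write("second.img", payload + b"C" * BLOCK_SIZE)

print(store.logical_size(), store.physical_size())
```

The two files share their first two blocks, so five logical blocks end up as three stored blocks. The hashing on every write is also where the processor cost comes from.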

There is no filesystem that I have heard of in Linux to do dedup, file or block level. Such a beast would be handy, although pretty processor intensive.

Matt Simmons
4

Deduplication is now available with ZFS on OpenSolaris (build 128a and newer).

jlliagre
3

A year later, but here is a solution for OpenBSD called Epitome. Given its liberal licensing, it could very well make it into the Linux kernel.

Paul
1

I just posted a project that I have been working on that does inline deduplication. You can take a look at it here if you are interested. It is based on FUSE and runs on Linux.

0

So... no news about deduplication on Linux? opendedup might be a choice, but given the Java platform it runs on, I don't want the headaches. I have tried it, but the Java virtual machine and the rest do not fit well with my needs for storage response times and safety.

0

I don't know of any free implementations of dedup for Linux. I have seen some storage vendors recommend using an HSM (hierarchical storage management) system with a VTL (virtual tape library) that does dedup.

You could also consider an Ocarina-like system, which is not transparent but can provide better results than dedup.

James
0

Deduplication is available on Linux with the Btrfs and ZFS filesystems. Btrfs is developed natively on Linux and has an offline deduplication tool. By 'offline' I don't mean that you must unmount the filesystem; it means that data is not deduplicated as it is written, and you run a tool later to deduplicate what is already stored. The tool is probably still in beta.

The other way is ZFS, which is available both through FUSE and natively: http://zfsonlinux.org/ . It does online deduplication, which unfortunately slows down writes because everything must be calculated on the fly. You can switch this behavior off and on while the filesystem is online. After you turn deduplication off, data that was already deduplicated stays deduplicated, and new writes are stored 'duplicated'. If you want to deduplicate that data later, you must turn deduplication back on and rewrite all the 'duplicated' files.

See the documentation available on that page. To speed up writes and reads, you can add faster devices to the storage pool (especially SSDs, or perhaps fast USB flash drives; pay attention to device reliability).
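The on/off behavior of online deduplication can be sketched with a toy model in Python. This is only an illustration of the semantics described in this answer, not ZFS internals; `ToggleDedupStore`, its `dedup` flag, and the block size are hypothetical names and values chosen for the example.

```python
import hashlib
import itertools

BLOCK_SIZE = 4096  # assumed block size, in bytes


class ToggleDedupStore:
    """Toy block store whose write path can have dedup switched on or off."""

    def __init__(self, dedup=True):
        self.dedup = dedup
        self.blocks = {}  # key -> block bytes
        self.files = {}   # file name -> list of keys
        self._unique = itertools.count()

    def write(self, name, data):
        keys = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            if self.dedup:
                # Online dedup: identical content maps to the same key.
                key = ("hash", hashlib.sha256(block).hexdigest())
            else:
                # Dedup off: every written block gets its own key.
                key = ("raw", next(self._unique))
            self.blocks.setdefault(key, block)
            keys.append(key)
        self.files[name] = keys
        self._drop_unreferenced()

    def read(self, name):
        return b"".join(self.blocks[k] for k in self.files[name])

    def _drop_unreferenced(self):
        # Free blocks that no file points to any more (e.g. after a rewrite).
        live = {k for keys in self.files.values() for k in keys}
        for key in list(self.blocks):
            if key not in live:
                del self.blocks[key]

    def physical_blocks(self):
        return len(self.blocks)


pool = ToggleDedupStore(dedup=True)
data = b"x" * BLOCK_SIZE * 2

pool.write("a", data)
pool.write("b", data)
after_on = pool.physical_blocks()       # old data is deduplicated

pool.dedup = False
pool.write("c", data)
after_off = pool.physical_blocks()      # "a" and "b" stay deduplicated, "c" is not

pool.dedup = True
pool.write("c", pool.read("c"))         # rewriting "c" deduplicates it
after_rewrite = pool.physical_blocks()

print(after_on, after_off, after_rewrite)
```

Flipping the flag never touches blocks already on disk; only the rewrite of "c" brings the block count back down.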

Znik
-2

DRBD does just that and does it really well too! It can do Master/Slave or Master/Master :-)

Antoine Benkemoun
  • Could you please point me to the deduplication doc? I can't find it on http://www.drbd.org/home/feature-list/ . – Benoît Jun 10 '09 at 10:06
  • I think Antoine meant 'duplication', which is not really what you were looking for, I know – Matt Simmons Jun 10 '09 at 10:09
  • Oh, my bad. What is the difference between duplication and deduplication? – Antoine Benkemoun Jun 10 '09 at 10:26
  • I put a quick explanation up in my comment, but essentially duplication sends the data to another host, whereas deduplication eliminates identical information throughout the filesystem, increasing effective free space – Matt Simmons Jun 10 '09 at 15:41