Block-level deduplication on Linux

Question

NetApp provides block-level deduplication (ASIS). Do you know of any filesystem (even FUSE-based) on Linux (or OpenSolaris, *BSD) that provides the same functionality?

(I'm not interested in false deduplication like hardlinks).

score 7 · Answer 1 · answered Jun 10 '09 at 14:13

Deduplication is coming to ZFS on OpenSolaris but that functionality is not currently available.

It was prototyped by Jeff Bonwick and Bill Moore this past winter and they are working on integrating it this summer. So it should be available in the next release of OpenSolaris or sooner if you want to play around with the development branch.

3dinfluence

See @jlliagre's answer - it's available now. – James Moore Dec 07 '11 at 20:16

score 6 · Accepted Answer · answered Sep 28 '09 at 11:36

Check out lessFS, a data-deduplication filesystem for Linux. It is still in beta, but you can try it out:

http://www.lessfs.com/

Regards,

MV

MV.

Excellent! It's still in beta, but that's definitely something to start with. – Benoît Sep 29 '09 at 10:29

score 4 · Answer 3 · answered Jun 10 '09 at 10:17

For people who may be unfamiliar with data deduplication, it is a technique whereby data is analyzed at the file (or block, I suppose) level, and where identical files/blocks throughout the file system are replaced with a smaller token. This has the effect of greatly shrinking the effective size on disk. It could be considered a form of copy-on-write. Read the wiki page on it.
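To make the idea concrete, here is a minimal sketch of block-level deduplication. It is a toy in-memory model, not how NetApp, ZFS or any other product implements it: the `DedupStore` class, the 4096-byte block size and the file names are all assumptions made for illustration. Each block is identified by its SHA-256 digest, and a block whose digest has already been seen is not stored a second time.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch; real systems vary


class DedupStore:
    """Toy content-addressed block store: identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (each unique block stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Rebuild the file by following its list of block digests.
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
image = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write("vm1.img", image)                       # 2 blocks
store.write("vm2.img", image + b"C" * BLOCK_SIZE)   # 3 blocks, 2 already known
```

The two files hold five logical blocks between them, but only three unique blocks are physically stored. Real implementations also have to deal with hash collisions, reference counting on delete, and keeping the digest table fast, which is where much of the processor and memory cost comes from.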

There is no filesystem that I have heard of in Linux to do dedup, file or block level. Such a beast would be handy, although pretty processor intensive.

score 4 · Answer 4 · answered Dec 17 '09 at 11:00

Deduplication is now available with ZFS on OpenSolaris (build 128a and newer).

jlliagre

score 3 · Answer 5 · edited Mar 25 '22 at 21:42

A year later, but here is a solution for OpenBSD called Epitome. Given its liberal licensing, it could very well make it into the Linux kernel.

Paul

answered May 08 '10 at 13:43

score 1 · Answer 6 · answered Mar 13 '10 at 09:05

I just posted a project that I have been working on that does inline deduplication. You can take a look at it here if you are interested. It is based on FUSE and runs on Linux.

score 0 · Answer 7 · answered Apr 16 '10 at 13:20

So... no news about deduplication on Linux? Opendedup might be a choice, but given the Java platform it runs on, I don't want the headaches. I have tried it, yes, but the Java virtual machine and the rest do not fit well with my needs for storage response times and safety.

score 0 · Answer 8 · answered Jun 24 '09 at 20:44

I don't know of any free implementations of dedup for Linux. I have seen some storage vendors recommend using an HSM (hierarchical storage management) system with a VTL (virtual tape library) that does dedup.

You could also consider an Ocarina-like system, which is not transparent but can provide better results than dedup.

score 0 · Answer 9 · answered Jun 24 '14 at 13:05

Deduplication is available under Linux on the BTRFS and ZFS filesystems. BTRFS is developed natively on Linux and has an offline deduplication tool. By "offline" I don't mean that you must unmount the filesystem. Offline means that data is not deduplicated as it is being written; instead, you run a tool later to deduplicate what is already stored. The tool is probably still in beta at the moment.

The other way is ZFS, which is available both as FUSE and natively: http://zfsonlinux.org/ . It does online deduplication; unfortunately this slows down writes, because everything must be calculated on the fly. You can turn this behavior off and on while the filesystem is online. After you turn deduplication off, all data that was already deduplicated stays stored as deduplicated, and new writes are stored as "duplicated". If you want to deduplicate that data in the future, you must turn deduplication back on and rewrite all the "duplicated" files.

See the documentation available on that page. To speed up writes and reads, you can add faster devices to the storage pool (especially SSD drives, or perhaps fast USB flash drives; pay attention to device reliability).
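The on/off behavior described in this answer can be illustrated with a small toy model. This is only a sketch of the semantics as stated above, not ZFS code: the `ToyPool` class, its `dedup` attribute and the file names are hypothetical, and only block references are tracked, not the data itself.

```python
import hashlib


class ToyPool:
    """Toy model of a pool whose inline deduplication can be toggled."""

    def __init__(self):
        self.dedup = False   # inline dedup switch, off by default
        self.files = {}      # file name -> list of block references
        self._next_id = 0    # counter for blocks written without dedup

    def write(self, name, blocks):
        refs = []
        for block in blocks:
            if self.dedup:
                # Identical content maps to the same shared reference.
                refs.append(("shared", hashlib.sha256(block).hexdigest()))
            else:
                # Every block gets its own private copy, duplicates included.
                refs.append(("private", self._next_id))
                self._next_id += 1
        self.files[name] = refs  # rewriting a file replaces its old references

    def stored_blocks(self):
        # Number of distinct blocks still referenced by some file.
        return len({ref for refs in self.files.values() for ref in refs})


pool = ToyPool()
pool.dedup = True
pool.write("a.img", [b"x", b"x", b"y"])
after_a = pool.stored_blocks()        # 2: the two "x" blocks are shared

pool.dedup = False
pool.write("b.img", [b"x", b"x"])
after_b = pool.stored_blocks()        # 4: new writes are stored duplicated

pool.dedup = True
after_on = pool.stored_blocks()       # 4: switching on does not touch old data

pool.write("b.img", [b"x", b"x"])     # rewrite the file with dedup enabled
after_rewrite = pool.stored_blocks()  # 2: the rewritten blocks are shared again
```

In this model, as in the description above, toggling the switch only affects future writes; data written while deduplication was off becomes deduplicated only once it is rewritten.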

score -2 · Answer 10 · answered Jun 10 '09 at 10:02

DRBD does just that and does it really well too! It can do Master/Slave or Master/Master :-)

Antoine Benkemoun

Could you please point me to the deduplication doc? I can't find it on http://www.drbd.org/home/feature-list/ . – Benoît Jun 10 '09 at 10:06
I think Antoine meant 'duplication', which is not really what you were looking for, I know – Matt Simmons Jun 10 '09 at 10:09
Oh, my bad. What is the difference between duplication and deduplication? – Antoine Benkemoun Jun 10 '09 at 10:26
I put a quick explanation up in my comment, but essentially duplication sends the data to another host, whereas deduplication eliminates identical information throughout the filesystem, increasing effective free space – Matt Simmons Jun 10 '09 at 15:41
