4

I am interested to know what the expected maximum dedupe ratio would be for a set of PST files.

I have ~40 GB of PST files from ~15 users with a high level of attachment duplication. I am running tests to see whether storing the data on ZFS with dedupe would give significant space savings.

For this purpose I have installed a test setup of Nexenta, but I was wondering whether someone here has already done this and what level of deduplication I might expect (in other words, how sensitive are PST files to block alignment, and which parameters can influence the ratio?).

Initial tests show a very low dedupe ratio. I did find an explanation that block-level dedupe would not be efficient here and that byte-level dedupe would be much better (and that it should be performed by an application that is aware of the internal organization), so I am just double-checking here whether someone has more input.

Otherwise I will probably be converting PST files to IMAP.

user9517
Unreason

1 Answer

5

Yeah, PST files aren't likely to yield the dedup ratios you're looking for. Attachments inside a PST aren't going to be block aligned, so they aren't ripe for block-level deduplication. If you're looking to maximize dedupe possibilities with ZFS, you're going to want a storage format where the attachments are distinct files.
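To illustrate the alignment point, here is a minimal Python sketch. It does not parse real PST files or model ZFS internals: the `container` helper is a hypothetical stand-in for a mailbox file that embeds the same attachment after a header of arbitrary length, and the 4 KiB `BLOCK` size is an assumption chosen only for the demo. It counts how many fixed-size blocks two such files have in common, which is roughly what a block-level deduplicator can exploit.

```python
import hashlib
import random

BLOCK = 4096  # assumed block size for this demo only, not a ZFS default


def random_bytes(rng, n):
    """Return n pseudo-random bytes from a seeded generator."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


def block_hashes(data, block=BLOCK):
    """Hash every fixed-size block, the way a block-level deduplicator sees the data."""
    return {
        hashlib.sha256(data[i:i + block]).digest()
        for i in range(0, len(data), block)
    }


def shared_blocks(a, b, block=BLOCK):
    """Count distinct blocks that appear in both byte strings."""
    return len(block_hashes(a, block) & block_hashes(b, block))


rng = random.Random(0)  # fixed seed so the run is repeatable
attachment = random_bytes(rng, 64 * BLOCK)  # the same "attachment" in every file


def container(header_len):
    """Hypothetical mailbox file: unique header + shared attachment + unique trailer."""
    return random_bytes(rng, header_len) + attachment + random_bytes(rng, 1000)


aligned_a = container(2 * BLOCK)      # attachment starts on a block boundary
aligned_b = container(5 * BLOCK)      # different offset, still on a boundary
shifted = container(5 * BLOCK + 17)   # attachment starts 17 bytes past a boundary

print("aligned vs aligned:", shared_blocks(aligned_a, aligned_b))
print("aligned vs shifted:", shared_blocks(aligned_a, shifted))
```

When both copies start on a block boundary, all 64 attachment blocks match. Once one copy is shifted by a few bytes, every block hashes differently and nothing matches, even though the attachment bytes are identical. Storing each attachment as its own file means every copy starts at offset 0, which is why a one-file-per-attachment layout dedupes so much better.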

notpeter