I mean, I can look up the dictionary definition, but why is everyone suddenly talking about it in reference to virtual tape libraries? What's "new" here that puts it in the news so much lately?
2 Answers
Deduplication means examining the content of a data set, identifying the duplicate pieces, and storing each piece just once, replacing every other copy with a pointer back to the single stored copy. It is particularly helpful with backups because, when you back up things like servers, so much of the data is the same. Imagine, for instance, that you are backing up 1,000 Windows servers: much of the content on those boxes will be identical.
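To make the "store once, point back to it" idea concrete, here is a minimal sketch in Python. It assumes fixed 4 KB blocks and SHA-256 as the block fingerprint; the names `dedup_store`, `restore`, and the two made-up "servers" are purely illustrative and not taken from any particular product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def dedup_store(data, store):
    """Store each unique block of `data` once in `store`.

    Returns the list of block hashes, which act as the "pointers"
    needed to rebuild `data` later.
    """
    pointers = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # keep only the first copy
        pointers.append(digest)
    return pointers


def restore(pointers, store):
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


# Two hypothetical servers that share the same 100-block "OS image"
# and differ only in one block of local data.
os_image = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))
server_a = os_image + b"A" * BLOCK_SIZE
server_b = os_image + b"B" * BLOCK_SIZE

store = {}
backup_a = dedup_store(server_a, store)
backup_b = dedup_store(server_b, store)

print(len(backup_a) + len(backup_b), "blocks backed up")  # 202 blocks backed up
print(len(store), "blocks actually stored")               # 102 blocks actually stored
```

The second server adds only one new block to the store, because everything it shares with the first server is already there. Real products differ in how they pick block boundaries and fingerprints, so treat this as a toy model.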
Deduplication is so popular today for three reasons:
1. Lately everyone is obsessed with building disaster recovery solutions that use off-site servers. To do this, you have to replicate a lot of production data to the remote site, and bandwidth is a huge problem. Any reduction in the amount of data you have to replicate helps a lot.
2. The amount of data companies are retaining is exploding, thanks to cheaper storage and record-retention requirements across many industries.
3. The technology hit the sweet spot relatively recently. We've had things like deduplication for a long time (single instance storage, etc.), which have helped, but only in the last year or so has real deduplication that can significantly reduce the amount of storage needed hit the mainstream.
- I would also add that the cost of de-dup solutions is dropping, so vendors have an easier job selling its benefits, and if it's easier to sell, vendors will talk about it more... I haven't noticed discussion specifically addressing virtual tape libraries over other backup methods, but I guess it's an opportunity to market the benefits of both together. – William Mar 10 '10 at 03:21
- @William: Yeah, exactly. I kind of meant to refer to the cost part when I said "sweet spot" but didn't make that clear, so thanks for pointing it out. Certainly the cost has become low enough that a lot of us can find a dedupe solution that we can actually afford. – icky3000 Mar 10 '10 at 03:26
One of the things we found out at my company while working with NetApp is that deduplication only really works well in a VM environment if your drives are aligned. That is a problem for us, because we have a lot of Windows Server 2003 machines and none of their drives are aligned, which means we recover barely a fourth of the space we could if the drives were aligned correctly.
We are being told, though, that once the drives are aligned correctly we should be able to recover 40-60% of our space with dedup.
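A rough way to see why alignment matters is to compare block fingerprints of the same data at two different offsets. The sketch below is a simplified model, not a description of NetApp internals: it assumes fixed 4 KB blocks, SHA-256 fingerprints, and an arbitrary 512-byte shift standing in for a misaligned partition.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed 4 KB block size


def block_hashes(data):
    """Hash every fixed-size block of `data` and return the set of hashes."""
    return {
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    }


# Deterministic pseudo-random payload standing in for guest file data.
payload = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(8192)
)  # 8192 * 32 bytes = 64 blocks of 4 KB

reference = block_hashes(payload)
aligned_copy = block_hashes(payload)                  # same offset
shifted_copy = block_hashes(b"\x00" * 512 + payload)  # 512-byte shift (assumed)

print(len(reference & aligned_copy), "of", len(reference), "blocks shared when aligned")
# 64 of 64 blocks shared when aligned
print(len(reference & shifted_copy), "of", len(reference), "blocks shared when shifted")
# 0 of 64 blocks shared when shifted
```

In this toy case the shifted copy contains exactly the same bytes, yet no fixed-size block lines up with a block of the reference, so nothing deduplicates.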
- That is an issue specific to the NetApp implementation, which uses (for other, totally understandable reasons) static block sizes of 4 KB. The alternative would be variable-sized, content-defined chunks, which don't require good alignment. – dmeister Jun 05 '10 at 10:04
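For readers who have not seen content-defined chunking, here is a minimal sketch of the idea, assuming a simple "gear"-style rolling hash. The table `GEAR`, the `MASK` value, and the function names are hypothetical choices for illustration. Real implementations also enforce minimum and maximum chunk sizes, which are left out here.

```python
import hashlib

# Hypothetical "gear" table: one pseudo-random 32-bit value per byte value.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
    for b in range(256)
]
MASK = 0xFFF00000  # cut when the top 12 bits are zero: roughly 4 KB average chunk


def cdc_chunks(data):
    """Split `data` wherever a rolling hash of the last 32 bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out of the 32-bit state, so `h`
        # depends only on the most recent 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk with no cut point
    return chunks


def chunk_hashes(data):
    """Return the set of SHA-256 fingerprints of all chunks of `data`."""
    return {hashlib.sha256(c).digest() for c in cdc_chunks(data)}


# Deterministic pseudo-random payload (256 KB) and a copy shifted by 512 bytes.
payload = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(8192)
)
original = chunk_hashes(payload)
shifted = chunk_hashes(b"\x00" * 512 + payload)

print(len(original & shifted), "of", len(original), "chunks still shared")
```

Because each cut point depends only on the bytes just before it, the boundaries move with the data. After the first chunk or so, the shifted copy produces the same chunks as the original, so nearly all of them still deduplicate.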