13

I'm experimenting with deduplication on a Server 2012 R2 storage space. I let it run the first dedupe optimisation last night, and I was pleased to see that it claimed a 340GB reduction.

[screenshot: dedupe savings report claiming ~340 GB saved]

However, I knew that this was too good to be true. On that drive, 100% of the dedupe came from SQL Server backups:

[screenshot: dedupe savings breakdown, attributed entirely to the SQL Server backup folder]

That seems unrealistic, considering that there are database backups in the folder that are 20x that size. As an example:

[screenshot: properties of a 13.3 GB backup file showing 0 bytes size on disk]

It reckons that a 13.3GB backup file has been deduped to 0 bytes. And of course, that file didn't actually work when I did a test restore of it.

To add insult to injury, there is another folder on that drive that has almost a TB of data in it that should have deduped a lot, but hasn't.

Does Server 2012 R2 deduplication work?

Mark Henderson
  • I'm going to have to remember that one. "Of course I didn't delete your data because you pissed me off. I deduped it to 0 bytes, is all." – HopelessN00b Jan 14 '15 at 22:08
  • Is it possible it is doing dedup assuming the data to be relatively the same from one night to the next? Meaning, if you have the first and last backups, the only thing each night would be a snapshot of the differences, like VSS. In theory, it might be possible to dedup it to 0, given that the first and last copies might be enough to regenerate the file in the middle. But since it failed a restore, I'm going to wait to see what you come up with as an explanation. But your test isn't promising. – MikeAWood Jan 15 '15 at 01:34
  • @MikeAWood it de-duped totally different database backups to 0 bytes as well, which is most certainly wrong. One of the things I wanted the dedupe for is, as you've pointed out, 90% of the backups from night to night are identical. – Mark Henderson Jan 15 '15 at 01:59
  • @MarkHenderson if you set up a new drive and copy everything to it, does it work then? Just idly guessing. Maybe it is similar to DFS, where the process of seeing the initial data has to be done or it will fail to work correctly. Your results are odd, no question. Hopefully you figure it out; I am curious to know what happened. – MikeAWood Jan 15 '15 at 02:02
  • @MikeAWood - I didn't try that. I've since nuked that drive and re-created it with different dedupe settings, so I'll see what happens tonight when another dump runs – Mark Henderson Jan 15 '15 at 02:03
  • Any news from your tests? I'm curious to see the result – yagmoth555 Jan 17 '15 at 13:17
  • @yagmoth555 No good news. I changed to only run on background optimisation, and now I'm getting 0% dedupe ratio. So I'm still just experimenting. – Mark Henderson Jan 18 '15 at 21:13

3 Answers

5

Deduplication does work.

With deduplication, the "Size on disk" field becomes meaningless. The files are no longer ordinary files but reparse points; they don't contain the actual data, just metadata for the dedup engine to reconstruct the file. It is my understanding that you cannot get per-file savings, as the dedup chunk store is per volume, so you only get per-volume savings. http://msdn.microsoft.com/en-us/library/hh769303(v=vs.85).aspx
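If you want to see the real numbers, query the dedup engine itself rather than Explorer. A minimal PowerShell sketch (the drive letter E: and the backup file path are assumptions; adjust them for your volume):

    # Per-volume savings come from the dedup status, not from per-file "Size on disk"
    Get-DedupStatus -Volume "E:" |
        Format-List Volume, SavedSpace, OptimizedFilesCount, InPolicyFilesCount

    # An optimized file is just a reparse point; its attributes will show it
    (Get-Item "E:\SQLBackups\Example.bak").Attributes   # look for ReparsePoint in the output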

Perhaps your dedup job simply hadn't completed yet, which would explain why the other data wasn't deduped. It's not super fast, it is time-limited by default, and it may be resource-constrained depending on your hardware. Check the dedup schedule from Server Manager.
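You can also check the schedule and kick off a job by hand from PowerShell; a quick sketch (again assuming the dedup-enabled volume is E:):

    Get-DedupSchedule                                 # when optimization/GC/scrubbing jobs are allowed to run
    Start-DedupJob -Volume "E:" -Type Optimization    # start an optimization pass immediately
    Get-DedupJob                                      # monitor queued and running dedup jobs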

I have deployed dedup on several Windows 2012 R2 systems in different scenarios (SCCM DP, various deployment systems, generic file servers, user home folder file servers, etc.) for about a year now. Just make sure you're fully patched; I remember several patches to dedup functionality (both cumulative updates and hotfixes) since RTM.

However, there are known issues where some applications cannot read data directly from optimized files on the local system (IIS, and SCCM in some scenarios). As suggested by yagmoth555, you should either try Expand-DedupFile to unoptimize the file, or just make a copy of it (the target file will be unoptimized until the next optimization run) and retry. See http://blogs.technet.com/b/configmgrteam/archive/2014/02/18/configuration-manager-distribution-points-and-windows-server-2012-data-deduplication.aspx and https://kickthatcomputer.wordpress.com/2013/12/22/no-input-file-specified-windows-server-2012-dedupe-on-iis-with-php/
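For example, to rehydrate a single file in place or via a copy (the file paths are purely illustrative):

    # Rehydrate the file in place so applications read plain data again
    Expand-DedupFile -Path "E:\SQLBackups\MyDatabase.bak"

    # ...or copy it elsewhere; the copy is written out fully expanded
    Copy-Item "E:\SQLBackups\MyDatabase.bak" "D:\Restores\MyDatabase.bak"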

If your SQL backup is actually corrupted, I believe it's due to a different issue and not related to the deduplication technology.

Don Zoomik
  • Thanks for the answer. Your answer mirrors my own findings. I had some misunderstandings about dedupe, and my testing methodology was flawed. – Mark Henderson Jan 19 '15 at 21:14
  • @Mark anything about your misunderstandings and testing methodology you could share...? Perhaps in a blog post? Would be interesting to learn as I can't think of where you (and therefore I) might have gone wrong. EDIT: I've now seen your answer...but a blog post would be a good read if you have one. – Ashley Jan 20 '15 at 23:22
  • @AshleySteel I don't really blog any more. Used to once upon a time. The whole thing basically came down to me not understanding how Windows Server dedupe works... – Mark Henderson Jan 21 '15 at 01:36
2

It looks like I may have jumped the gun in saying that this sort of deduplication isn't possible. Apparently it is totally possible, because in addition to these uncompressed SQL Server backups, I also have VMware snapshot-level backups of the host VMs.

As yagmoth555 suggested, I ran Expand-DedupFile on some of these 0-byte files and got a totally usable file back at the end of it.

I then looked at my testing methodology for how I determined that the files were no good, and I found a flaw in my tests (permissions!).

I also opened a 0-byte deduped backup file in a hex editor, and everything looked OK.
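The same spot-check can be done from PowerShell if a hex editor isn't handy; a rough sketch with a hypothetical path (reading through the reparse point returns the real data if the chunk store is healthy):

    # Dump the first 16 bytes of the deduped file as hex
    Get-Content "E:\SQLBackups\MyDatabase.bak" -Encoding Byte -TotalCount 16 |
        ForEach-Object { $_.ToString("X2") }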

So I adjusted my testing methodology, and everything actually seems to work. Since I left it running, the dedupe savings have actually got better, and I've now saved more than 1.5TB of space thanks to dedupe.

I'm going to test this more thoroughly before pushing it into production, but right now it looks promising.

Mark Henderson
0

Yes, but I have only seen the case of a Hyper-V cluster database being deduped: 4 TB down to 400 GB, and the VM was running. The OS was fully patched.

For your SQL backup file, is it a dump whose contents you can read? I would check the content. I can't say how well dedup handles an ASCII file.
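One way to check the content is to ask SQL Server to verify the backup; a minimal sketch (the instance name and file path are hypothetical, and sqlcmd must be available on the box):

    Expand-DedupFile -Path "E:\SQLBackups\MyDatabase.bak"    # rehydrate the file first
    sqlcmd -S ".\SQL2012" -Q "RESTORE VERIFYONLY FROM DISK = N'E:\SQLBackups\MyDatabase.bak'"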

yagmoth555
  • They are binary files, but as I've already mentioned, whatever is in them is totally corrupted. I didn't actually check the contents in a hex editor, and I've since nuked that drive and recreated it with different dedupe parameters to see what happens tonight. – Mark Henderson Jan 15 '15 at 02:00
  • @MarkHenderson It can be a chunk corruption in the dedup metadata, as the size was 0. Quoted: "Deduplication raises the impact of a single chunk corruption since a popular chunk can be referenced by a large number of files. Imagine a chunk that is referenced by 1000 files is lost due to a sector error; you would instantly suffer a 1000 file loss." The cmdlet Expand-DedupFile will rule out whether it's a bad .bak or a dedup corruption. – yagmoth555 Jan 15 '15 at 03:00