
The server hosts a service called MultiCash-Datenbank. For each user it keeps two cache files (SPASD32.SRC and SPASD32Z.SRC), which grow by ~1MB/day. There are also a number of small data files added each day. I have been observing our networked backups for three months, and noticed that the vhdx image of the partition holding this data keeps growing by 300-900MB/day. On a 1TB partition, the 7GB of data eventually ballooned into a 30GB vhdx file and I had to take action.

In chronological order, these are the temporary solutions I discovered before having the idea to run DiskView:

  • recreate the partition (moving the files back and forth consolidates them)
  • shrink the partition (performs a free space consolidation step)
  • cap the partition size to 10GB (caps the image size to 10GB)
  • run manual defragmentation (the default scheduled defrag does nothing on Server 2012 R2! - see the command sketch after this list)
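
For reference, these workarounds boil down to commands along the following lines (the drive letter and size are just examples, not the real values):

```powershell
# Free space consolidation - roughly what the partition shrink achieved as a side effect:
defrag F: /X /U /V

# Traditional file defragmentation - the step the default 2012 R2 task no longer performs:
defrag F: /D /U /V

# Capping the partition at 10GB (Storage module cmdlet, Server 2012 and newer):
Resize-Partition -DriveLetter F -Size 10GB
```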

So, for some unknown reason, the clusters of these files are laid out on disk in a very unusual way. Here is how it looks in DiskView:

Each 4KB cluster is separated from the next by around 256 clusters (1MB) of free space, and the files are interleaved most of the time. This pattern continues until it covers all available free space. Then, as the files grow further, groups of multiple clusters become more frequent.

I have no idea whether this fragmentation is caused by the service's own write pattern or by some NTFS optimization mechanism. Fsutil reports that the files are not flagged as sparse. Contig reports that on this 10GB partition holding 7GB of data, there are around 3000 such fragments (spanning roughly 3GB of space). The image growth would make sense if the disk imaging process allocated a 1MB block whenever any data was present in it. I have read that the vhdx format contains performance optimizations, so this could be one of them; unfortunately, it would then lead to this worst-case scenario.
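
For reference, those checks were done with commands roughly like these (the data path below is a placeholder, not the actual install location):

```powershell
# Check whether NTFS flags the cache file as sparse:
fsutil sparse queryflag "F:\MultiCash\SPASD32.SRC"

# Count fragments with Sysinternals Contig (-a = analyze only, -v = verbose):
contig -a -v "F:\MultiCash\SPASD32.SRC"

# Built-in alternative: fragmentation report for the whole volume:
defrag F: /A /V
```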

I'm also open to the possibility that I'm completely wrong and my observations are unrelated to the actual cause. One warning sign is that the inflated backups do not compress down to the same size as the optimized ones - a 100% inflation in image size comes with about 25% extra compressed data.

So in the end, I'm left with a partial understanding of the situation and some ugly workarounds. I would like to ask: what is causing this fragmentation, and how can I make it stop? Is Windows Server Backup's vhdx format really using 1MB blocks, and if so, can that be changed?

  • Well, so far I have learned that the BlockSize used by WSB is 2MB, using `powershell Get-VHD -path file.vhd`. Don't know if there's a setting for that. – theultramage Apr 01 '17 at 07:19

1 Answer


Ultimately, the most straightforward solution was to add a custom defrag task, with command-line parameters specifying a 'traditional' defrag. This consolidated the thousands of fragments into contiguous files, which in turn let the vhdx image avoid having to include all that empty space between fragments due to its 2MB BlockSize. Eventually the server software causing the problem was decommissioned and replaced by a web portal hosted by the software provider, so the root of the problem was gone. The defrag task was still handy for keeping the small server optimized, so I left it in place.
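
Roughly, the custom task amounts to this (the task name, schedule and drive letter are mine, pick whatever fits your setup):

```powershell
# Weekly 'traditional' defrag task. /D forces classic file defragmentation,
# /U and /V just make the output readable in logs.
schtasks /Create /TN "CustomDefrag" /SC WEEKLY /D SUN /ST 02:00 `
    /RU SYSTEM /TR "defrag.exe F: /D /U /V"

# One-off equivalent using the newer Storage cmdlets:
Optimize-Volume -DriveLetter F -Defrag -Verbose
```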

I never did find out why the database files were being fragmented like that. The feedback I got from the software's vendor was that they had not received any other reports like this. It did not address the question of what behavior causes that fragmentation pattern; instead it focused on the backup method, suggesting either a different backup format (and thus different backup software) or changing the BlockSize that WSB picks when creating the vhdx container (no such option was available). So, nothing helpful or informative. I have some guesses as to what was going on, ranging from the software trying to implement sparse files by hand, to it trying to align its data so the drive heads seek better. Both of these sound odd, but with such a large piece of corporate software that has been in development since the 1990s (and the UI still looks that way), anything goes.

As to why the built-in Windows 'Defrag' task wasn't actually defragmenting the drive... well, it used to, but for Server 2012 and newer, Microsoft decided that running traditional defrag by default was no longer practical due to the ever-growing size of server storage. Presumably, the server operator would know if any partitions needed traditional defrag, and would define a custom task to handle that. I was not aware of this change. It didn't help that the command line of the built-in task never included an explicit /D parameter - it relied on the default behavior when no parameters were given. Newer Windows versions then added other parameters, which overrode that default. This kind of thing is harder to spot if one is only looking out for parameters being removed.
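
If you want to verify what the built-in task runs on your own server, the task definition can be dumped like this (the task path is the standard one on 2012 R2, as far as I recall):

```powershell
# Show the command line of the built-in maintenance defrag task:
schtasks /Query /TN "\Microsoft\Windows\Defrag\ScheduledDefrag" /V /FO LIST

# Same information via the Scheduled Tasks cmdlets (Server 2012 and newer):
(Get-ScheduledTask -TaskPath "\Microsoft\Windows\Defrag\" -TaskName "ScheduledDefrag").Actions
```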

theultramage