6

We're using Microsoft's Distributed File System for replication. In our scenario we have a single writer that creates/overwrites/deletes files and several distributed readers. The writer is running Windows Server 2008 R2 Enterprise x64 SP 1, the readers are running Windows Server 2003 R2 Standard Edition x86 SP 2. Some of the readers run DFSR with version 5.2.3790.4656 and some others with the patched version 5.2.3790.4799.

Files are written using System.IO.File.WriteAllText and writes can occur in quick succession on one and the same file.

Most of the time replication works fine but sometimes files end up being filled entirely with null bytes after replication. We checked the DFSR logs on all involved machines and found the following kind of usn journal entry reports on the readers:

20150302 11:05:00.498 2512 USNC  2202 UsnConsumer::UpdateIdRecord ID record updated from USN_RECORD:
+    USN_RECORD:
+    RecordLength:        80
+    MajorVersion:        2
+    MinorVersion:        0
+    FileRefNumber:       0x800000000f7c8
+    ParentFileRefNumber: 0x31000000152806
+    USN:                 0x872e876720
+    TimeStamp:           20150302 11:05:00.498 CET
+    Reason:              Basic Info Change Close Rename New Name
+    SourceInfo:          0x4
+    SecurityId:          0xebe
+    FileAttributes:      0x2220
+    FileNameLength:      18
+    FileNameOffset:      60
+    FileName:            xyz.txt

What got us wondering is that there are usn journal entry reports at all (the readers are just supposed to read but not to modify anything) and the fact that the sparse attribute is set.

To find out whether there's some process performing unexpected writes or doing something suspicious we monitored filesystem activity with Process Monitor. The next occurance of null-byte files on a reader got us the following:

10:59:55,2311121    Dfsr.exe    1584    760 IRP_MJ_CREATE                   path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Desired Access: Generic Read/Write/Execute, Write DAC, Write Owner, Access System Security, Disposition: Create, Options: Sequential Access, Synchronous IO Non-Alert, Complete If Oplocked, Open For Backup, Open No Recall, Attributes: N, ShareMode: Read, Write, Delete, AllocationSize: 0, OpenResult: Created
10:59:55,2312485    Dfsr.exe    1584    760 IRP_MJ_CLEANUP                  path\to\xyz.txt-{GUID}-vVERSION    SUCCESS
10:59:55,2313007    Dfsr.exe    1584    760 IRP_MJ_CLOSE                    path\to\xyz.txt-{GUID}-vVERSION    SUCCESS
10:59:55,2314394    Dfsr.exe    1584    760 IRP_MJ_CREATE                   path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Desired Access: Read Attributes, Write Attributes, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Open For Backup, Open Reparse Point, Open No Recall, Attributes: N, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
10:59:55,2314626    Dfsr.exe    1584    760 IRP_MJ_FILE_SYSTEM_CONTROL      path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Control: FSCTL_MARK_HANDLE
10:59:55,2314780    Dfsr.exe    1584    760 IRP_MJ_QUERY_INFORMATION        path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: QueryNameInformationFile, Name: path\to\xyz.txt-{GUID}-vVERSION
10:59:55,2314996    Dfsr.exe    1584    760 FASTIO_QUERY_INFORMATION        path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: QueryBasicInformationFile, CreationTime: 10.03.2015 10:59:55, LastAccessTime: 10.03.2015 10:59:55, LastWriteTime: 10.03.2015 10:59:55, ChangeTime: 10.03.2015 10:59:55, FileAttributes: ANCI
10:59:55,2315081    Dfsr.exe    1584    760 IRP_MJ_QUERY_INFORMATION        path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: QueryAttributeTagFile, Attributes: ANCI, ReparseTag: 0x0
10:59:55,2315194    Dfsr.exe    1584    760 IRP_MJ_QUERY_INFORMATION        path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: QueryCompressionInformationFile
10:59:55,2315391    Dfsr.exe    1584    760 IRP_MJ_QUERY_VOLUME_INFORMATION path\to\xyz.txt-{GUID}-vVERSION    BUFFER OVERFLOW Type: QueryInformationVolume, VolumeCreationTime: 14.07.2014 14:59:54, VolumeSerialNumber: 88F0-15DC, SupportsObjects: True, VolumeLabel: uvw
10:59:55,2315481    Dfsr.exe    1584    760 IRP_MJ_QUERY_INFORMATION        path\to\xyz.txt-{GUID}-vVERSION    BUFFER OVERFLOW Type: QueryAllInformationFile, CreationTime: 10.03.2015 10:59:55, LastAccessTime: 10.03.2015 10:59:55, LastWriteTime: 10.03.2015 10:59:55, ChangeTime: 10.03.2015 10:59:55, FileAttributes: ANCI, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0xe00000001589e, EaSize: 0, Access: Read Attributes, Write Attributes, Synchronize, Position: 0, Mode: Sequential Access, Synchronous IO Non-Alert, AlignmentRequirement: Long
10:59:55,2316459    Dfsr.exe    1584    760 IRP_MJ_CREATE                   path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Desired Access: Generic Read/Write/Execute, Write DAC, Write Owner, Access System Security, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Complete If Oplocked, Open For Backup, Open No Recall, Attributes: N, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
10:59:55,2316691    Dfsr.exe    1584    760 IRP_MJ_FILE_SYSTEM_CONTROL      path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Control: FSCTL_MARK_HANDLE
10:59:55,2316796    Dfsr.exe    1584    760 IRP_MJ_CLEANUP                  path\to\xyz.txt-{GUID}-vVERSION    SUCCESS
10:59:55,2316876    Dfsr.exe    1584    760 IRP_MJ_CLOSE                    path\to\xyz.txt-{GUID}-vVERSION    SUCCESS
10:59:55,2317891    Dfsr.exe    1584    760 IRP_MJ_SET_SECURITY             path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Information: Owner, Group, DACL
10:59:55,2318748    Dfsr.exe    1584    760 IRP_MJ_FILE_SYSTEM_CONTROL      path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Control: FSCTL_SET_SPARSE
10:59:55,2319307    Dfsr.exe    1584    760 IRP_MJ_WRITE                    path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Offset: 0, Length: 0
10:59:55,2319442    Dfsr.exe    1584    760 IRP_MJ_SET_INFORMATION          path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: SetEndOfFileInformationFile, EndOfFile: 240
10:59:55,2320066    Dfsr.exe    1584    760 IRP_MJ_SET_INFORMATION          path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: SetAllocationInformationFile, AllocationSize: 240
10:59:55,2320382    Dfsr.exe    1584    760 IRP_MJ_WRITE                    path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Offset: 240, Length: 0
10:59:55,2320505    Dfsr.exe    1584    760 IRP_MJ_SET_INFORMATION          path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Type: SetBasicInformationFile, CreationTime: 16.12.2013 10:57:23, LastAccessTime: 19.02.2015 11:00:25, LastWriteTime: 10.03.2015 10:59:55, ChangeTime: 10.03.2015 10:59:55, FileAttributes: ANCI
10:59:55,2320688    Dfsr.exe    1584    760 IRP_MJ_FILE_SYSTEM_CONTROL      path\to\xyz.txt-{GUID}-vVERSION    SUCCESS Control: FSCTL_WRITE_USN_CLOSE_RECORD
10:59:55,2321256    Dfsr.exe    1584    760 IRP_MJ_CLEANUP                  path\to\xyz.txt-{GUID}-vVERSION    SUCCESS
10:59:55,2321506    Dfsr.exe    1584    760 IRP_MJ_CLOSE                    path\to\xyz.txt-{GUID}-vVERSION    SUCCESS

Going by this observation it looks like DFSR is the one producing them. Why is it doing that? Also, why does it mark the file as sparse? The writer does not mess with the files beyond writing them using the technique outlined above. Note that the offset where the EOF marker is moved to matches the size of the file as created by the writer.

So far we're unable to reproduce the issue reliably. Does anyone have an idea what's causing this and, more importantly, how to fix it?

mbx
  • 455
  • 9
  • 21
ckerth
  • 61
  • 3
  • 4
    My theory is that it's because DFSR is, in reality, a sadistic experiment that Microsoft is using to test the viability of using an OS feature as a biological weapon. I can't prove it, yet, but I'm sure that DFSR has cut at least 20 years off my lifespan so far. – HopelessN00b Mar 30 '15 at 15:10
  • 2
    my understanding is that 2003 has some issues, what happens with 2008 readers? – Jim B Mar 30 '15 at 16:26
  • @JimB upgrading the readers was part of the solution, thanks – ckerth Aug 08 '16 at 12:03

2 Answers2

1

Is your DFS-R staging area on the same volume as the DFS-R folder? For performance reasons, it should be. If not, then DFS-R is having to copy the file from staging volume to the destination volume rather than doing a straight move.

Here's where the speculation kicks in. It may be that during this copy operation, DFS-R is creating a sparse file, and then filling the blocks, and "unsparsing" it when complete. If something interrupts this process (like antivirus, Undelete, or some other file filter driver program scanning the DfsrPrivate folder), then you may end up with a temporary sparse file that doesn't get filled with its contents.

You can test for this behavior by using Process Monitor on the files that are replicating properly and seeing if they are marked/unmarked as sparse at any point in the process.

I'm not a fan of mixing 2008 and 2003 when it comes to DFS-R. I was soooo glad to get the last 2003 machine off our DFS tree.

GuitarPicker
  • 394
  • 1
  • 8
  • 1
    If the DfsrPrivate folder is meant as the staging one then yes, staging and destination folder are on the same volume. We monitored both the DfsrPrivate and the destination folder using ProcMon and no other process was accessing the files while DFS-R was replicating them. _Jim B_ also mentioned to migrate the readers to 2008 but it will take some time to test. On the other hand if this is not a known bug or designed behaviour of DFS-R on 2003 I don't even know if this will "fix" anything. – ckerth Apr 13 '15 at 08:00
0

Upgrading the readers to the latest version of DFSR 2008 R2 just lessened the occurrences of the issue but didn't resolved it completely. After additionally configuring the readers as write-protected the issue wasn't observed again (half a year since the last upgrade).

ckerth
  • 61
  • 3