13

Scenario:

  • Doing a server migration from old Server 2008 R2 to new Server 2016, following this Server Fault guide: File server migration using Robocopy

  • After Robocopy completes, enable deduplication on Server 2016 for the copied volume, and then use PowerShell to start deduplicating manually. After many hours it completes and recovers about 25% of disk space.

  • Run Robocopy again to copy anything that may have been missed in the initial copy, as a final check of the new server.

...but Robocopy (run from Server 2016) doesn't understand deduplication, and so it proceeds to trash the deduplication chunk store:

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows                              
-------------------------------------------------------------------------------

  Started : Sunday, July 8, 2018 12:10:02 PM
   Source : \\SERVER-2008\e$\
     Dest : \\SERVER-2016\e$\

    Files : *.*

  Options : *.* /TEE /S /E /COPYALL /PURGE /MIR /ZB /NP /MT:32 /R:1 /W:10 

------------------------------------------------------------------------------

    *EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\
    *EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\
    *EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\
      *EXTRA File         253504    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\DedupFileList.1
      *EXTRA File         253504    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\DedupFileList.2
      *EXTRA File             28    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\stamp.dat
    *EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\COW\
    *EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\COW\00010000\
      *EXTRA File         196608    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\COW\00010000\00000046.00.RB
      *EXTRA File         106496    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\COW\00010000\00000048.00.RB

[.......]

*EXTRA File           30.3 m    \\SERVER-2016\e$\System Volume Information\Dedup\ChunkStore\{B7E1F3A4-AAD9-4449-9DF7-5489421C9EC5}.ddp\Stream\000f0000.00000002.ccc
*EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\
  *EXTRA File         29.7 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\00000001.kmchange.log
  *EXTRA File        999.8 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\changes.optimization.1.10.archive.etl
  *EXTRA File       1000.0 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\changes.optimization.1.11.archive.etl
  *EXTRA File        735.5 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\changes.optimization.1.12.archive.etl
  *EXTRA File        999.8 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\changes.optimization.1.9.archive.etl
  *EXTRA File          1.3 m    \\SERVER-2016\e$\System Volume Information\Dedup\Logs\changes.optimization.2.1.archive.etl
*EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\
  *EXTRA File             76    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\Dedup.00.cfg
  *EXTRA File             76    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\Dedup.01.cfg
  *EXTRA File           2228    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\dedupConfig.01.xml
  *EXTRA File           2228    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\dedupConfig.02.xml
  *EXTRA File              0    \\SERVER-2016\e$\System Volume Information\Dedup\Settings\VolumeJobLock.bin
*EXTRA Dir        -1    \\SERVER-2016\e$\System Volume Information\Dedup\State\
  *EXTRA File           2982    \\SERVER-2016\e$\System Volume Information\Dedup\State\chunkStoreStatistics.xml
  *EXTRA File           2592    \\SERVER-2016\e$\System Volume Information\Dedup\State\dedupStatistics.xml
  *EXTRA File         11.5 m    \\SERVER-2016\e$\System Volume Information\Dedup\State\GCReservedSpaceBitmap.tmp
  *EXTRA File          1.0 g    \\SERVER-2016\e$\System Volume Information\Dedup\State\GCReservedSpaceContainer.ccc
  *EXTRA File         46.0 m    \\SERVER-2016\e$\System Volume Information\Dedup\State\GCReservedSpaceDeleteLogs.tmp
  *EXTRA File          1.0 m    \\SERVER-2016\e$\System Volume Information\Dedup\State\GCReservedSpaceFileList.tmp
  *EXTRA File           4096    \\SERVER-2016\e$\System Volume Information\Dedup\State\GroupCommitFlushControl0.bin
  *EXTRA File           2066    \\SERVER-2016\e$\System Volume Information\Dedup\State\optimizationState.xml

[......]

I aborted it moments after seeing this fly by in the log and recognizing what was happening. But the damage was already done: the data on the deduplicated new server was instantly corrupted by Robocopy as it stormed through \System Volume Information. The new server's drive partition has to be formatted and recopied all over again from Server 2008.

Is there a safe way to use Robocopy so that it doesn't touch the deduplication volume data?

Also, I have a new concern: if Robocopy can destroy a deduplicated volume, what else is unsafe to use with a deduplicated volume, in that it sees right through it and can destroy the underlying data that should only be accessible by the server? (This should probably be a separate question.)

Dale Mahalko
  • 6
    What did you expect to happen with the `/MIR` switch, which is `MIRror a directory tree (equivalent to /E plus /PURGE)`, where `/PURGE :: delete dest files/dirs that no longer exist in source`? "Mirror" means make the destination a copy of the source. Robocopy is powerful ... and of course we know what that means: [_With great power comes great responsibility!_](https://www.youtube.com/watch?v=nhLyPH_KirE) – davidbak Jul 09 '18 at 02:57
  • "\System Volume Information" is normally inaccessible and all programs are normally blocked from accessing it. There should not be any way for Robocopy to get in there, even when run from an Administrator command prompt. Let's try accessing it manually on that same Server 2016: Start -> Command prompt -> Run as Administrator. CD \System Volume Information. Access is denied. – Dale Mahalko Jul 09 '18 at 15:07
  • That's true. I should also have pointed out that you used `/ZB :: use restartable mode; if access denied use Backup mode` where Backup mode defeats most permissions in order to be able to read files "normally" unreadable in order to make complete backups. So it was the _combination_ of `/B` and `/MIR` that did you in. Robocopy is powerful ... as I mentioned above ... – davidbak Jul 09 '18 at 15:24
  • Following up on Greg's great answer - given the choice just leave the deduplication off until you have finished migrating. – Tim Brigham Jul 09 '18 at 18:15

5 Answers

17

The System Volume Information directory should be excluded using the /XD switch. Probably a good idea to exclude other hidden/system directories such as $RECYCLE.BIN.
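
As a minimal sketch of that advice, assuming the copy is launched from a Python script: the helper below only assembles a Robocopy argument list with /XD exclusions and does not run anything. The function name `build_robocopy_args` and the default exclusion list are hypothetical and used for illustration only.

```python
# Hypothetical helper: assembles (but does not run) a Robocopy command line.
DEFAULT_EXCLUDED_DIRS = ["System Volume Information", "$RECYCLE.BIN"]


def build_robocopy_args(source, dest, excluded_dirs=None):
    """Return an argument list that mirrors source to dest while skipping
    the given directories via Robocopy's /XD switch."""
    if excluded_dirs is None:
        excluded_dirs = DEFAULT_EXCLUDED_DIRS
    args = ["robocopy", source, dest, "/MIR"]
    if excluded_dirs:
        # /XD takes one or more directory names. Each name is a separate
        # list item, so names with spaces need no manual quoting here.
        args.append("/XD")
        args.extend(excluded_dirs)
    return args
```

On Windows the resulting list could be handed to `subprocess.run(args)`; passing a list rather than a single string leaves the quoting of names with spaces to Python.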

Greg Askew
7

Two command line switches that were used lead to this: /MIR and /ZB. As the documentation ( robocopy /??? ) describes:

/MIR :: MIRror a directory tree (equivalent to /E plus /PURGE).
/ZB :: use restartable mode; if access denied use Backup mode.

It's the combination that did you in: /MIR will delete (as pointed out when you run robocopy without arguments) and "Backup mode" defeats most permissions in order to be able to read files "normally" unreadable in order to make complete backups.
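
To make that combination concrete, here is a rough pre-flight check in Python. It is only a sketch: the function names and the rule itself (purging plus backup mode without an /XD exclusion for System Volume Information) are assumptions drawn from this thread, not an official Robocopy safety check.

```python
# Hypothetical pre-flight check; the flag groupings are assumptions taken
# from this discussion, not a rule documented by Microsoft.
PURGING_FLAGS = {"/MIR", "/PURGE"}
BACKUP_FLAGS = {"/B", "/ZB"}
PROTECTED_DIR = "system volume information"


def excluded_dirs(args):
    """Collect the directory names that follow /XD, up to the next switch."""
    names, collecting = [], False
    for arg in args:
        if arg.upper() == "/XD":
            collecting = True
        elif arg.startswith("/"):
            collecting = False
        elif collecting:
            names.append(arg.strip('"').lower())
    return names


def is_risky(args):
    """True if the arguments purge in backup mode without excluding
    System Volume Information."""
    flags = {a.upper() for a in args if a.startswith("/")}
    purges = bool(flags & PURGING_FLAGS)
    backup = bool(flags & BACKUP_FLAGS)
    protected = PROTECTED_DIR in excluded_dirs(args)
    return purges and backup and not protected
```

The check deliberately does not flag /MIR on its own, since without backup mode the attempt to enter System Volume Information would normally just fail with "Access is denied".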

"Backup mode" is notably undefined in the "help" description. You've got to know that the Windows CreateFile API supports a flag called FILE_FLAG_BACKUP_SEMANTICS, which in combination with a certain access right SE_BACKUP_NAME (which is given to the Administrator group by default - also the Backup Operators group, duh) bypasses normal file security.

You didn't know that? Then you may also not know that robocopy wasn't originally part of Windows at all - it was part of a supplement called the "Windows Resource Kit" which was used mainly by programmers and hard-core sysadmins back in the day, and although it was grandfathered into the Windows distribution back in Windows Server 2008 it has never ever received any attention - except for additional performance options, woot! Particularly, no attention from program managers dedicated to UI or usability. So it's a raw bit of power that can be used - or misused! - at your own risk.

(A good rule of thumb: Don't use command line options you don't really understand.)
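
One way to act on that rule of thumb is a dry run: Robocopy's /L switch lists what would be copied or deleted without doing it, and the log can then be scanned before the real run. The Python sketch below assumes log lines shaped like the `*EXTRA` entries in the question; the function name is hypothetical.

```python
# Hypothetical scanner for Robocopy log text (for example from a /L dry run).
PROTECTED = "\\system volume information\\"


def extras_in_protected_dirs(log_text):
    """Return the paths of *EXTRA entries that sit under
    System Volume Information."""
    hits = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("*EXTRA"):
            continue
        # Take everything from the first backslash onward as the path;
        # for UNC paths like those in the question this is the full path.
        start = stripped.find("\\")
        if start != -1 and PROTECTED in stripped[start:].lower():
            hits.append(stripped[start:])
    return hits
```

A non-empty result means the planned run would delete something under System Volume Information on the destination, which is the signal to stop and add an /XD exclusion.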

Information you might like to know about "Backup mode" file access:

https://isc.sans.edu/forums/diary/Use+The+Privilege/20483/

https://docs.microsoft.com/en-us/windows/desktop/api/FileAPI/nf-fileapi-createfilea

https://docs.microsoft.com/en-us/windows/desktop/FileIO/file-security-and-access-rights

davidbak
  • 1
    BTW there's nothing dangerous (AFAIK) about `/Z` "restartable mode". It's the `B` that's problematic ... – davidbak Jul 09 '18 at 17:23
  • Can file server domain accounts, with their separate owner and security data on each account directory, be fully and completely copied (/COPYALL or /COPY:DATSOU) using the administrator account, without using /B ? – Dale Mahalko Jul 11 '18 at 00:42
  • @DaleMahalko - TBH I don't know. Though I've been programming Windows for a couple of decades there are aspects I avoid, and so I only know enough about Windows security to get me unwedged when necessary ... I'm the kind of guy who is always logged in as a member of Administrator, I go into Group Policy and make everything totally unenforced, etc. Maybe someone else knows? – davidbak Jul 11 '18 at 02:19
1

The problem is that you are not copying just the folders you need but the entire volume. The volume contains the hidden system folder "System Volume Information", which is used for anything related to the file system; Deduplication and File Server Resource Manager store their data in there too. By copying the volume to another one with /MIR* together with /B** (backup mode, which can copy folders your admin account can't), you're actually replacing the System Volume Information on the target, and this destroys the Dedup chunk store. I would advise against this type of copying: it is preferable to copy folder by folder OR to exclude the "System Volume Information" folder altogether (it will save a lot of hair pulling and swearing in the short/long run).

* /MIR mirrors the contents from the source to the target and removes files from the target that do not exist in the source.
** Backup mode copies files that can't be accessed by an admin's account (it uses SeBackupPrivilege for the source read and SeRestorePrivilege to copy to the target folder).

GeoSimos
1

Here are the followup results using the other answers provided, and testing with a deduplicated destination. (Meta: I don't know if I should be including this as an edit at the bottom of my original question.)

The Robocopy command line evolved to finally look like this:

robocopy \\OLD-SERVER\e$\ \\NEW-SERVER\e$\ /MIR /COPYALL /DCOPY:DAT /NP /Z /B /J /SL /MT:128 /R:1 /W:10 /LOG+:robocopy-log.txt /TEE /XD "Recycler" "Recycled" "$Recycle.bin" "System Volume Information" /XF "pagefile.sys" "swapfile.sys" "hiberfil.sys"

Options and purpose:

  • /MIR - Mirror source to destination, and delete files and directories on the destination, if they are no longer present on the source
  • /COPYALL - Copy all file info: data, attributes, and timestamps, NTFS Security ACLs, Owner info, Auditing info (not all included by default)
  • /DCOPY:DAT - Copy all directory info - data, attributes, timestamps (original creation timestamp is not copied by default; normally this changes to the date that it was copied by Robocopy)
  • /NP - Don't display progress
  • /Z - Use restartable mode
  • /B - Copy files in Backup mode (I don't know if this is needed for user directories where they are the exclusive owner, excluding the administrator. This option will destroy a deduplicated destination volume without excluding "System Volume Information")
  • /J - Copy using unbuffered I/O (faster copy of large multi-gig files)
  • /SL - Copy symbolic links rather than the target
  • /MT:128 - Use 128 copy threads, the maximum Robocopy accepts (better use of 10 gigabit Ethernet and many CPU cores)
  • /R:1 - If file access error, retry 1 time
  • /W:10 - If file access error, wait 10 seconds before retry
  • /LOG+ - Log the output to text file, append if log file already exists
  • /TEE - Print results to screen and to log file
  • /XD - Exclude directories, and everything within them. Names with spaces in them need to be enclosed in quotes: "Recycler" "Recycled" "$Recycle.bin" "System Volume Information"
  • /XF - Exclude files: virtual memory and hibernation files if they happen to be present on the source: "pagefile.sys" "swapfile.sys" "hiberfil.sys"

Final re-run:

            Total    Copied   Skipped  Mismatch    FAILED    Extras 
 Dirs :    158189    153466    158186         0         0         0
Files :   1116292         0   1116296         0         0         0
Bytes :   1.350 t         0   1.350 t         0         0         0
Times :   0:01:04   0:00:00                       0:00:00   0:01:04
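
A summary like this can also be checked by a script rather than by eye. The Python sketch below is a rough parser that assumes the six-column layout shown above and reads only the Dirs and Files rows; the function name and column keys are hypothetical.

```python
# Hypothetical parser for the "Dirs" and "Files" rows of a Robocopy summary.
COLUMNS = ("total", "copied", "skipped", "mismatch", "failed", "extras")


def parse_summary(log_text):
    """Return {"dirs": {...}, "files": {...}} with integer counts per column.
    Rows that do not hold exactly six plain integers are ignored."""
    summary = {}
    for line in log_text.splitlines():
        label, sep, rest = line.partition(":")
        key = label.strip().lower()
        if not sep or key not in ("dirs", "files"):
            continue
        tokens = rest.split()
        # Skips lines such as the "Files : *.*" header near the top of a log.
        if len(tokens) == len(COLUMNS) and all(t.isdigit() for t in tokens):
            summary[key] = dict(zip(COLUMNS, (int(t) for t in tokens)))
    return summary
```

A caller could then require, for instance, that `failed` and `extras` are both zero for files and directories before treating the migration as verified.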

Deduplication report (screenshots not preserved)

Also, I do not know the proper channels to report bugs to Microsoft, but I have linked to this discussion at the bottom of Microsoft's deduplication documentation, on their Windows IT Pro Center website:

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/overview

Dale Mahalko
  • `/MT:128` seems rather high; did you find out it was really effective to set it that high (and not counterproductive to go past a lower value)? – davidbak Jul 18 '18 at 17:40
  • 1
    P.S. I love working at the command line. Imagine the nasty tabbed dialog box you'd have to slowly work through in order to get to this precise functionality. And none of those UIs have a "save" mode, so you'd have to do it each time! – davidbak Jul 18 '18 at 17:43
  • I don't know why they bother to expose threading control to the end user. In the end it chugs through 1.5TB of data in one minute showing no changes, so whatever "impact" using max threads has, it seems to not matter. This performance is quite acceptable to me. – Dale Mahalko Jul 19 '18 at 11:42
  • @DaleMahalko - I personally found it useful to be able to control max threads. If I have 10 files to copy where each takes a few hours, setting MT:2 allows only 2 file copy threads to be active at a time. On unreliable or complex infrastructure where interruptions happen, MT:2 will ensure that only 2 file copies get aborted rather than 10 if I had MT:10 (all files copying at the same time). In this case, if an interruption occurs on the last 2 files out of 10, only those will need to be restarted (the other 8 having already been copied) rather than all of them. – Rod Sep 13 '19 at 04:38
  • It has now been a long time since I last used Robocopy but I seem to recall that a limited number of threads do not saturate the network link between two servers, but more threads do. And this really should be the focus. Specifying max threads is unhelpful, but "slowly add more copy threads until bandwidth X crossed, and if above threshold, don't add more threads as copies finish" is actually useful and would meet both our needs. – Dale Mahalko Sep 13 '19 at 21:07
0

Maybe it's just me, but my first thought was - don't ever try to copy the drive itself "e$". I would only ever Robocopy the specific folders that were created for user content, not any system folders created by Windows itself.
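
A rough Python sketch of that per-folder approach follows. It only plans source/destination pairs and runs nothing; the function name and the list of system directory names are assumptions for illustration.

```python
# Hypothetical sketch: plan one Robocopy job per top-level data folder.
# Assumed set of folders that Windows itself manages at the root of a volume.
SYSTEM_DIRS = {"system volume information", "$recycle.bin", "recycler", "recycled"}


def plan_folder_jobs(folder_names, source_root, dest_root):
    """Return (source, dest) pairs for every folder that is not a known
    system directory. folder_names would come from listing the source root."""
    src = source_root.rstrip("\\")
    dst = dest_root.rstrip("\\")
    jobs = []
    for name in sorted(folder_names):
        if name.lower() in SYSTEM_DIRS:
            continue
        jobs.append((src + "\\" + name, dst + "\\" + name))
    return jobs
```

Each pair could then be passed to its own Robocopy run. Note that files sitting directly in the root of the volume are not covered by a plan like this and would need a separate step.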

  • As the drive is "not C:" it's not normally used by Windows at all, which only stores system files and programs on the C: drive. In this case the entirety of drive E: is for user data and only accessed via network shares. The user is normally denied access to "\System Volume Information", even from an elevated command prompt. – Dale Mahalko Mar 19 '21 at 06:20