
Here is the scenario: I have 5 TB (yes, that's a T) of files on a Windows server that I need to migrate to a new server as quickly and efficiently as possible (think Robocopy, rsync, etc., as I plan to use differentials to do this over time). The files are in ~41,000 directories under a single parent directory (d:\files\folder1, d:\files\folder2, etc.).

Since these are migrating to a new server, I want to split this up so that they are not all in the same "files" directory, but are instead split as logically as possible across multiple drives (trying to stay at about 2 TB drive sizes for backup and replication purposes).
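One way to get that "as logically as possible" split is a greedy bin-packing pass over the top-level folder sizes. A minimal sketch in Python, assuming the folder sizes have already been measured; the 2 TB target and the sizes in the example are illustrative, not from the question:

```python
TWO_TB = 2 * 1024**4  # ~2 TB per destination drive, matching the plan above

def pack_folders(folder_sizes, target=TWO_TB):
    """Assign each folder to the least-full drive that can still hold it,
    opening a new drive when none can. Returns [(used_bytes, [folders])]."""
    drives = []  # each entry: [used_bytes, list_of_folder_names]
    # Placing the largest folders first keeps the split close to even.
    for name, size in sorted(folder_sizes.items(), key=lambda kv: -kv[1]):
        fitting = [d for d in drives if d[0] + size <= target]
        if fitting:
            drive = min(fitting, key=lambda d: d[0])
        else:
            drive = [0, []]
            drives.append(drive)
        drive[0] += size
        drive[1].append(name)
    return drives
```

Feeding this the output of a folder-size scan gives a folder-to-drive map that every differential pass can reuse, so a folder never moves between drives across runs.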

Robocopy doesn't have a regex option. Rsync would require a Linux server, which isn't impossible, but adds overhead: this is a Windows-to-Windows move. I've found a way to loop through the 41,000 directories using PowerShell and initiate Robocopy for each directory individually, thus allowing me to specify the destination... but this seems inefficient.
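If the per-directory loop stays, building all the command lines up front at least makes the destinations explicit and reviewable before anything runs. A sketch of that step in Python; the paths and the choice of /MIR /MT /R /W switches are assumptions, not from the question:

```python
def robocopy_commands(folder_to_drive):
    """Build one robocopy command per folder, mirroring it to the drive it
    was assigned to. /MIR supports the differential passes, /MT:16 copies
    multithreaded, and /R:1 /W:1 stop a locked file from stalling the run."""
    commands = []
    for folder, drive in sorted(folder_to_drive.items()):
        src = f'D:\\files\\{folder}'
        dst = f'{drive}\\files\\{folder}'
        commands.append(f'robocopy "{src}" "{dst}" /MIR /MT:16 /R:1 /W:1')
    return commands
```

Writing the commands to a file first also gives an audit trail of which folder went to which drive.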

One other option I've considered is migrating everything at once, and then scripting out a copy to the other drives as needed. This would mean copying 2/3 of the files twice.

Have I missed anything obvious?

Deer Hunter
    1. 5TB isn't as much as you make it sound like, this isn't the 90s - heck you can buy a 6TB SATA drive now. 2. Why split this up over time? It'll take <12 hours on GbE 3. DeltaCopy is `rsync` for Windows. 4. RoboCopy can exclude directories - are you sure it won't work? 5. Splitting the files to multiple disks isn't necessary for most backup systems, and complicates RAID if you're concerned about availability in single failure scenarios. – Chris S Aug 20 '14 at 20:00
    `Have I missed anything obvious?` Yes, yes you have. Attach a >5 TB drive to the server (you can get 6 and 8 TB USB drives these days), copy the files to the new drive (or do a block-level clone, if you prefer), move the USB drive to the other server, and copy the files back. A lot faster than doing a network transfer, and as ChrisS noted, you're only making things more complicated and difficult on yourself by trying to limit your drive size to 2TB - there's just no reason to do that these days (and if there is, you need to replace the old PoS that has a 2TB limit). – HopelessN00b Aug 20 '14 at 20:12
  • I agree with Chris, 5TB isn't so much data. However, 41000 directories is a little horrible because I assume you've got lots of little files. That will make the copy process a tad slow. – hookenz Aug 20 '14 at 20:28
  • I'd go with the hard drive copy idea. Anything over the network is going to be slow. – hookenz Aug 20 '14 at 20:54
  • Don't forget that Robocopy has the **`/MT`** (multi-threaded) switch. This can dramatically speed it up if you have lots of small files. – Zoredache Aug 20 '14 at 21:15
  • @Matt How are you suggesting to attach the drive? A gigabit ethernet link usually seems faster than the USB2 ports that seem to be used in 99% of the servers that seem to exist. Not many servers have USB3. Of course if you have a spare drive bay, can handle e-sata in the source and destination servers that would be different. – Zoredache Aug 20 '14 at 21:18
  • `Robocopy for each directory individually, thus allowing me to specify destination...but this seems inefficient.` - Though I am not sure it is a good idea, there are several methods to do multi-tasking or multi-threading with Powershell. So if you took this approach you could be running a half dozen separate copies at once. – Zoredache Aug 20 '14 at 21:22
  • If you can't take down your server that might be a problem. So doing it over ethernet might be the only option. – hookenz Aug 20 '14 at 21:30
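The "half dozen separate copies at once" idea from the comments is straightforward to script. A hedged sketch using Python's thread pool to throttle concurrent copy jobs; the worker count of 6 is just the figure mentioned in the comment, not a tested value:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_copies(commands, workers=6):
    """Run at most `workers` copy jobs concurrently and collect exit codes
    in order. Note: robocopy exit codes 0-7 mean success; 8+ mean errors."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, shell=True).returncode,
            commands))
```

Combined with one command per destination drive rather than per folder, this keeps the number of concurrent robocopy processes bounded while still overlapping I/O.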

3 Answers


I've done Windows-to-Windows rsync using the Cygwin framework a couple of years back. Rsync + sshd is definitely doable.

I also found this, and it looks like it might make rsyncing on Windows easier than ever: http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp

We usually clone the data through physical disks first to make the "first sync" as fast as possible, and then use rsync afterwards to move only the deltas/differences over the network.
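That seed-then-delta workflow can be driven from a small wrapper once rsync (e.g. via Cygwin or DeltaCopy) is on the PATH. A sketch in Python that only builds the invocation; the flags and paths are illustrative assumptions:

```python
def rsync_command(src, dest, dry_run=False):
    """Build the rsync invocation for a delta pass after the initial
    physical-disk seed. -a preserves attributes (as far as the Windows
    port can), --delete mirrors deletions, -z compresses over the wire."""
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd + [src, dest]
```

Running the `--dry-run` variant first shows how big the delta actually is before committing to a transfer window.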

thelogix
I've found the Windows rsync port with Cygwin performs very poorly, unfortunately. It's not that surprising. It also misses file permissions. – hookenz Aug 20 '14 at 20:25
Well, as soon as you're rsyncing over SSH, I think the sshd is the bottleneck (or used to be, anyway). – thelogix Aug 20 '14 at 20:34
Not really, it just underperformed. Worse was the fact that it missed file permissions and other metadata. Don't use it on Windows until someone comes up with a native port that overcomes the limitations. Acrosync claims to be a native port. – hookenz Aug 20 '14 at 20:53
Yes, don't use it if you need to replicate the Windows file permissions. But a complete disparagement is a bit harsh. I've used it to synchronize avatar image files in a webserver cluster and IIS logfiles to a central log-parsing server. Original permissions were unimportant and new files followed the umask of Cygwin. Perfect tool for both jobs. – thelogix Aug 20 '14 at 21:04
Yeah, it might be a bit harsh. My memory of it was that I was underwhelmed. I was considering re-writing it, but I was just too busy with other things. – hookenz Aug 20 '14 at 21:20

You could mount your drives with RAID 01 on your new server, which would allow you to:

  • Use all the data as if it were on a single drive
  • Have data distribution across multiple drives managed by the array itself (RAID 0)
  • Have redundancy managed by the array itself: each piece of data is replicated and restored if an error is detected (RAID 1)
  • Get good performance, since RAID is handled by the operating system itself, or by the motherboard if it's integrated there (many motherboards have this feature available)
  • Extend your filesystem easily, since extending a partition onto new drives with RAID is straightforward
  • Migrate the data easily, because rsync will see it as a simple drive-to-drive synchronisation
redheness

First of all, I do not see the logic in distributing them across different drives unless they are separate arrays on different disks and there is a performance improvement. If they are part of the same disks/array, you will just complicate things for nothing.

My file server had a little over 2 TB, but it was 4 million files in over 250k folders.

I made an initial copy using a file manager (Multi-Commander), then another refresh sync the same way just before the switch of the server. In practice, the initial copy took 4 hours and the refresh only a few minutes, since most of the files were already transferred. The switch was made with minimum downtime.

Synkron is also a good tool for Windows, but I did not test it with extremely large amounts of data.

Overmind