16

This issue has been extremely frustrating for us: when transferring a large VHD (virtual hard disk) file from a Windows 7 machine over the network to a physical Windows Server 2008 machine at our datacenter, the Windows file transfer consistently fails at 4 GB. We have a direct 100 Mbit connection from our main office to our datacenter.

When the transfer fails, the error message we receive is:

There is a problem accessing \\server-name\d$ Make sure you are connected to the network and try again.

It is only VHD files larger than 4 GB that fail. If we send any other file type, it works fine. If we zip the VHD, that also works. Moreover, we can send a VHD the other direction (from the data center to the main office) no problem. It is just VHD files in that direction.

Important notes:

  • All partitions are NTFS!!
  • There is no firewall between workstation and server
  • We have tried disabling the antivirus on the workstation (no antivirus on server)
  • We have tried transferring the file from a machine not on the domain
  • We have tried transferring the file from an Ubuntu machine (still fails, but at around 450MB instead of 4GB)
  • Wireshark capture shows 40 DUP ACKs when transfer fails
  • Xcopy and Robocopy (with restart flags) both fail (same point)
  • FTP transfer fails at 4,14X,XXX,XXX bytes and cannot be restarted at that point
  • We tried changing the file extension (stupid, but a last resort) to something other than vhd before sending it, but it still failed
  • Connection is as follows: Dell Workstation (Main Office) -> Dell PowerConnect 5448 Managed Switch (MO) -> HP Procurve 2910al-24G Layer 3 Router (MO) -> 100Mb TLS link -> HP Procurve 2910al-24G Layer 3 Router (Data center) -> Dell PowerConnect 5448 Managed Switch (DC) -> Dell Server (DC)

So basically, it is JUST VHD files > 4GB, sent from our main office to our datacenter, that fail. This all just doesn't add up... at this point I believe it is an issue with our network hardware settings, but I don't understand what the difference is between transferring a large VHD (which fails at 4GB) and a large video file (which always works).

Isaac Butt
  • Did you try a protocol other than CIFS/SMB? – Bart De Vos May 08 '12 at 22:31
  • No I haven't; I will give that a try – Isaac Butt May 08 '12 at 22:35
  • Maybe some type of thin/sparse file mechanism is involved (the file's all-zero blocks are left unallocated, so its disk usage is far smaller than its actual size)? One experiment you can do is to copy it locally with any program that *actually* reads its source block by block (any dd or cat variant should do; do NOT rely on the normal "copy" command, as it will probably use some file copying API!) and look at the size of the copy... – rackandboneman May 08 '12 at 22:40
  • I assume there are many all-zero blocks within the VHD that I am trying to copy (since it zips to about 8GB from 23GB), but why should that affect copying the file across the network? – Isaac Butt May 08 '12 at 22:56
  • In my experience, that error message indicates a networking problem. Can you perform a network capture during the time that it fails and see what the connection is doing? – wfaulk May 08 '12 at 23:07
  • Yes, I forgot to mention that I did a Wireshark capture. At the time of failure, all I see is 40 DUP ACKs, then the transfer fails (Windows error pops up). I can transfer other large files (e.g. a 12 GB video file) to the data center no problem. It's just VHDs, and always at 4GB in. – Isaac Butt May 08 '12 at 23:13
  • All zero blocks are different from unallocated all zero blocks (IF vhd format can use these), which might confuse filesystem drivers on one end.. – rackandboneman May 08 '12 at 23:31
  • Ah okay. Good point. Is there a Windows utility which will do what you suggested (a dd or cat alternative)? – Isaac Butt May 08 '12 at 23:48
  • 4G is an indicator that it's a 32-bit problem. Not for sure, but likely. – Bill Weiss May 09 '12 at 03:38
  • It does, but like I said, we're dealing with NTFS and 64-bit OS's; I feel that the issue lies in the networking equipment and there is something about the VHD file (all zero blocks) that the equipment can't deal with. – Isaac Butt May 09 '12 at 16:20
  • I stood up an FTP server on the destination machine; VHD file transfers still consistently fail at 4,14X,XXX,XXX bytes (Could not write to transfer socket: ECONNRESET - Connection reset by peer). The FTP client tries to restart at that point but can never get past it. – Isaac Butt May 09 '12 at 17:35
  • Hrmm.... what kind of connection between offices? – SpacemanSpiff May 09 '12 at 18:49
  • Let me rephrase, what type of networking gear handles that 100Mb connection? – SpacemanSpiff May 09 '12 at 18:57
  • Any chance there's a Sonicwall involved? – Skyhawk May 09 '12 at 18:57
  • Dell Workstation (Main Office) -> Dell PowerConnect 5448 Managed Switch (MO) -> HP Procurve 2910al-24G Layer 3 Router (MO) -> Direct 100Mb line -> HP Procurve 2910al-24G Layer 3 Router (Data center) -> Dell PowerConnect 5448 Managed Switch (DC) -> Dell Server (DC) – Isaac Butt May 09 '12 at 19:07
  • No Sonicwall involved. – Isaac Butt May 09 '12 at 19:08
  • Are any of those layer 3 devices running ALGs or NAT? I'm out of ideas past that. – SpacemanSpiff May 09 '12 at 19:13
  • Forgot to mention the "direct 100Mb line" is a TLS link; there is no NAT'ing on the layer 3 devices. The workstation and server are on separate VLANs, but that shouldn't have any effect. – Isaac Butt May 09 '12 at 20:22
  • This could be caused by anti-virus software on the server. – Harry Johnston May 10 '12 at 02:26
  • Firewall is disabled for domain connections and there is no antivirus installed on the server. – Isaac Butt May 10 '12 at 17:09
  • Sometimes ISPs will do strange things in order to abstract away the underlying media to provide you TLS. Have you considered talking to their support? –  May 10 '12 at 17:52
  • Yeah, my next step will be talking to our TLS provider. I will update this question with the results of that conversation. Thanks for the advice :) – Isaac Butt May 10 '12 at 17:57
  • Any chance it's disk corruption? – Bigbio2002 May 10 '12 at 19:26
  • @Bigbio2002 Have tried several different VHDs from several workstations to several different servers at the DC, same issue. – Isaac Butt May 10 '12 at 20:12
  • As a troubleshooting measure, is there any way to transfer such a file between two machines so that the TCP connection will transit one of the HP routers but not the TLS line? (Obviously if possible you'd want to do this test for each of the two routers.) – Harry Johnston May 10 '12 at 22:04
  • Failing all else, I guess you could set up two test machines, establish that the transfer works when using a simple cable, then plug them directly into the two ends of the TLS. That should establish definitively that the TLS is to blame. But since that would involve an extended network outage I'm guessing that will be a last resort. :-) – Harry Johnston May 10 '12 at 22:05
  • Presumably if deep-packet inspection is to blame (which seems likely) using an encrypted transfer mechanism such as SFTP or SCP would work around the problem. Or you could use IPSec, which is built into Windows. Or perhaps the routers have some kind of encrypted tunnel support? – Harry Johnston May 10 '12 at 22:10
  • @HarryJohnston Great suggestions! I will definitely try these steps :) thank you – Isaac Butt May 11 '12 at 16:52
  • @HarryJohnston After setting up SFTP, VHD files transfer successfully, so it looks like you were right about DPI on the TLS. I will talk to our provider and see if there is something they can do about it :) – Isaac Butt May 11 '12 at 18:54
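
The block-by-block copy experiment suggested in the comments (a dd or cat style copy that does not rely on the system file-copy API) could be run on Windows with a short script. The following is a minimal Python sketch, assuming Python is available on the workstation; the function names `copy_block_by_block` and `sizes_match` are hypothetical and not from the thread. It reads every byte of the source explicitly, so any unallocated regions of a sparse file should come back as zeros and be written out in full.

```python
import os


def copy_block_by_block(src, dst, block_size=1024 * 1024):
    """Copy src to dst by explicitly reading and writing fixed-size blocks.

    Returns the number of bytes written. No OS-level copy API is used,
    only plain read() and write() calls.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            fout.write(block)
            written += len(block)
    return written


def sizes_match(src, dst):
    """Compare the logical sizes of the original and the copy."""
    return os.path.getsize(src) == os.path.getsize(dst)
```

For example, `copy_block_by_block(r"D:\vm\disk.vhd", r"D:\vm\disk-copy.vhd")` (paths are placeholders) followed by `sizes_match(...)` would show whether the copy has the same logical size; the copy could then be sent across the link to see whether it fails at the same point.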

6 Answers

3

After troubleshooting this for many hours (and trying all the suggestions posted here), the issue turned out to be the TLS link between our main office and the datacenter. I called our TLS provider and after talking to several NOC technicians, one of them had heard of the exact issue before. It turned out that some of their layer 2 equipment was old and had issues with VHD data.

The solution was upgrading the firmware on these devices, which was performed by the TLS provider. We now have no issues transferring large VHDs. For those interested, our TLS provider is Shaw Communications in Victoria, Canada.

Isaac Butt
1

Try Xcopy or Robocopy; at least one or both have a "resume" switch. Rsync may be of help, too.

Out of curiosity, is one of the machines 32-bit and the other 64-bit? If so, can you temporarily try your copy from a 64-bit machine?

gWaldo
  • Both Robocopy and Xcopy fail as well at the same point, even with the resume switch (and buffered/unbuffered). Both server and workstation are 64 bit. – Isaac Butt May 09 '12 at 16:15
  • Brutal. The only option that I can think of to remediate is to check the 2GB VHD option in ESX. My condolences. – gWaldo May 09 '12 at 16:53
  • No problem, I appreciate your help :) (we are using Hyper-V not VMWare) – Isaac Butt May 09 '12 at 17:32
  • Good point; I've used a bunch of virtualization platforms, so I mentally parse them as $disk_file or $config_file, etc... – gWaldo May 09 '12 at 18:48
0

Search Google for large file network copy failures and you'll find some threads discussing similar issues, though not just with VHDs. This KB is usually linked to see whether tweaking NIC settings (TCP offload, chimney settings, etc.) helps.

http://support.microsoft.com/kb/951037

Willy
  • Thanks for the suggestions. I can transfer other large files no problem, but I will look into tweaking some of those settings. Disabling chimney offload has no effect. – Isaac Butt May 09 '12 at 18:28
0

Mmmmhhhh... I see the various answers above and I realize that I still can't tell if you really tried to copy with a 64-bit copy program. (xcopy, robocopy and most FTP clients are 32 bit, even on a 64 bit Windows.)

Can you give it a try with the 64-bit version of TotalCommander V8.0? (It is still a Release Candidate, but very stable.) That is truly 64-bit only.

Another thing to try if the server has IPv6 enabled (it usually does on W2K8): disable IPv4 completely on the workstation so the copy will have to use IPv6. It will be interesting to see if that makes a difference.

If neither of the above brings relief... You can always use HJSplit (or the split function of TotalCommander) to split the file into 1GB chunks, but of course you must have a means of re-joining them on the server. That will depend on whether you have access to run a program on the server itself. (Just "copy /b chunk1+chunk2+chunk3 total.vhd" will do if you are not allowed to install additional software server-side.)
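
As an alternative to HJSplit, the split-and-rejoin idea can be sketched in a few lines of Python. This is only an illustration, assuming Python is available on both ends; `split_file` and `join_files` are hypothetical helper names, and the `.000`, `.001` suffix scheme is an arbitrary choice. The join step does the same job as the `copy /b` command above.

```python
import shutil


def split_file(src, chunk_size=1024 ** 3, buf_size=1024 * 1024):
    """Split src into src.000, src.001, ... of at most chunk_size bytes each.

    Returns the list of chunk paths in order.
    """
    paths = []
    with open(src, "rb") as fin:
        index = 0
        while True:
            data = fin.read(min(buf_size, chunk_size))
            if not data:  # nothing left to read: no further chunk needed
                break
            path = f"{src}.{index:03d}"
            with open(path, "wb") as fout:
                remaining = chunk_size
                while data:
                    fout.write(data)
                    remaining -= len(data)
                    if remaining <= 0:  # this chunk is full
                        break
                    data = fin.read(min(buf_size, remaining))
            paths.append(path)
            index += 1
    return paths


def join_files(chunk_paths, dst, buf_size=1024 * 1024):
    """Concatenate the chunks, in the given order, into dst."""
    with open(dst, "wb") as fout:
        for path in chunk_paths:
            with open(path, "rb") as fin:
                shutil.copyfileobj(fin, fout, buf_size)
```

The chunks would be produced with `split_file` on the workstation, transferred individually, and reassembled with `join_files` on the server. Reading through a small buffer keeps memory use low even with 1GB chunks.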

Tonny
  • Tried TotalCommander 8, transfer fails even before 4GB in and reports "Please remove the write protection!" but I don't believe that actually indicates a write protection error. – Isaac Butt May 09 '12 at 19:52
  • We have other ways of moving the data across. I could just RAR the file and transfer that over (don't even need to split it into small chunks), but it is an extra step that we really shouldn't have to do. Thanks for the suggestion though, I appreciate your help. – Isaac Butt May 09 '12 at 19:53
0

Just a thought: Is the VHD in use by the hypervisor or mounted?

It could be failing because part of the VHD is locked and unable to be read from the filesystem. This is why zipping the file works and why video files of the same size also work, but not VHD files.

Looking for a file lock in Windows:

  1. Download Process Explorer (Direct link to live.sysinternals.com)
  2. Select the Find Menu, choose Find Handle or DLL...
  3. Type the file name, select search.
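
A complementary check for the "part of the file cannot be read" theory is to read the whole file locally and note where, if anywhere, a read fails. The sketch below is a rough illustration in Python, not a replacement for Process Explorer; `first_unreadable_offset` is a hypothetical name, and it assumes a locked or unreadable region would surface as an `OSError` on `read()`. If the file cannot be opened at all, the `open()` call itself raises.

```python
def first_unreadable_offset(path, block_size=1024 * 1024):
    """Read the file sequentially and return the offset of the first failed read.

    Returns None if every block could be read. The offset is accurate to
    the start of the failing block, not to the exact byte.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            try:
                block = f.read(block_size)
            except OSError:
                # Assumption: a lock violation or I/O error shows up here.
                return offset
            if not block:  # reached end of file without errors
                return None
            offset += len(block)
```

If this returns `None` for the VHD, every byte is readable locally, which would point away from a file lock and toward the network path.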

There appears to be an Experts Exchange post describing similar issues, but there are no resolutions in the answers.

Joseph Kern
  • Good point. Sometimes you even need to reboot the workstation to get it to really unlock the file. It may appear to be free, but you can never really tell. – Tonny May 09 '12 at 19:08
  • @Tonny You sure can tell, you just need the right tools. Updated my answer with a suggested method. – Joseph Kern May 09 '12 at 19:14
  • Yeah, I saw the Experts Exchange article and it sounds similar. Process Explorer shows nothing for the file. Moreover, I can make a copy of it, and trying to transfer the copy still fails, so there doesn't appear to be a lock. Total Commander 8 RC (64 bit) fails as early as 2GB into the transfer with a message "Please remove the write protection!" though that is likely just a stock error response. – Isaac Butt May 09 '12 at 19:51
  • That TC response is actually useful. It will only give that message halfway through the copy if there is really something blocking the attempted write. This has to be on the server side, or LAN/WAN related. Are you certain the LAN is really transparent? I would be looking for a router doing Stateful Packet Inspection, or a Network Accelerator device (e.g. a Cisco WAAS appliance) that somehow gets confused by this particular type of data. – Tonny May 09 '12 at 21:48
  • Hmm, well the line is supposed to be transparent; I could call our provider and tell them what is going on, though I bet they will direct the blame elsewhere. – Isaac Butt May 10 '12 at 16:53
  • Did you maybe for any reason try to feed the vhd into vmware converter recently? I've seen it keep hard-to-get-rid of locks on images that it refused to convert. – rackandboneman May 10 '12 at 17:43
  • We are not using VMware, only Hyper-V. I don't believe this is a file lock issue. – Isaac Butt May 10 '12 at 18:01
0

This sounds like it might even be a permissions issue: when you try to copy the file to the network location, it gets stopped or fails. Perhaps you could create a network folder and make it fully open, meaning shared to the "Everyone" group and set the same way in the Security tab. If that fixes the problem, then it looks like a permissions issue. In fact, since you mentioned the Linux copy failed sooner, permissions might well be the problem. Make sure the files inside the VHD are not in use and that you have proper permissions to access them.

Also make sure the folder you are copying from has open permissions. Remember, this is just to see whether the permissions are getting in the way; you can always tighten them up later, once you have a starting point with the copy working properly.

Another thing and it might be a long shot, but have you tried updating the NIC drivers? Perhaps there might be a fix in the most recent driver for your machine.

I hope this helps out, Cheers

Frank R
  • Thanks for the suggestion, but that doesn't explain why the file transfer is successful if the data is encrypted. I still think the problem lies with the TLS line; I am in talks with their support at the moment – Isaac Butt May 14 '12 at 23:09