I've been given access to a share on a Windows Server 2003 SP1 system (10.a.bbb.ccc) which is a file and printer server, and regularly large files get copied to that share. However, occasionally such a copy fails. When reproducing this issue using Robocopy (on 10.xxx.yy.zzz), I get something like
70.4%
2013/07/31 11:20:21 ERROR 64 (0x00000040) Copying File <<file name removed>>
The specified network name is no longer available.
Waiting 30 seconds... Retrying...
New File 105.2 m <<file name removed>>
0.0%
dumpcap
+ Wireshark show that when this happens, in the middle of copying suddenly the server does not accept any data anymore on TCP port 445, by setting the window size to zero:
No. Time Source Destination Protocol Length Info
7303 5.841186000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7304 6.149715000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7305 6.150137000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7306 6.749711000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7307 6.750087000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7308 7.946779000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7309 7.947130000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7310 10.349783000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7311 10.350201000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7312 15.149910000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7313 15.150283000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7314 24.747096000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7315 24.756210000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7316 43.958531000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 55 [TCP ZeroWindowProbe] [TCP segment of a reassembled PDU]
7317 43.958863000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 60 [TCP ZeroWindowProbeAck] [TCP ZeroWindow] microsoft-ds > 57918 [ACK] Seq=10864 Ack=6973070 Win=0 Len=0
7318 75.216401000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 54 57918 > microsoft-ds [RST, ACK] Seq=6973070 Ack=10864 Win=0 Len=0
7319 75.225543000 10.xxx.yy.zzz 10.a.bbb.ccc TCP 66 55972 > microsoft-ds [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
7320 75.225933000 10.a.bbb.ccc 10.xxx.yy.zzz TCP 66 microsoft-ds > 55972 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=1 SACK_PERM=1
So after 70 seconds the client (here: Robocopy) calls it quits.
My question: Is this a known issue with Windows shares? What can be investigated/debugged/traced on the file server? Are there specific settings we need to look at or experiment with?
Thanks in advance!