While copying large files from one location to a Linux testing server, I keep getting disconnects around the 1.7-1.9 GB mark. The protocol used makes no difference (SFTP/SCP/SMB).
My guess is that the target server is unable to allocate the blocks on disk fast enough, leading to a bottleneck in either storage or CPU, which causes the disconnect. By the time the target system has caught up, the source system has already thrown an exception. I seem to be able to confirm this by throttling the maximum transfer speed to about 15,000 kb/s, which is what it is currently running at.
I was wondering if anyone has tips on how to prevent this issue from occurring. I'm thinking of something along the lines of tweaking buffers or setting longer timeouts on either the source or target system. Maybe a third-party copy client? (I really miss having a native rsync on Windows.)
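On the timeout front, one thing I'm considering on the Windows side is raising the SMB client's session timeout (the SessTimeout registry value, which defaults to 60 seconds). A rough sketch of what I have in mind, from an elevated Command Prompt; the 300-second figure is just a guess on my part:

    rem Raise the SMB client session timeout from the default 60 s (value is in
    rem seconds; 300 is an arbitrary example) - reboot for it to take effect
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" /v SessTimeout /t REG_DWORD /d 300 /f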
Does anyone have tips on this? The most convenient way for me to copy is from a Windows system to a Samba server, which is what I'm trying to accomplish.
EDIT: So as not to grow the comments below too much, here is another observation that supports my idea of a resource bottleneck: if I pause my copy client every few hundred megabytes (presumably allowing the target to catch up), the copy goes through. I guess I'll have to figure out how to do that packet capture (of a 2 GiB copy?!). Maybe playing with nice/ionice could do the trick, de-prioritizing smbd; a sketch of what I have in mind follows below.
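The nice/ionice idea would look roughly like this (untested on my end; the priority values are arbitrary examples):

    # De-prioritize smbd for CPU and disk I/O so the rest of the system,
    # including the network stack, stays responsive (example values;
    # pidof returns all running smbd PIDs)
    sudo renice +10 -p $(pidof smbd)
    sudo ionice -c 2 -n 7 -p $(pidof smbd)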
1"My guess is that the target server is unable to allocate the blocks on disk fast enough, leading to a bottleneck in either storage or CPU, which causes the disconnect." unlikely, that should just result in slow transfer... What is a "disconnect" - i.e: physical link drops, TCP connection, etc...? What NICs are you using, and are they reliable? How have you connected the two machines? Can you test between each of these and a third machine to try and identify which is problematic? – Attie – 2019-05-26T11:19:07.177
I would run vmstat on the Linux end and see what resources are in use during the transfer. I speculate wildly that there may be an issue with swap; I might try setting vm.swappiness very low (like 10 or even 0). Another possibility (and I've seen this) is an ISP running some kind of DoS protection which saw I was sustaining a high speed, determined it was a DoS attack, and null-routed the connection for 20 minutes. – davidgo – 2019-05-26T11:48:32.217
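(For concreteness, the check davidgo describes could look like this; the swappiness value is only an example:)

    # Sample memory/swap/IO/CPU figures once per second while the transfer runs
    vmstat 1
    # If swap activity shows up, try lowering swappiness (applies immediately)
    sudo sysctl vm.swappiness=10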
@Attie A standard copy simply gives me a read/write error after about 1.7+ GB. More specialized tools like FastCopy or WinSCP will complain that no response was received from the target for over 15 seconds (while the target server's CPU load maxes out a single core). In the case of FastCopy, simply telling it to wait longer will allow it to resume the copy operation once the connection is re-established (at which point the CPU load will drop back down). In all cases, a simultaneous SSH session will remain connected (although responding slowly, as expected, due to the CPU load). – Mark – 2019-05-26T12:30:41.777
@davidgo No activity on swap whatsoever; it's a fairly lean system. But trying to allocate, encrypt, store and checksum every block twice takes a punch out of the poor little Turion. I had hoped it would act like a Windows system in this case and (as Attie suggests) just slow down to a crawl until it is done. Perhaps it prioritizes CPU and I/O resources over the network, causing a delay in the ACK packets and leading Windows to think it disconnected. Linux-to-Linux rsync will simply slow down and wait: small files are stored in the buffer and written from there, with pauses in between to clear the buffer. – Mark – 2019-05-26T12:40:22.430
@Mark, I've noticed that issue when pushing a copy from Windows to Ubuntu over SMB, but going to the Ubuntu box and requesting the file from Windows succeeds, as does copying a file from Ubuntu to Windows. Sorry, it's just a workaround, and I have no explanation for why one fails and the other doesn't. – DrMoishe Pippik – 2019-05-26T15:27:23.733
I've been copying test files successfully with the speed capped at 15,000 kb/s for hours now. As soon as I uncap the speed, the issue returns. Also, copying the same files to an i5 system at the same update level causes no issues at high speeds. It has to be a resource limitation. Given that, I would still consider it a bug of sorts: Samba should be able to tell its client to hold on for a bit while it's busy instead of rudely dropping packets. I'll keep the question open in case someone knows a workaround that doesn't force me to slow transfers down to a snail's pace. Thanks for the feedback, guys! – Mark – 2019-05-26T17:48:11.880
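(For anyone wanting to experiment server-side, a hedged sketch of smb.conf settings that might be relevant here; these are untested examples, not confirmed fixes:)

    [global]
        # Hand reads/writes off to async I/O handlers so smbd can keep
        # answering requests on time (a 1-byte threshold enables it for all)
        aio read size = 1
        aio write size = 1
        # Pre-allocate the whole file up front, so block allocation doesn't
        # happen under load mid-transfer (ties in with the allocation theory)
        strict allocate = yes

followed by a restart of smbd (e.g. sudo systemctl restart smbd) to apply.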
What I'd normally look at in these situations is a pcap of a failure. If you can reproduce it every time, you'd want to do so while capturing. The moment where things start to fall apart should be pretty apparent, but you can work backwards from the disconnect, paying attention specifically to errors noted by SMB and TCP prior to and during the disconnect. I'd also pay attention to window size up until that point. You can get additional information by looking at logging on the server at the time the disconnect occurs. – MaQleod – 2019-05-27T03:46:33.540
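(To keep such a capture manageable for a 2 GiB copy, a ring buffer works; the interface name and port 445 are assumptions for a default SMB setup:)

    # Keep only packet headers (-s 128) and rotate through ten 100 MB files
    # (-C 100 -W 10) so the capture of a multi-GB copy stays bounded;
    # eth0 and port 445 are assumptions for a stock Samba setup
    sudo tcpdump -i eth0 -s 128 -w smb-fail.pcap -C 100 -W 10 port 445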