Ok, there's a bit more to the story than the title implies.
Background and Environment: I'm copying several TB from an older Ubuntu server to a newer Windows Server 2012 machine over SMB. (Technically, it's commodity hardware, but they're servers around here.) Everybody is on a gigabit LAN, and the older Ubuntu box has a bonded interface. I believe the Ubuntu server has two Rosewill PCIe x1 ethernet cards and the Windows server has one reasonably nice PCI Intel ethernet card.
The destination computer (the Windows server) is running a Storage Pool with parity over 4x 2TB drives. It is running Microsoft's new ReFS. The source computer (the Ubuntu server) is running a software RAID mirror. It is running good ol' EXT4.
The two servers are running through a single gigabit switch. I have experimented with breaking the bonding on the source (Ubuntu) computer without any improvement.
Problem: I have no trouble transferring at reasonable speeds from other computers to the Windows server. Other computers can hold 50-80MB/s without much difficulty, but transferring from that Ubuntu server tops out at no more than 20MB/s. 4+TB at 20MB/s takes a long time (something like 2.3 days), and I'm wondering what I can do to figure out where the bottleneck is.
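For reference, the 2.3-day estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly steady rate, and the function name is made up for illustration.

```python
def transfer_time_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the days needed to move size_bytes at a steady rate."""
    seconds = size_bytes / rate_bytes_per_s
    return seconds / 86_400  # seconds per day


TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes

# Compare the observed 20 MB/s with the 50-80 MB/s other machines reach.
for rate in (20, 50, 80):
    days = transfer_time_days(4 * TB, rate * MB)
    print(f"{rate} MB/s -> {days:.2f} days")
```

At the 50-80MB/s the other machines manage, the same 4TB would take well under a day.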
Symptoms: CPU on both computers is pretty minimal, and certainly not prohibitively busy. Hard drives on both computers are active but not swamped, and CPU IOwait is almost 0% on at least the Ubuntu server.
I did a Wireshark trace for 35 seconds (presumably long enough to make sure all ACKs were for new packets) and noticed that there were quite a few things I didn't expect. (1) There weren't any checksums for the ACKs (and SOME SMB packets) from Windows to Ubuntu. However, Wireshark claims that this may be due to "IP checksum offload." Ok, I have a pretty nice card in there. I suppose it is possible that the network card could do checksum calculations. Fine. Moving on... (2) "TCP ACKed unseen segment." This one I have a problem with. The ACK number is within an acceptable range from what I can tell, and there are often huge blocks of these messages. Perhaps Wireshark is just too slow?
Summary: Transfer speed sucks (20MB/s over gigabit ethernet) and I don't know why. Wireshark claims Windows is ACKing things that were never sent by Ubuntu.
Guesses: My initial guess is that the cheaper Rosewill cards are getting swamped. My second guess is that the software RAID-like layers on one end or the other are getting inundated with work.
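One way to tell these two guesses apart would be to measure raw TCP throughput with the disks and SMB taken out of the picture, which is what a tool like iperf does. Below is a minimal Python sketch of the same idea, not a replacement for a real benchmark: the function names are hypothetical, the demo runs over loopback only, and for a real test the receiver would run on the Windows box and the sender on the Ubuntu box.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def receive_all(server_sock: socket.socket, result: dict) -> None:
    """Accept one connection, discard what it sends, record the byte count."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender closed the connection
                break
            total += len(data)
    result["bytes"] = total


def send_zeros(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes of zeros from memory and return the rate in MB/s.

    The timer stops when the data has been handed to the kernel and the
    socket closed, so short runs will overestimate the real rate.
    """
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    elapsed = time.perf_counter() - start
    return sent / elapsed / 1e6


def loopback_demo(total_bytes: int = 8 * 1024 * 1024):
    """Run receiver and sender on 127.0.0.1 to show the moving parts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    rate = send_zeros("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return result["bytes"], rate


if __name__ == "__main__":
    received, rate = loopback_demo()
    print(f"received {received} bytes at roughly {rate:.0f} MB/s")
```

If a memory-to-memory test between the two servers also stalls near 20MB/s, the network cards or the path between them are the likely suspects. If it runs near wire speed, the disks, the RAID layers, or the SMB implementation deserve a closer look.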
What speeds do you get copying from the Ubuntu server to one of the desktops (not Server 2012)? Perhaps WinXP or Win7? I've had big problems with packet signing and encryption with SMB on Server 2008 and up. – Dom – 2013-08-22T06:39:36.943
Update: I ended up having to reboot (thanks to a kernel panic). Unfortunately the system now has a kernel panic on every boot. I whipped out my trusty copy of Knoppix and mounted the drives, and everything is now fine and dandy. Now I'm copying over SSH and I still don't know where the bottleneck is. sshd is eating up 60% of one processor on the Knoppix side. In any case, my transfer is nearing completion. @Dom: Now that you mention it, I don't recall putting all that data on there much faster than 30MBps in the first place. – Andy – 2013-08-23T00:02:28.297
@LorenzoVonMatterhorn, please avoid using URL shorteners. – Cristian Ciupitu – 2013-10-20T20:39:52.800
Are you sure it is not an issue with your disks? – MariusMatutiae – 2013-11-04T17:44:08.233
Windows implemented a much faster version of the SMB protocol (SMB 2) over the past 4-5 years that is much less chatty and more efficient on the wire. I don't know offhand when those changes rolled into Samba, but it sounds like the older Ubuntu has an older Samba and perhaps the Knoppix has a newer version. – uSlackr – 2013-12-13T14:22:04.090
What kernel version (uname -r) and Samba version are you using? – cybernard – 2014-03-17T17:29:18.660