Large files copy to NAS at 500 MB/s if routed indirectly via a 2nd PC, but at < 100 MB/s if copied directly to the NAS. How do I troubleshoot this?


I'm testing network reliability and found this anomaly, no idea how to troubleshoot it.

The network is stripped down to (PC + 10G card) -> (10G switch) -> (file server with 10G card).

The NAS and switch look fine: I'm getting close to 500 MB/s sustained for large file copies to/from the NAS with another PC through the same switch. The NAS is FreeBSD with a ton of fast RAM and NVMe ZIL+L2ARC, and nothing else is loading it (another pool is resilvering one disk, but the pool used for this is idle). The PC is a hexacore i7 Extreme and the test file is a single 100 GB file on a newly wiped and formatted Samsung 840 Pro SSD that delivers at least 80-95k IOPS (400+ MB/s) on both read and write. The SSD was benchmarked earlier today and the memory (64 GB) was memtest86'ed for 36 hours straight over the weekend. The NICs are all Chelsio T4 series SR optical with Finisar transceivers, all believed good, or at least with no known issues. The whole LAN has jumbo frames enabled.

The NAS 10G card and switch seem happy, because other devices get that speed through them. The switch reports that both machines have a good 10G link. Windows on the PC reports the NIC is connected at 10G. Get-SmbConnection confirms the connection is using SMB 3.02.

Task Manager and the NAS both agree that neither the wire nor the disks are otherwise in use, apart from the single copy of the 100 GB file from SSD to NAS, and that the Ethernet links are all 10G and good.

However, they also both agree that the transfer is averaging a steady 912 - 920 Mbps, which is typical 1G speed. Neither explains why. It's probably something to do with the PC or its NIC rather than the other gear, because everything else seems established and happy, but that isn't much help. I've changed switch ports, with no effect.
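For reference, the unit arithmetic behind that comparison can be sketched as below. This assumes decimal units (1 MB = 10^6 bytes) and ignores protocol overhead; the function names are purely illustrative. The 912 - 920 Mbps wire rate works out to about 114 - 115 MB/s, so the somewhat lower figure Explorer shows is presumably down to overhead and unit rounding.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def link_utilisation(mb_per_s: float, link_gbps: float = 10.0) -> float:
    """Fraction of a link's raw bit rate taken up by a payload rate in MB/s."""
    return (mb_per_s * 8) / (link_gbps * 1000)


if __name__ == "__main__":
    # Wire rates reported by Task Manager and the NAS.
    for mbps in (912, 920):
        print(f"{mbps} Mbps = {mbps_to_mb_per_s(mbps):.1f} MB/s")
    # Copy rates seen in the two test cases, relative to a 10G link.
    for rate in (100, 500):
        print(f"{rate} MB/s uses {link_utilisation(rate):.0%} of a 10G link")
```

So even the "fast" 500 MB/s path only uses around 40% of the 10G link, and the slow path sits almost exactly where a saturated 1G link would.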

But this is the really crazy part:

  • Log in on the PC and open 2 Explorer windows, one to the SSD and one to the NAS. Copy the file. Speed: 95 - 105 MB/s.
  • Log in on a 2nd PC connected to the same switch. Open 2 Explorer windows, one to the SSD on the first PC as a shared drive and one to the NAS. Copy the file. This time it has to fetch the file remotely from the first PC and then copy it, because it isn't on a local SSD. Speed: 500 MB/s.

The PC will feed the file at 500 MB/sec to another PC which feeds it to the NAS at 500 MB/sec. But the PC will only feed it to the NAS at 100 MB/sec. It's 5 times faster when it's routing it through another PC as a shared network drive, than when it's directly copying!

There is only a single network link for PC -> LAN and NAS -> LAN, and all 3 devices are on the same 10G switch. At a stroke, this seems to rule out any issue with Windows, with the networking hardware, or with the disks, or ... well, apparently everything I can think of.

Windows doesn't seem to question the fact that there's a 10G link but only 1G of actual data speed being used. Wireshark doesn't seem to say much either. Windows reports the link as 10G and sends data at 500 MB/s to the other PC, which can in turn send at 500 MB/s to the NAS, but direct traffic from the PC to the NAS runs at 1G speed.
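One check that might separate the raw network path from SMB is a plain TCP throughput test between the PC and the NAS. Below is a minimal sketch, assuming Python 3.8+ is available on both ends; the function names, the port number and the transfer size are hypothetical choices, and it times how fast the sender can hand data to the socket rather than anything disk- or SMB-related.

```python
import socket
import time


def sink(server_sock: socket.socket) -> int:
    """Accept one connection on server_sock and discard everything received.

    Returns the number of bytes received.
    """
    received = 0
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1 << 20)
            if not data:
                break
            received += len(data)
    return received


def measure_throughput(host: str, port: int,
                       total_bytes: int = 1 << 30,
                       chunk: int = 1 << 20) -> float:
    """Send total_bytes of zeros to host:port and return the rate in MB/s."""
    payload = b"\0" * chunk
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            s.sendall(payload)
            sent += chunk
    elapsed = time.perf_counter() - start
    return sent / elapsed / 1e6


if __name__ == "__main__":
    # Receiver side (e.g. on the NAS); 5201 is an arbitrary port choice.
    with socket.create_server(("0.0.0.0", 5201)) as srv:
        print(f"received {sink(srv)} bytes")
```

On the sending side, something like `measure_throughput("nas-address", 5201)` (with the NAS's real address substituted) would give a figure to compare: if plain TCP from the PC to the NAS is also stuck near 1G speed, the problem would seem to sit below SMB; if it reaches several hundred MB/s, SMB or the copy path looks more suspect.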

How the heck do I troubleshoot this?

Stilez

Posted 2017-11-23T00:50:53.190

Reputation: 1 183

This is actually not how SMB works. When you mount a remote share and copy a file from it to another share on the same server, it uses a remote copy command instead of downloading the file and uploading it back, which explains your speed increase. Why it drops on the first PC, however, is still confusing. – jdwolf – 2017-11-23T05:29:54.763

The source and dest shares are on different servers so I don't think it can be using remote copy: (1) logged in on PC1 and copy from (PC1 drive D as local drive) to (NAS:samba share) speed = 100 MB/s. (2) logged in locally on PC2 and copy from (PC1:drive D as network share) to (NAS:samba share), speed = 500 MB/s. PC1, PC2 and NAS are all plugged into the same 10G switch, and all hardware and networking is easily capable of 500 MB/s as set up. – Stilez – 2017-11-23T09:26:22.850

No answers