Why is there a write speed difference between dd, cp, rsync and macOS Finder to a SMB3 drive?

15

4

Tl;dr – We can't find the reason for the write speed to our NAS being limited to 60 MB/sec via SMB and AFP from two different Mac clients. In comparison: an old Windows 7 laptop in the same network writes a steady 100 MB/sec.

If you are reading this question for the first time, please skip to the Update 4 section. rsync is the main reason for the low speed, even though we don't understand why (for a single file!).


Original Question: Find speed bottleneck SMB3/NAS with Mac OS 10.11.5 and above

We tested via rsync --progress -a /localpath/test.file /nas/test.file on macOS and the copy info of Windows.

The NAS is a DS713+ running their current DSM 6.0.2 (tested with 5.x too), with two HGST Deskstar NAS SATA 4TB (HDN724040ALE640) in RAID1 with only gigabit ethernet components and new ethernet cables (at least Cat5e).

The Mac clients initially managed only 20 MB/sec, but applying the signing_required=no fix (described here) pushed the write speed to 60 MB/sec via SMB2 and SMB3. AFP also delivers around 60 MB/sec. Results vary by about 5 MB/sec depending on protocol and (Mac) client.

What we've already tried:

Network

  1. Tested network performance via iperf3. Result: 926 Mbit/s. Looks good.
  2. Tried Dual Link Aggregation/Bonded network interfaces. No change.
  3. Increased MTU to 6000 and 9000. No change.
  4. Checked all cables. All fine: at least Cat5e and in good condition.

Disks

  1. Checked S.M.A.R.T. Looks healthy.
  2. Tested write speed directly to disk with dd if=/dev/zero of=write.test bs=256M count=4 and various bs/count settings (128M/8, 512M/2, 1024M/1). Result: around 120 MB/s (depending on block size/count).
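For reference, a reproducible version of that test. The key detail is conv=fsync (GNU dd), which forces a flush to disk before dd reports its rate – without it, the page cache inflates the number and you mostly measure RAM. Paths here are illustrative; on the NAS we ran it in the target volume's directory.

```shell
# 64 MB test run (the question used bs=256M count=4 for a 1 GB test);
# conv=fsync makes dd wait for data to hit the disk before reporting
dd if=/dev/zero of=/tmp/write.test bs=16M count=4 conv=fsync
ls -l /tmp/write.test
```

Repeating with different bs/count pairs of the same total size shows how much the block size alone influences the reported rate.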

SMB/AFP

  1. Benchmarked SMB2, SMB3 and AFP against each other. About equal.
    See the update below: we used the wrong method to rule out the SMB implementation of macOS. SMB on Windows is faster; the new SMB settings coming with macOS 10.11 and 10.12 may be the reason.
  2. Tried to tweak the SMB settings, including the socket options (following these instructions).
  3. Tried different delayed ack settings and rsync --sockopts=TCP_NODELAY (comments).

No significant change of the write speed. We double checked that the config was really loaded and we were editing the right smb.conf.
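For completeness, the smb.conf tweaks looked roughly like this (a sketch; the socket-option values are the commonly circulated ones, not a verified tuning, and DSM generates its own smb.conf, so treat the path and exact option names as assumptions – current Samba documentation actually advises leaving socket options at their defaults, which matches our result that they changed nothing):

```ini
# smb.conf on the NAS ([global] section; location varies, DSM has its own template)
[global]
    server min protocol = SMB2
    server max protocol = SMB3
    socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
```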

System

  1. Watched CPU and RAM load. Nothing maxes out. CPU around 20%, RAM about 25% during transfer.
  2. Tested the same NAS with DSM 5.x.x in a nearly out-of-the-box setup. No additional software installed. Note: We have two of these in different locations. They are in sync via Synology's CloudSync. Same result.
  3. Deactivated everything unnecessary which could draw system resources.

We think this is a fairly default setup – no fancy adaptations, clients or network components. According to the metrics Synology publishes, the NAS should perform 40 MB/s to 75 MB/s faster. But we just can't find the bottleneck.

Clients/NAS

The Mac clients are a MacPro 5,1 (standard wired NIC, running 10.12.3 (16D32)) and a MacBookPro10,1 (Thunderbolt network adapter, running 10.11.6) only about 2m cable away from the NAS, running over the same gigabit switch as the Windows laptop in the test.

We have two of these NASes in different locations and the results are identical. The second NAS is more or less factory default (not even 3rd party software installed), just two EXT4-formatted disks in RAID1 syncing to the other NAS via Synology CloudSync. We've gone as far as connecting directly to the NAS without the switch – same result.

Important Update

The method I used to rule out the SMB implementation of macOS/OS X was wrong. I tested it via a virtual machine, assuming it would use its own SMB stack, but obviously the traffic gets handed to macOS and runs through its version of SMB.

Using a Windows laptop I've now been able to achieve an average of 100 MB/s, indicating that the SMB implementation/updates coming with 10.11 and 10.12 may cause the poor performance – even with signing_required set to no.

It would be great if someone could point out further settings that may have changed with the updates and could affect performance.

Update 2 – new insights

AndrewHenle pointed out in the comments that I should investigate the traffic in detail using Wireshark for more insight.

I therefore ran sudo tcpdump -i eth0 -s 65535 -w tcpdump.dump on the NAS while transferring two test files (one 512 MB, one 1 GB), and inspected the dump with Wireshark.

What I found:

  1. Both OS X and Windows seem to use SMB2 although SMB3 is enabled on the NAS (at least according to Wireshark).
  2. OS X seems to stick to the MTU. Its packets are 1514 bytes, leading to far more network overhead and many more packets sent (visible in the dumps).
  3. Windows seems to send packets of up to 26334 bytes (if I read the data correctly! Please verify.), even though the MTU shouldn't allow that: it's set to 1500 on the NAS, and the maximum setting there would be 9000 (Synology also uses the 1500 setting in their tests).
  4. Trying to force macOS to use SMB3 by adding smb_neg=smb3_only to the /etc/nsmb.conf didn't work or at least didn't lead to faster transfers.
  5. Running rsync --sockopts=TCP_NODELAY with various combinations of TCP delayed ack settings (0 to 3) had no effect (Note: I ran the tcpdump with the default ack setting of 3).

I've created 4 dumps as .csv files, 2 while copying 512 MB (test-2.file) and 2 while copying 1024 MB (test.file). You can download the Wireshark exports here (25.2 MB). They are zipped to save space and named self-explanatorily.

Update 3 – smbutil output

Output of smbutil statshares -a as requested by harrymc in the comments.

==================================================================================================
SHARE                         ATTRIBUTE TYPE                VALUE
==================================================================================================
home
                              SERVER_NAME                   server-name._smb._tcp.local
                              USER_ID                       502
                              SMB_NEGOTIATE                 SMBV_NEG_SMB1_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB2_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB3_ENABLED
                              SMB_VERSION                   SMB_3.0
                              SMB_SHARE_TYPE                DISK
                              SIGNING_SUPPORTED             TRUE
                              EXTENDED_SECURITY_SUPPORTED   TRUE
                              LARGE_FILE_SUPPORTED          TRUE
                              OS_X_SERVER                   TRUE
                              QUERYINFO_NOT_SUPPORTED       TRUE
                              DFS_SUPPORTED                 TRUE
                              MULTI_CREDIT_SUPPORTED        TRUE

--------------------------------------------------------------------------------------------------

Note on this: I'm sure SIGNING_SUPPORTED being true here doesn't mean the setting in the config doesn't work – only that signing is supported by the NAS. I've triple-checked that changing the signing_required setting in my config has an effect on the write speed (~20 MB/s when turned on, ~60 MB/s when off).

Update 4 – Samba Wars: A New Hope

It feels somewhat embarrassing, but the main problem here – again – seems to be the measurement.

Turns out rsync --progress -a costs about 30 MB/s of write speed. Writing with dd directly to the SMB share and using time cp /local/test.file /NAS/test.file are faster at about 85–90 MB/s, and apparently the fastest way to copy is the macOS Finder at around 100 MB/s (which is also the method hardest to measure, since there is no timing or speed indicator – who needs that, right? o_O). We measured it by copying first a 1 GB and then a 10 GB file, using a stopwatch.
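For anyone who wants to reproduce the comparison, the measurement harness boils down to this (a sketch: /tmp/nas-mount is a placeholder so the script can be sanity-checked locally – on the Macs the destination was the mounted share, e.g. /Volumes/NAS; Finder itself can only be timed with a stopwatch):

```shell
#!/bin/bash
SRC=/tmp/test.file
DST=/tmp/nas-mount        # placeholder; on a Mac this was the SMB mount point
mkdir -p "$DST"
# 64 MB source file here; we used 1 GB and 10 GB files for the real runs
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null

time dd if="$SRC" of="$DST/test.dd" bs=1M 2>/dev/null   # dd to the share
time cp "$SRC" "$DST/test.cp"                           # plain cp
# rsync, the slow one – guarded in case it isn't installed
if command -v rsync >/dev/null; then
  time rsync --progress -a "$SRC" "$DST/test.rsync"
fi
```

Dividing the file size by the wall-clock time gives the MB/s figures quoted above.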

What we've tried since the last update of this question:

  1. Copied from Mac client to Mac client. Both have SSDs (the MacPro writes at 250 MB/s to its own disk, the MacBook Pro at 300 MB/s). Result: a meagre 65 MB/s via dd writing from the MacBook Pro to the MacPro (rsync: 25 MB/s). Seeing the 25 MB/s was the moment we started questioning rsync. Still, 65 MB/s is extremely slow, so the SMB implementation on macOS seems… well, questionable.
  2. Tried different ack settings with dd and cp – no luck.
  3. Finally we found a way to list all the available nsmb.conf options: a simple man nsmb.conf. Caution: the online version is outdated!

So we tried a few more settings, among them:

notify_off=yes
validate_neg_off=yes
read_async_cnt=16
write_async_cnt=16
dir_cache_async_cnt=40
protocol_vers_map=4
streams=no
soft=yes

Note: smb_neg=smb3_only is – as I already expected – not a valid setting. protocol_vers_map=4 should be the valid equivalent.

Anyway, none of these settings made a difference for us.
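Pulled together, our test /etc/nsmb.conf looked roughly like this (a sketch; apart from signing_required, none of it changed the throughput):

```ini
# /etc/nsmb.conf – client-side SMB options on macOS (see man nsmb.conf)
[default]
signing_required=no      # the one setting that mattered: ~20 MB/s -> ~60 MB/s
notify_off=yes
validate_neg_off=yes
read_async_cnt=16
write_async_cnt=16
dir_cache_async_cnt=40
protocol_vers_map=4      # bitmap: 1=SMB1, 2=SMB2, 4=SMB3 – so 4 means SMB3 only
streams=no
soft=yes
```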

New questions at a glance

  1. Why is rsync such an expensive way to copy one (1!) file? There isn't much to synchronize/compare here, is there? The tcpdump doesn't indicate possible overhead either.

  2. Why are dd and cp slower than the macOS Finder when transferring to an SMB share? It seems that when copying with Finder there are significantly fewer acknowledgements in the TCP communication. (Again: the ack setting, e.g. delayed_ack=1, made no difference for us.)

  3. Why does Windows seem to ignore the MTU, sending significantly larger and therefore fewer TCP packets, resulting in the best performance compared to anything possible via macOS?

This is what the packets look like coming from macOS (constantly 1514 bytes):

"TCP","1514","[TCP segment of a reassembled PDU]"
"TCP","66","445  >  56932 [ACK] Seq=6603 Ack=35239 Win=4505 Len=0 TSval=520980697 TSecr=650208630"

And this coming from Windows (up to 26334 bytes, varying in size):

"SMB2","1466","Write Request Len:65536 Off:196608 File: test.file"
"TCP","26334","[TCP segment of a reassembled PDU]"
"TCP","7354","[TCP segment of a reassembled PDU]"
"TCP","54","445  >  49220 [ACK] Seq=6831 Ack=267030 Win=4074 Len=0"

You can download full .csv here (25.2 MB), the file names explain what has been copied (OS, transfer method and file size).

woerndl

Posted 2017-01-23T16:16:38.860

Reputation: 475

SMB doesn't get handed by VMs to host's OS, VMs perfectly emulate a real computer and are unaware of being virtualized. However, virtualization introduces some overhead and by necessity VMs pass all their network communication through host, which may be suboptimal too. – gronostaj – 2017-01-24T07:45:56.140

@gronostaj that's what I thought too. But I think the write speed results are too similar for a coincidence, both very close to 60 MB/s. The "real" Windows laptop on the other hand made 100 MB/s in various runs. But the VM performance isn't the core aspect of the problem anyway. My tests suggest that the current OS X SMB implementation introduced settings (probably with 10.11 and 10.12) that severely slow down SMB connections. But I don't have a clue where to look next, besides turning off signing. – woerndl – 2017-01-24T08:48:19.027

Using a Windows laptop I've now been able to achieve an average of 100 MB/s. Indicating the SMB implementation/updates coming with 10.11 and 10.12 may cause the poor performance. While possibly true, there are also a lot of other differences between this Windows laptop and your OS X installation(s) that are only getting 60 MB/sec. Network drivers, network settings, hardware and a lot more could also be contributing. It doesn't take much to knock performance from 100 MB/sec - which is just about the limit of gigabit ethernet - down to 60 MB/sec. – Andrew Henle – 2017-01-24T11:23:54.543

@AndrewHenle absolutely. I'll have to add that we've tried this with two different Macs (MacPro 5,1 and MacBookPro10,1) and two identical NAS. Producing the same results. Connected directly, even without other network components like switches between. Making it less likely that e.g. the network hardware of either Mac or the drivers are responsible. But I'm very open to any suggestions to narrow the problem even further. – woerndl – 2017-01-24T13:04:19.683

@awenro Can you capture at least the packet sizes and timings for the transfers from the faster Windows laptop and the slower OS X machines? A difference there would at least give you some data to start with. Just a hunch, but what's the OS X setting for Nagle's algorithm/delayed TCP ack compared to the Windows laptop? This might be relevant: http://www.shabangs.net/osx/speed-up-macos-x-file-transferring-over-network/

– Andrew Henle – 2017-01-24T13:22:38.093

@AndrewHenle Can you capture at least the packet sizes and timings for the transfers from the faster Windows laptop and the slower OS X machines? I'd be glad to, what way would you recommend to track this for comparable results on both platforms?

What's the OS X setting for Nagle's algorithm/delayed TCP ack the setting was on 3, as stated in the article. I've tried to change it to all modes, disconnected, reconnected but had no luck. It had an effect though, cause in mode 2 the auto discover of the NAS in the network section of Finder didn't work. I had to manually add it. – woerndl – 2017-01-24T14:21:30.713

@awenro Can you run something like Wireshark on your NAS? Also, this might be interesting: http://www.stuartcheshire.org/papers/NagleDelayedAck/ I wonder what a test of rsync --sockopts=TCP_NODELAY ... shows, with various combinations of TCP delayed ack settings.

– Andrew Henle – 2017-01-24T14:34:06.360

@AndrewHenle I've tried to deliver all the data you pointed out I should gather. Hope that it contains everything needed. Thank you very much for pointing this method out – I've never used Wireshark and tcpdump so far. The results look quite promising. – woerndl – 2017-01-24T19:18:53.607

Try to add signing_required=no and smb_neg=smb3_only to both /etc/nsmb.conf and ~/Library/Preferences/nsmb.conf. And what is the output of smbutil statshares -a? – harrymc – 2017-01-25T18:57:31.803

@harrymc, thanks. I've added the output to the question. ~/Library/Preferences/nsmb.conf doesn't exist on my system. But I'm sure settings in /etc/nsmb.conf get applied to the connection. What I'm not sure about is whether smb_neg=smb3_only is even a valid setting, since I couldn't find documentation of possible SMB settings for macOS – I tried it anyway, since smb_neg=smb2_only seems to exist. – woerndl – 2017-01-25T19:08:55.110

This seems like a bug with 10.11.5 - this note says it's ok in 10.11.4. I have also seen claims that the problem is still in 10.11.6. Have you tried cifs?

– harrymc – 2017-01-25T19:55:57.587

@harrymc What's described there as a bug probably relates to Apple turning on signing_required, don't you think? This might well be a bug, but it hasn't (just) to do with Apple turning on signing for the SMB traffic by default. What I'd be especially interested in is whether I've read the packet sizes in the tcpdump correctly. Because if macOS really has that much network overhead, this might be the reason. I wanted to try connecting via cifs:// to the share, but couldn't connect – I'll try again. – woerndl – 2017-01-26T08:03:23.507

Just a basic test, that probably you have done. Did you try to upload files from the two lazy clients via SMB between themselves or to a Windows computer and see what their maximum speed is? Just to exclude that there is something in the interaction between the NAS software and the SMB of the 2 clients. – Hastur – 2017-01-26T12:16:35.907

@Hastur thanks, we tried that this morning and the result is sobering, just 65 MB/s from SSD to SSD via SMB. The implementation of SMB in macOS must still be terrible. But I've updated the question with a lot new insights on top. – woerndl – 2017-01-26T13:40:53.687

Why Samba? Unix/Linux should use NFS – that should perform better. https://www.cyberciti.biz/faq/apple-mac-osx-nfs-mount-command-tutorial/

– Michael D. – 2017-01-26T13:48:23.443

One thing I noticed on my NAS is that it performs better with more free memory (see free -m) – Michael D. – 2017-01-26T13:51:31.623

@MichaelD. thanks, SMB is the new default on macOS replacing their own legacy protocol AFP. NFS is certainly possible, but not the official recommendation – afaik. About the RAM: Thanks for the command, didn't know free -m. But our RAM (1 GB, currently) doesn't max out by far while copying. On the NAS the only thing that takes up "100%" are the Volumes – if we use a transfer method that reaches the ~100 MB/s. – woerndl – 2017-01-26T13:54:50.880

Have you tried to rsync the data the other way – pulling data from the MacBook to the NAS, from the NAS's shell? (Just to see how fast that goes.) Have you tried connecting a Linux or Windows machine to the NAS and doing the same Samba copies/measurements there? – Michael D. – 2017-01-26T14:01:18.973

So you have a good argument to cut that the problem is in the interaction between NAS and macOS SMB implementation. It seems to be in SMB implementation. Since you are so deep in your testing I should propose to check (& report) the rsync max speed without SMB between the 2 clients. Moreover you can take a bootable usb with Linux and see if it works better with SMB... BTW can you directly mount that directory cifs under macOS (mount -t cifs //IP_REMOTE/C -o vers=2.0,username=XXX,password=YYY /local/path)? In the latter case you may need to use explicitly -o vers=2.0 or whatever... – Hastur – 2017-01-26T14:04:15.573

Note that to mount with the password in the commandline is dangerous (for the security) and you should do only for test (after you can set the password files...) on a safe machine. – Hastur – 2017-01-26T14:06:29.883

Hmm, I don't understand the size of the tcp packets in the windows dumps. "106678","23.031994","192.168.0.185","192.168.0.10","TCP","26334","[TCP segment of a reassembled PDU]" in the 1gb dump for example. This seems a lot bigger than most jumbo packets (about 9000 bytes). Can you check if there's any IP fragmentation going on (look at the details of one of these packets in wireshark). – Att Righ – 2017-01-29T20:07:14.437

are you using a remote rsync daemon? – rleir – 2017-02-04T11:31:55.763

what arguments do you pass to dd? – rleir – 2017-02-04T11:32:27.373

You noted a difference in MTU. That is configured in the network settings. See if you can increase it. Google "mac mtu settings". – rleir – 2017-02-04T11:39:23.357

Do you have example tcpdumps as .pcap files? It would be way more useful than those .csv's. Use tcpdump's -s 66 (snapshot length of 66 bytes) option to just capture the Ethernet, IP, and TCP headers of the packets so that it's not a huge file and so your file payloads aren't included (in case the contents of your test files are confidential). If you're generating random tests files, set the snaplen a little longer to capture SMB headers too. – Spiff – 2017-03-22T00:22:19.967

Did you do the tcpdump on the Windows machine that was running the test? If so, TCP Segmentation Offloading (TSO) could explain the large MTUs. tcpdump was probably showing you the large writes that Windows was sending down to the Ethernet NIC, but the Ethernet NIC was doing TSO to break that down into ~1500 byte (probably 1448 byte) chunks on the wire. – Spiff – 2017-03-22T00:24:17.370

rsync has a lot of options to minimize network bandwidth and ensure copy correctness when rsync is running on both devices. But that's not the case here; to rsync, these both look like local directories. It can't tell that one is remote via SMB. So rsync may be reading back the entire destination copy of the file to verify its checksum. It may be doing that as it goes, and even on full-duplex Ethernet, all those read requests have to share bandwidth with all the write operations, and all the read responses have to share bandwidth with all the write confirmations. – Spiff – 2017-03-22T00:31:24.147

Similar problem here: Very slow SMB on MacOS with 10G network. I'm afraid that Apple's SMB is just slow and we have to live with that. Other protocols are faster (FTP, NFS or rsync in client/server mode), but of course are not always practical for the task at hand.

– mivk – 2018-08-04T13:43:05.317

Answers

1

  1. A similar question with interesting answers; maybe you can check this thread, especially comment 5: https://bugzilla.samba.org/show_bug.cgi?id=8512#c5

Quoting Peter van Hooft (although I'm not sure which Linux distribution and rsync version he tested with), two takeaways: 1. he gives us the idea of trying the --sparse flag to see whether it increases performance; 2. he tested over NFS and hit the same speed issue, so the protocol (SMB2/3, AFP, etc.) may not be the reason.

We use rsync to copy data from one file server to another using NFS3 mounts over a 10Gb link. We found that upping the buffer sizes (as a quick test) increases performance. When using --sparse this increases performance with a factor of fifty, from 2MBps to 100MBps.

  2. Here is another interesting investigation of rsync performance: https://lwn.net/Articles/400489/

LWN.net concludes that the performance issue may be related to the kernel – even though the article was posted in 2010, and we can't change the kernel on the NAS or on macOS. However, the article suggests the kernel-level cost may come from checksum calculation (my guess).

One thing is clear: I should upgrade the kernel on my Mythtv system. In general, the 2.6.34 and 2.6.35-rc3 kernels give better performance than the old 2.6.27 kernel. But, tinkering or not, rsync can still not beat a simple cp that copies at over 100MiB/s. Indeed, rsync really needs a lot of CPU power for simple local copies. At the highest frequency, cp only needed 0.34+20.95 seconds CPU time, compared with rsync's 70+55 seconds.

The comments also contain this:

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred

Update 1: my mistake, this description is for --checksum; see here: [Improved the description of the --checksum option.] PS: I don't have enough reputation to post more than 2 links.

But I can't find the same description in the rsync man page, so I'm guessing they are referring to the passage below:

Rsync finds files that need to be transferred using a "quick check" algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated.

Therefore, compared to cp/tar/cat, using rsync to copy a bunch of files (small or big) could cause a performance hit. But since I'm not able to read the rsync source code, I can't confirm this is the ultimate reason.

My suggestion is to keep checking:

  1. What rsync version is awenro using for testing? Could you update to the latest version?
  2. Let's see the output when using --stats and -v with --debug=FLAGS.

flags

--stats This tells rsync to print a verbose set of statistics on the file transfer, allowing you to tell how effective the rsync algorithm is for your data.

--debug=FLAGS This option lets you have fine-grained control over the debug output you want to see. An individual flag name may be followed by a level number, with 0 meaning to silence that output, 1 being the default output level, and higher numbers increasing the output of that flag (for those that support higher levels). Use --debug=help to see all the available flag names, what they output, and what flag names are added for each increase in the verbose level.

Lastly, I would recommend reading this supplemental post for more background:
"How to transfer large amounts of data via network" moo.nac.uci.edu/~hjm/HOWTO_move_data.html

Ou Steven

Posted 2017-01-23T16:16:38.860

Reputation: 11

Can you include the relevant information here? – bertieb – 2017-03-15T16:07:15.760

This may theoretically answer the question, but it would be preferable to include the essential parts of the answer here, and provide the link for reference.

– Stephen Rauch – 2017-03-16T16:23:29.090

0

Rsync over ssh encrypts the connection; SMB does not, if I remember correctly. If it's just one file, one system might be able to read that file quickly and the other not. Also note that to get an MTU above 1514 you need to enable "giants"/jumbo frames; the fact that packets need to be further cut down may mean there is overhead to "repack" them. The second thing to note is that giants/jumbo frames need to be enabled on both ends AND on everything in between.

1514 bytes is the normal Ethernet frame size. 6k–9k frames are called giants or jumbo frames, depending on the OS/application.

I average 80 MB/s between my NAS (a PC with VMs; one of the VMs is the NAS) and my workstation (a PC) with SFTP (using sshfs) [giants not enabled], and the device in between is a MikroTik 2011 (going through the switch chip only).

Remember that the MTU is negotiated between two endpoints; along a path there may be several links with different MTUs, and the effective MTU will be the lowest available.

Edit: SMB is not very efficient for file transfers.

Pere Noel

Posted 2017-01-23T16:16:38.860

Reputation: 11