
Facts (please identify any false statements):

  1. I have a 100 Mbps connection between two sites that are 80 ms apart (round-trip time)

  2. This is a long fat network connection that could benefit from a large TCP window, perhaps up to 100 Mbps * 0.08 s = 8,000,000 bits = 1,000,000 bytes

  3. Both machines are running Windows Server 2012. "Receive window auto tuning level" is normal on both. "Window scaling heuristics" are disabled on both.

  4. I ran "iperf -s" on one side and "iperf -c" on the other. The transfer happened at 5 Mbps. I get the same result going the other direction.

  5. Both sides advertised support for TCP window scaling (the Window Scale option) in their SYNs.

  6. The receiver advertised a TCP receive window of 64,512 bytes (0xFC00) for the entire run, with a window scale shift count of 0 (no scaling).

  7. The network was able to handle a larger window size (see sequence diagrams below)

  8. The receiver kept the window smaller than the network supports

  9. This connection runs inside an IPsec VPN. The MTU of the tunnel interface is reduced to 1400 bytes in both directions.
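The arithmetic in facts 2 and 6 can be checked with a short Python sketch. It computes the bandwidth-delay product (BDP) for 100 Mbps and 80 ms, and the smallest window scale shift count that lets the 16-bit TCP window field (at most 65,535) cover it. It assumes the 80 ms figure is the round-trip time; the function names are illustrative only.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bits that fit in one round trip, converted to bytes."""
    return round(bandwidth_bps * rtt_s / 8)


def min_window_scale(window_bytes: int) -> int:
    """Smallest shift count (0..14) such that the scaled 16-bit window field covers window_bytes."""
    max_field = 0xFFFF  # largest value of the 16-bit TCP window field
    for shift in range(15):  # RFC 7323 caps the shift count at 14
        if max_field << shift >= window_bytes:
            return shift
    raise ValueError("window exceeds what TCP window scaling can express")


if __name__ == "__main__":
    bdp = bdp_bytes(100e6, 0.080)   # 100 Mbps, 80 ms RTT
    print(bdp)                      # 1000000
    print(min_window_scale(bdp))    # 4
```

Both SYNs in the captures below carry a shift count of 0, and the shift count is fixed at the handshake for the life of the connection, so in that run the advertised window can never exceed 65,535 bytes, well short of the roughly 1,000,000 bytes computed here.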

Question

  • Why is the receiver keeping the window small?

Non-Answers

  • The network is broken

    Linux machines running on the same network open the TCP window to 1.5 megabytes and transmit data at 6 times the throughput

  • Window scaling heuristics are enabled

    Window scaling heuristics are disabled (see output of "netsh interface tcp show heuristics" below)

  • Receive Window Auto-Tuning Level is not normal

    Receive Window Auto-Tuning Level is normal (see output of "netsh interface tcp show global" below)

  • This just doesn't work well on a virtual machine within ESXi

    I get 6 times better performance on a virtual Linux machine running on the same host.


Update 1 (June 12, 2015, 4:30 pm PDT)

I modified the test by putting Linux on one side of the connection. Sure enough, when Linux sends data to Windows Server 2012, Windows offers a too-small TCP receive window (64,512 bytes).

When I send data from Windows to Linux, Linux offers a large-enough TCP receive window (1,365,120 bytes). However, Windows restricts sends to a maximum of ~60,000 bytes in flight.


Update 2 (June 13, 2015, 3:00 pm PDT)

A step closer to the root cause. In my setup, iperf sets neither SO_SNDBUF nor SO_RCVBUF. These are the send and receive socket buffers, which effectively bound the receive window and the amount of data the sender keeps in flight. When these values are not specified, Windows Server 2012 provides a default of 64 kB. So the question is now:
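A minimal Python sketch of the check described in Update 2: read the default SO_RCVBUF/SO_SNDBUF on a fresh TCP socket, then request explicit sizes the way an application would (iperf's -w option serves a similar purpose). The OS may round, double (Linux reports twice the requested value) or cap the requested size, so the printed numbers vary by platform; the function names are illustrative, and this does not reproduce the Windows Server 2012 behaviour itself.

```python
from __future__ import annotations

import socket


def buffer_sizes(sock: socket.socket) -> tuple[int, int]:
    """Return the (SO_RCVBUF, SO_SNDBUF) sizes the OS reports for this socket."""
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


def default_and_requested(request_bytes: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Report buffer sizes on a fresh TCP socket, before and after an explicit request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        default = buffer_sizes(sock)
        # Explicit request. These typically must be set before connect()/listen()
        # so the window scale offered in the SYN can reflect the larger buffer.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, request_bytes)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, request_bytes)
        return default, buffer_sizes(sock)


if __name__ == "__main__":
    # 1,000,000 bytes is the bandwidth-delay product from fact 2.
    default, requested = default_and_requested(1_000_000)
    print("default   (rcv, snd):", default)
    print("requested (rcv, snd):", requested)
```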

Question

  • When one is not specified, why isn't Windows Server 2012 dynamically increasing SO_SNDBUF/SO_RCVBUF to accommodate long fat pipes as described at MSDN?

Non-answers

  • "netsh winsock show autotuning" is disabled

    It is enabled.


Update 3 (August 24, 2015, 4:00 pm PDT)

netsh has apparently been superseded by Set-NetTCPSetting and related cmdlets. Combining Get-NetTCPSetting with Get-NetTCPConnection shows that my connection uses the 'Internet' setting, which gives me these values:

SettingName                   : Internet
MinRto(ms)                    : 300
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : CTCP
CwndRestart                   : False
DelayedAckTimeout(ms)         : 50
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

Sender TCP Settings

PS C:\Users\acs> netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
NetDMA State                        : disabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : none
ECN Capability                      : enabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 3000
Receive Segment Coalescing State    : enabled

PS C:\Users\acs> netsh interface tcp show heuristics
TCP Window Scaling heuristics Parameters
----------------------------------------------
Window Scaling heuristics         : disabled
Qualifying Destination Threshold  : 3
Profile type unknown              : normal
Profile type public               : normal
Profile type private              : normal
Profile type domain               : normal

PS C:\Users\acs> Get-NetTCPSetting

SettingName                   : Automatic
MinRto(ms)                    : 
InitialCongestionWindow(MSS)  : 
CongestionProvider            : 
CwndRestart                   : 
DelayedAckTimeout(ms)         : 
MemoryPressureProtection      : 
AutoTuningLevelLocal          : 
AutoTuningLevelGroupPolicy    : 
AutoTuningLevelEffective      : 
EcnCapability                 : 
Timestamps                    : 
InitialRto(ms)                : 
ScalingHeuristics             : 
DynamicPortRangeStartPort     : 
DynamicPortRangeNumberOfPorts : 

SettingName                   : Custom
MinRto(ms)                    : 20
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : DCTCP
CwndRestart                   : True
DelayedAckTimeout(ms)         : 10
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Compat
MinRto(ms)                    : 300
InitialCongestionWindow(MSS)  : 2
CongestionProvider            : Default
CwndRestart                   : False
DelayedAckTimeout(ms)         : 200
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Datacenter
MinRto(ms)                    : 20
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : DCTCP
CwndRestart                   : True
DelayedAckTimeout(ms)         : 10
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Internet
MinRto(ms)                    : 300
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : CTCP
CwndRestart                   : False
DelayedAckTimeout(ms)         : 50
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

Sender SYN

No.     Time           Source                Destination           Protocol Length Delta      Sequence number Acknowledgment number Bytes in flight Calculated window size Info
    814 5.036577000    10.10.0.21            10.11.0.1             TCP      66     0.000000000 0               0                                     64512                  49758→5001 [SYN, ECN, CWR] Seq=0 Win=64512 Len=0 MSS=1460 WS=1 SACK_PERM=1

Frame 814: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0
Ethernet II, Src: 00:11:22:33:44:55, Dst: aa:bb:cc:dd:ee:ff
Internet Protocol Version 4, Src: 10.10.0.21 (10.10.0.21), Dst: 10.11.0.1 (10.11.0.1)
Transmission Control Protocol, Src Port: 49758 (49758), Dst Port: 5001 (5001), Seq: 0, Len: 0
    Source Port: 49758 (49758)
    Destination Port: 5001 (5001)
    [Stream index: 73]
    [TCP Segment Len: 0]
    Sequence number: 0    (relative sequence number)
    Acknowledgment number: 0
    Header Length: 32 bytes
    .... 0000 1100 0010 = Flags: 0x0c2 (SYN, ECN, CWR)
    Window size value: 64512
    [Calculated window size: 64512]
    Checksum: 0x1451 [validation disabled]
    Urgent pointer: 0
    Options: (12 bytes), Maximum segment size, No-Operation (NOP), Window scale, No-Operation (NOP), No-Operation (NOP), SACK permitted
        Maximum segment size: 1460 bytes
        No-Operation (NOP)
        Window scale: 0 (multiply by 1)
            Kind: Window Scale (3)
            Length: 3
            Shift count: 0
            [Multiplier: 1]
        No-Operation (NOP)
        No-Operation (NOP)
        TCP SACK Permitted Option: True
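To make the SYN options above concrete, here is a small Python sketch that walks the 12 option bytes Wireshark decoded (MSS, NOP, Window scale, NOP, NOP, SACK permitted). The byte string is reconstructed from the decoded fields, not copied from the capture, and the parser only interprets the option kinds that appear here.

```python
def parse_tcp_options(data: bytes) -> dict:
    """Decode the option kinds present in this SYN: EOL (0), NOP (1), MSS (2),
    Window Scale (3) and SACK Permitted (4). Other kinds are skipped."""
    opts = {}
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # NOP: one byte of padding, no length field
            i += 1
            continue
        length = data[i + 1]     # total option length, including kind and length bytes
        if length < 2:
            raise ValueError(f"malformed option kind {kind}")
        body = data[i + 2:i + length]
        if kind == 2:
            opts["mss"] = int.from_bytes(body, "big")
        elif kind == 3:
            opts["window_scale"] = body[0]   # shift count
        elif kind == 4:
            opts["sack_permitted"] = True
        i += length
    return opts


def scaled_window(window_field: int, shift: int) -> int:
    """Receive window in bytes for segments after the handshake
    (the window field in a SYN itself is never scaled)."""
    return window_field << shift


# Reconstructed from Wireshark's decode of frame 814, not copied from the capture:
# MSS=1460, NOP, WS shift 0, NOP, NOP, SACK permitted (12 bytes)
SYN_OPTIONS = bytes.fromhex("020405b4 01 030300 01 01 0402")

if __name__ == "__main__":
    opts = parse_tcp_options(SYN_OPTIONS)
    print(opts)                                        # {'mss': 1460, 'window_scale': 0, 'sack_permitted': True}
    print(scaled_window(64512, opts["window_scale"]))  # 64512
```

With a shift count of 0, the same 0xFC00 field always means 64,512 bytes; a shift of 4 would turn it into 1,032,192 bytes.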

[Images: sender perspective of sequence graph]

Receiver TCP Settings

PS C:\Users\acs> netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
NetDMA State                        : disabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : none
ECN Capability                      : enabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 3000
Receive Segment Coalescing State    : enabled

PS C:\Users\acs> netsh interface tcp show heuristics
TCP Window Scaling heuristics Parameters
----------------------------------------------
Window Scaling heuristics         : disabled
Qualifying Destination Threshold  : 3
Profile type unknown              : normal
Profile type public               : normal
Profile type private              : normal
Profile type domain               : normal

PS C:\Users\acs> Get-NetTCPSetting

SettingName                   : Automatic
MinRto(ms)                    : 
InitialCongestionWindow(MSS)  : 
CongestionProvider            : 
CwndRestart                   : 
DelayedAckTimeout(ms)         : 
MemoryPressureProtection      : 
AutoTuningLevelLocal          : 
AutoTuningLevelGroupPolicy    : 
AutoTuningLevelEffective      : 
EcnCapability                 : 
Timestamps                    : 
InitialRto(ms)                : 
ScalingHeuristics             : 
DynamicPortRangeStartPort     : 
DynamicPortRangeNumberOfPorts : 

SettingName                   : Custom
MinRto(ms)                    : 20
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : DCTCP
CwndRestart                   : True
DelayedAckTimeout(ms)         : 10
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Compat
MinRto(ms)                    : 300
InitialCongestionWindow(MSS)  : 2
CongestionProvider            : Default
CwndRestart                   : False
DelayedAckTimeout(ms)         : 200
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Datacenter
MinRto(ms)                    : 20
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : DCTCP
CwndRestart                   : True
DelayedAckTimeout(ms)         : 10
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

SettingName                   : Internet
MinRto(ms)                    : 300
InitialCongestionWindow(MSS)  : 4
CongestionProvider            : CTCP
CwndRestart                   : False
DelayedAckTimeout(ms)         : 50
MemoryPressureProtection      : Enabled
AutoTuningLevelLocal          : Normal
AutoTuningLevelGroupPolicy    : NotConfigured
AutoTuningLevelEffective      : Local
EcnCapability                 : Enabled
Timestamps                    : Disabled
InitialRto(ms)                : 3000
ScalingHeuristics             : Disabled
DynamicPortRangeStartPort     : 49152
DynamicPortRangeNumberOfPorts : 16384

Receiver SYN

No.     Time           Source                Destination           Protocol Length Delta      Sequence number Acknowledgment number Bytes in flight Calculated window size Info
    817 5.110501000    10.11.0.1             10.10.0.21            TCP      70     0.073924000 0               1                                     64512                  5001→49758 [SYN, ACK, ECN] Seq=0 Ack=1 Win=64512 Len=0 MSS=1460 WS=1 SACK_PERM=1 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

Frame 817: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0
Ethernet II, Src: aa:bb:cc:dd:ee:ff, Dst: 00:11:22:33:44:55
Internet Protocol Version 4, Src: 10.11.0.1 (10.11.0.1), Dst: 10.10.0.21 (10.10.0.21)
Transmission Control Protocol, Src Port: 5001 (5001), Dst Port: 49758 (49758), Seq: 0, Ack: 1, Len: 0
    Source Port: 5001 (5001)
    Destination Port: 49758 (49758)
    [Stream index: 73]
    [TCP Segment Len: 0]
    Sequence number: 0    (relative sequence number)
    Acknowledgment number: 1    (relative ack number)
    Header Length: 32 bytes
    .... 0000 0101 0010 = Flags: 0x052 (SYN, ACK, ECN)
    Window size value: 64512
    [Calculated window size: 64512]
    Checksum: 0xb5bb [validation disabled]
    Urgent pointer: 0
    Options: (12 bytes), Maximum segment size, No-Operation (NOP), Window scale, No-Operation (NOP), No-Operation (NOP), SACK permitted
        Maximum segment size: 1460 bytes
        No-Operation (NOP)
        Window scale: 0 (multiply by 1)
            Kind: Window Scale (3)
            Length: 3
            Shift count: 0
            [Multiplier: 1]
        No-Operation (NOP)
        No-Operation (NOP)
        TCP SACK Permitted Option: True
    [SEQ/ACK analysis]

[Images: receiver perspective of sequence graph]

[Image: TCP window]

NetVicious
Chris Stankevitz
  • Can you please add the exact configuration - soft AND hardware relevant (network card) for both sides? – TomTom Jun 12 '15 at 07:23
  • Sounds like window tuning is [restricted](https://www.duckware.com/blog/how-windows-is-killing-internet-download-speeds/index.html). – David Schwartz Jun 12 '15 at 17:37
  • @TomTom Both machines are VMs inside ESXi running on HP Proliant DL380 G5. Virtual ethernet adapters are Intel 82574L. Hardware ethernet adapters are BCM5719. – Chris Stankevitz Jun 12 '15 at 17:40
  • @David Schwartz "receive window auto tuning level" is normal on both and "window scaling heuristics" are disabled (see updated config in OP). I believe this indicates that tuning is not [restricted](https://www.duckware.com/blog/how-windows-is-killing-internet-download-speeds/index.html). – Chris Stankevitz Jun 12 '15 at 17:54
  • On the Windows machine, what is the value of HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\interface-name, parameter TcpWindowSize? – GeorgeB Jun 19 '15 at 17:54
  • I don't think this question would be opinion-based, I think the real problem with it that making a good answer would require a debugging of the systems / networks of the OP, which can be done only by him, and not by us. – peterh Jun 26 '15 at 22:12
  • @GeorgeB: That parameter is not listed. But even if it were [listed as a number], I can't imagine it would be of much use. I'm expecting the OS to dynamically increase the TCP window size. Furthermore, my problem is most likely rooted in the send/receive buffer sizes. The TCP window size is derived from the send/receive buffer sizes. See update 2 in the OP. – Chris Stankevitz Aug 24 '15 at 22:44
  • @peterh: whatever problem I have must be universal. The two machines I am using are fresh out-of-the-box Windows 2012 servers. Of course you'll need a high BDP WAN to test with. – Chris Stankevitz Aug 24 '15 at 22:52
  • Just to make sure: I understand you checked what's on the line by running Wireshark on both machines, and the packet contents are the same on both ends? Asking since IPsec VPNs sometimes make a mess of IP packets. – StanTastic Mar 02 '16 at 13:49
  • Is ipv6 enabled (or disabled with the right sequence of registry edits)? – Jim B Jul 10 '16 at 03:46

4 Answers

Answer 1

I've seen this as a driver-specific issue; in my case, with QLogic network controllers that were attempting to use TCP Chimney. This link describes the TCP Chimney functionality in Windows Server 2008, but I'm pretty sure it still applies: https://support.microsoft.com/en-us/kb/951037

I would recommend testing the following, in order; after each test, reboot and see whether the receiver starts increasing the TCP RWIN as expected.

1) Load the latest versions of the drivers for the network adapter on the receiving computer.
2) Disable TCP Chimney on the receiving computer.
3) Disable all 'TCP Receive' offloading. This is found in the Advanced settings of the Network Adapter Properties (the same area where Speed & Duplex is set).
4) Disable all 'TCP Send' offloading (also in the Network Adapter's Advanced properties).

(And contrary to the comment "And big TCP window sizes over 65k are bad for servers, as then the memory demand for connections increases. 65k alone might also not make you happy enough. – user303507 Aug 6 '15 at 11:30", large TCP receive windows are NOT inherently bad for the server. On high-bandwidth, high-latency links (like satellite relays), large RWIN values are necessary so that more TCP data can be "in the pipe". Imagine a 600 Mbps connection with 3000 ms latency: the high-bandwidth link would be limited to about 20 KBps, as only 65 KB of unacknowledged TCP data could be "in the pipe" at a time.)
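The arithmetic in the parenthetical above is the window-limited throughput ceiling, window / RTT. A short Python sketch, treating the quoted latencies as round-trip times (an assumption), applies it to the satellite example and to the 64,512-byte window from the question:

```python
def window_limited_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput in bytes/second when at most one window
    of unacknowledged data can be in flight per round trip."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    # Satellite example: 65,535-byte window, 3000 ms RTT (about 21.8 KB/s)
    print(round(window_limited_throughput(65_535, 3.0)))                 # 21845
    # Question's case: 64,512-byte window, 80 ms RTT, expressed in Mbps
    print(round(window_limited_throughput(64_512, 0.080) * 8 / 1e6, 2))  # 6.45
```

This is a ceiling, not a prediction: the ~5 Mbps iperf result in the question sits just below the roughly 6.45 Mbps that a 64,512-byte window allows at 80 ms.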

Answer 2

Looks like a Windows autotuning bug to me, perhaps something to do with this? https://support.microsoft.com/en-us/kb/932170

Have you tried requesting a larger SO_RCVBUF value manually using WskControlSocket?

  • Technically those buffers don't have a relationship to TCP Window size: http://stackoverflow.com/questions/14381303/increasing-tcp-window-size – Mary Jul 22 '15 at 22:01
  • Phil: I'm running Windows Server 2012 on both sides so that link doesn't apply, but I do suspect a bug of some sort. I can request a larger SO_RCVBUF - and that helps - but that doesn't help me understand what is broken (see "Update 2"). – Chris Stankevitz Aug 11 '15 at 17:26
  • Mary: the buffers are indirectly related to windows size. The network stack will recognize the small buffers and consequently not increase the window size. I describe this using handwaving in "Update 2". – Chris Stankevitz Aug 11 '15 at 17:28

Answer 3

Use a WAN optimizer such as Cisco WAAS or Riverbed. These appliances acknowledge data locally and quickly, so you do not need to care about the server settings. In a bigger network you often have no influence over the server setup anyway, because the servers belong to other teams or are outsourced.

user303507
  • And big TCP window sizes over 65k are bad for servers, as then the memory demand for connections increases. 65k alone might also not make you happy enough. – user303507 Aug 06 '15 at 11:30
  • user303507: I want to understand what is happening with the Windows Server 2012 networking stack. I'm not interested in masking the problem with a network appliance. But I agree that buying a network appliance or moving my offices closer together will work around this problem. – Chris Stankevitz Aug 11 '15 at 17:31
  • user303507's comment might be on the right track - I wonder if the memory concern causes windows to limit the window size based on some invisible heuristic or registry setting. Not that that is appropriate behavior, assuming you are correct about the documentation. – Dan Pritts Sep 09 '15 at 19:24

Answer 4

Here is some information I found that may be the answer you are looking for. Note that the 64 KB limit in disabled mode may hint at similar, undocumented limits in normal mode.

Try setting the Auto-Tuning level to "experimental", which allows very large receive windows.

The possible Windows Auto-Tuning levels are as follows:

  • normal: default value, allows the receive window to grow to accommodate most conditions
  • disabled: uses a fixed value for the TCP receive window, limiting it to 64 KB (65,535 bytes).
  • highlyrestricted: allows the receive window to grow beyond its default value, very conservatively
  • restricted: somewhat restricted growth of the tcp receive window beyond its default value
  • experimental: allows the receive window to grow to accommodate extreme scenarios (not recommended, it can degrade performance in common scenarios, only intended for research purposes. It enables RWIN values of over 16 MB)
BE77Y
Aland