7

I'm working with a pair of co-located CentOS Linux servers sitting behind a Sonicwall PRO 2040 Enhanced firewall running in transparent bridge mode.

These servers are having a strange problem downloading files more than a few megabytes in size. For example, if I try to wget or FTP a copy of the Linux kernel from kernel.org, the first ~1-2MB will download at 600+K/s, and then throughput will drop off a cliff to 1K/s.

I've reviewed all the firewall configuration settings for anything suspicious, but found nothing. More interestingly, I performed the same download with a Windows server sitting behind the same firewall, and it sailed right through at 600+K/s the whole way.

Has anyone seen this? Where should I start looking to troubleshoot this problem?

Joshua Penix

6 Answers

4

We too are experiencing the same problem. Anything larger than what can be transferred in the initial download burst (~3.7 MB for us) trickles off to ~1-4 KB/s, regardless of the bandwidth available.

It seems to be a problem specific to and common with the SonicWall PRO 2040 Firewall - https://discussions.apple.com/message/12250946?messageID=12250946

The root of the problem is the firewall. The best long-term fix is to find a firewall setting that allows TCP Window Scaling and passes the initiating machine's TCP window scale factor through correctly during connection setup.

Though this article refers to routers, the same logic applies to what's happening with the SonicWall Pro 2040 Firewall, http://lwn.net/Articles/92727/:

The details are still being figured out, but it would appear that some routers on the net are rewriting the window scale TCP option on SYN packets as they pass through. In particular, they seem to be setting the scale factor to zero, but leaving the option in place. The receiving side sees the option, and responds with a window scale factor of its own. At this point, the initiating system believes that its scale factor has been accepted, and scales its windows accordingly. The other end, however, believes that the scale factor is zero. The result is a misunderstanding over the real size of the receive window, with the system behind the firewall believing it to be much smaller than it really is. If the expected scale factor (and thus the discrepancy) is large, the result is, at best, very slow communication. In many cases, the small window can cause no packets to be transmitted at all, breaking TCP between the two affected systems entirely.
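To make the mismatch described in that quote concrete, here is a small sketch of the arithmetic. The function name perceived_window, the 256 KiB window, and the shift of 7 are illustrative assumptions, not values captured from our network. The sender divides its real window by 2^shift to fit the 16-bit window field, and the peer multiplies it back by whatever shift it believes was negotiated:

```shell
#!/bin/sh
# perceived_window ACTUAL_BYTES SENDER_SHIFT PEER_SHIFT
# Hypothetical helper: shows what the peer reconstructs when the two
# ends disagree about the sender's window scale shift.
perceived_window() {
    field=$(( $1 >> $2 ))      # value the sender puts in the 16-bit window field
    echo $(( field << $3 ))    # window the peer believes was advertised
}

perceived_window 262144 7 7   # both ends agree on shift 7   -> 262144
perceived_window 262144 7 0   # firewall rewrote shift to 0  -> 2048
```

With the scale factor zeroed in transit, the remote end thinks it may have only about 2 KB in flight at a time, which is in line with the kind of collapse in throughput described above.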

Similar to what was mentioned above, there are workarounds for individual machines (http://prowiki.isc.upenn.edu/wiki/TCP_tuning_for_broken_firewalls). With the RFC 1323 TCP extensions turned off, the firewall is never given the opportunity to pass along a TCP window scale factor of 0. Instead, the connection is set up without the extension, presumably using the largest window TCP allows without RFC 1323, which is 64 KB.
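As a rough sanity check on why an unscaled 64 KB window is still usable, throughput on a single connection is bounded by window size divided by round-trip time. The sketch below assumes a 50 ms round trip, which is a made-up figure rather than a measurement, and max_throughput is a hypothetical helper name:

```shell
#!/bin/sh
# max_throughput WINDOW_BYTES RTT_MS
# Upper bound on single-connection throughput in bytes per second:
# at most one full window can be in flight per round trip.
max_throughput() {
    echo $(( $1 * 1000 / $2 ))
}

max_throughput 65535 50   # unscaled 64 KB window, assumed 50 ms RTT -> 1310700
```

That is roughly 1.3 MB/s under the assumed RTT, so turning scaling off costs some headroom on long, fast paths but is far better than a window that has been misread as a few kilobytes.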

Commands we've used on our various machines as a temporary workaround:

Ubuntu 10.10:
Change takes effect immediately:

sudo sysctl -w net.ipv4.tcp_window_scaling=0

Permanent change, after next reboot:

sudo sh -c 'echo "net.ipv4.tcp_window_scaling=0" >> /etc/sysctl.conf'
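One caveat with the append above: running it more than once leaves duplicate lines in the file. A minimal sketch of an idempotent alternative follows. persist_sysctl is a hypothetical helper, not part of any distribution. It drops any existing assignment of the key and writes a single new one, and it needs to run as root to touch /etc/sysctl.conf:

```shell
#!/bin/sh
# persist_sysctl KEY VALUE [FILE]
# Hypothetical helper: make sure FILE contains exactly one KEY=VALUE line.
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line whose name (text before '=', spaces stripped)
        # is not KEY; commented-out copies of KEY are kept as well.
        awk -F= -v k="$key" \
            '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
            "$file" > "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"     # overwrite in place so ownership and mode are kept
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   persist_sysctl net.ipv4.tcp_window_scaling 0
```

On Linux, sysctl -p should then reload /etc/sysctl.conf without waiting for a reboot.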


Mac OS X:
Change takes effect immediately:

sudo sysctl -w net.inet.tcp.rfc1323=0 

Permanent change, after next reboot:

sudo sh -c 'echo "net.inet.tcp.rfc1323=0" >> /etc/sysctl.conf'


Win7:
See available options:

netsh interface tcp show global

Disable Command (Persistent):

netsh interface tcp set global autotuning=disabled


As for why the Windows server was not having any problems, I found this article: http://msdn.microsoft.com/en-us/library/ms819736.aspx

TCP window scaling is negotiated on demand in Windows Server 2003, based on the value set for the SO_RCVBUF Windows Sockets option when a connection is initiated. Additionally, the Window Scale option is used by default on a connection if the received SYN segment for that connection as initiated by a TCP peer contains the Window Scale option. Windows Server 2003 TCP does not initiate connections with window scaling by default. To instruct the Windows Server 2003 TCP stack to attempt to negotiate a larger receive window size by making use of the Window Scale option, set the Tcp1323Opts registry value to 1.

miu
Freddy
3

Those firewalls will bog down if you have Intrusion Prevention and/or Antivirus turned on, especially if you have TCP Stream selected as one of the types to scan. The firewall will try to build the whole file in its memory to scan it.
Temporarily disable those features and see if your performance climbs back up. If so, look at adding your servers to the exception list so you don't have to drop your pants for the whole network.

Scott Lundberg
  • Why would performance be different between OSes? – Warner Feb 27 '10 at 02:27
  • @Warner: Not sure what you mean by your question? The firewall issues don't have anything to do with the OS. It's the firewall itself that has a lack of horsepower to keep up in my experience. – Scott Lundberg Feb 27 '10 at 06:07
  • doh! missed that part about windows. To answer your question: I don't know, doesn't make sense to me. I do know that when we used 2040s, we had to disable some of the scanning engines or we would have similar problems. – Scott Lundberg Feb 28 '10 at 01:02
  • Thanks for the suggestion. Just to be sure, I disabled the IPS, Anti-Virus and Anti-Spyware features of the Sonicwall entirely, and the problem still occurred. – Joshua Penix Mar 03 '10 at 06:03
1

Do you see the problem when downloading to the Linux server from within the network? If not, then it must be something to do with the combination of Linux and the firewall. On the firewall, can you watch CPU usage or look for warnings? What about resetting the firewall?

Maybe after the first MB or so an adjustment is made by Linux automatically to the TCP options (or maybe Layer 2), and the firewall doesn't like this? Looking at the various network options in /proc might give you an idea. Also, a packet dump on Linux might show some change in what is going on when the slowdown happens.
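For the packet dump, one thing that could be worth checking is the TCP options on the SYN and SYN-ACK packets, in particular the window scale value that tcpdump prints as "wscale N". The sketch below is only a starting point. wscale_of is a hypothetical helper, and eth0 and kernel.org are placeholders for your own interface and test host:

```shell
#!/bin/sh
# wscale_of: read tcpdump text output on stdin and print the window
# scale value from every line that carries a "wscale N" option.
wscale_of() {
    sed -n 's/.*wscale \([0-9][0-9]*\).*/\1/p'
}

# Example (needs root; interface and host are placeholders):
#   tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0 and host kernel.org' | wscale_of
```

If you can capture on both sides of the firewall, comparing the value the Linux box sends with the value that arrives on the far side would show whether anything in between is rewriting it.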

Kyle Brandt
  • I think this is the direction I need to investigate further. I don't see any issues with traffic to/from the Linux server inside the network, only when traffic has to go outside. I've not seen anything at all strange in the firewall's logs or utilization graphs, but I'm going to start investigating the TCP stack tuning. Any suggestion on what I would want to be looking for in a packet dump? – Joshua Penix Mar 03 '10 at 06:11
1

Though I haven't found the root cause of this, I did find a quick workaround that lets me get file transfers through:

sysctl -w net.ipv4.tcp_window_scaling=0

The kernel default for TCP window scaling is on, but that command lets me temporarily disable it. I haven't persisted the setting permanently via sysctl.conf because I'm not sure about its overall performance effects, but it works in a pinch and then I can flip it back to 1 when I'm done.
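Since I keep flipping the setting by hand, here is a minimal sketch of a wrapper that turns scaling off, runs one command, and puts the previous value back. with_scaling_off and the SYSCTL variable are names I made up for illustration. It needs root, and it does not restore the setting if the shell is interrupted partway through:

```shell
#!/bin/sh
# SYSCTL can be overridden to point at a different sysctl binary.
SYSCTL=${SYSCTL:-sysctl}

# with_scaling_off CMD [ARGS...]
# Hypothetical helper: disable TCP window scaling for the duration of
# one command, then restore whatever value was set before.
with_scaling_off() {
    key=net.ipv4.tcp_window_scaling
    old=$("$SYSCTL" -n "$key") || return 1        # remember the current value
    "$SYSCTL" -w "$key=0" >/dev/null || return 1  # turn scaling off
    "$@"                                          # run the transfer
    status=$?
    "$SYSCTL" -w "$key=$old" >/dev/null           # put the old value back
    return $status
}

# Example (as root; the URL is a placeholder):
#   with_scaling_off wget http://example.com/some-large-file
```

Keep in mind that the setting is system-wide, so every connection opened while the command runs is affected, not just the download.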

Joshua Penix
1

Try changing the TCP window settings on the SonicWall.

  1. Log in to the SonicWALL admin page
  2. Change ending of the URL from main.html to diag.html
  3. Click Internal Settings, go down to Security Services Settings
  4. Tick "Enable enforcement of a limit on maximum allowed advertised TCP window with any DPI-based service"
  5. Remember to scroll back up the page and press APPLY!
0

There's a lot of initial diagnostics left to perform here.

Errors in /var/log/messages?

Errors in dmesg?

Packet loss evidenced in /sbin/ifconfig?

Issues with link negotiation?

Are there any differences, physical or not, between the Windows box and Linux box?

Edit 1

Can you reproduce the performance using different protocols and sites?

Warner
  • No errors in logs or dmesg, and ifconfig shows all counters as clean. ethtool shows a proper full duplex gigabit connection. Both the Linux machines as well as the Windows box are roughly the same-generation HP hardware, though I don't have specifics right now. – Joshua Penix Feb 26 '10 at 21:48
  • I've reproduced the problem via both FTP and HTTP and from quite an assortment of sites and large files. – Joshua Penix Mar 03 '10 at 06:04