11

We are running an OpenVPN tunnel over a BGAN satellite link where round-trip ping times are about 3 seconds. We use it in a tun configuration, and we're running on Linux (CentOS). The link will carry mainly email, but as soon as a mail contains large attachments, the VPN seems to stall.

The "I can ping through the tunnel, but any real work causes it to lock up. Is this an MTU problem?" question in the OpenVPN FAQ seems to describe my problem exactly, but using mssfix and fragment still does not seem to do much to improve the situation.

My main test is to copy a 2MB file over the VPN with scp. It copies about 192 kbytes and then reports a "- stalled -" state. If I wait a couple of seconds it starts copying again, and then stalls again after another couple of kbytes.
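
For reference, the test boils down to something like this (hostname and path are placeholders):

    dd if=/dev/urandom of=testfile bs=1k count=2048   # create a 2MB test file
    scp testfile user@remoteserver:/tmp/              # copy it over the VPN and watch the progress meter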

This stalling occurs whether or not I've set the fragment or mssfix options in my OpenVPN configuration (although setting fragment 1000 did seem to reduce the stalling, but not eliminate it). The OpenVPN mtu-test reported 1542 as the MTU size.
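
For completeness, the relevant part of the config I've been experimenting with looks roughly like this (the mssfix value shown is just one of the values I've tried):

    dev tun
    proto udp
    fragment 1000   # internally fragment anything larger than 1000 bytes
    mssfix 1000     # clamp the TCP MSS so that packets fit in 1000 bytes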

I've searched the internet for more advice on how and when to use mssfix and fragment, but I only find pages saying the same as the FAQ, and not giving details as to how and when to use which parameters.

My questions then are:

  • When do I use mssfix and fragment?
  • Do I use mssfix and fragment in combination?
  • If mssfix and fragment are the solution, what are the tun-mtu, link-mtu and mtu-disc parameters for?

Furthermore, I've been using the tool iperf to measure the bandwidth. Without the VPN, it consistently measures on the order of 210Kbits/sec.

When using iperf over the VPN ($ iperf -c remoteserver -t60 -i5), it starts at 10Kbits/sec, steadily climbs until it reports 1.2Mbits/sec, and then it seems to stall, reporting 0Kbits/sec for a number of iterations. (I think the 1.2Mbits/sec figure may be due to some OpenVPN buffering or similar.)
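
In case it matters, iperf's UDP mode can take TCP behaviour out of the picture; something like this (flags as for iperf 2):

    iperf -s -u                                    # on the remote server
    iperf -c remoteserver -u -b 200k -t 60 -i 5    # on the local end, sending at roughly the link rate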

Is iperf the best way to measure the bandwidth?

Any help with this situation will be greatly appreciated.

iWerner
  • Is the openvpn using TCP or UDP at the moment? – pjc50 Nov 25 '09 at 14:26
  • It is currently using UDP – iWerner Nov 25 '09 at 14:36
  • Thank you for all the answers, but I've had to temporarily stop because the BGAN unit ran out of airtime. I'm hoping to continue later today. I should mention that we'd prefer to stay with UDP, as using TCP would double the data sent over the network (and hence the cost, to which our client is already very sensitive) – iWerner Nov 26 '09 at 07:16

4 Answers

5

1542 as an MTU? Never heard of that for a WAN link. MTU is the maximum IP packet size a link will carry: 1500 bytes for a traditional Ethernet LAN (the largest ping payload that fits is the MTU minus the IP header (20 bytes) and the ICMP header (8 bytes), i.e. 1472). Furthermore, most VPNs introduce overhead for their packet encapsulation, so a typical VPN MTU is 1400.

In modern networks it is difficult to know what the MTU will be at any given moment, as the ingress and egress paths may differ and may change due to automatic re-routing. For a network like this, it may be more effective to set a low MTU, such as 576, on the hosts on either side of the VPN link.
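
For example (the interface name is just an example):

    ip link set dev eth0 mtu 576    # or, with the older tools: ifconfig eth0 mtu 576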

MSS (maximum segment size) is the MTU minus the IP and TCP headers (40 bytes). It is negotiated by the network stack and usually does not suffer the same discovery problems as the MTU, unless the MTU itself is wrong. (Path MTU discovery is usually broken by blocked ICMP or black-hole routers.)
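For example, with an MTU of 1400 the MSS would be 1400 - 20 (IP) - 20 (TCP) = 1360 bytes.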

The first thing I would do is a network packet capture on the sending end, sorting the display by frame size (you may need to add this column in Wireshark). Verify that you aren't sending any frames larger than you would expect. It's not unusual for modern network cards to send oversize frames when options such as Large Send Offload or Jumbo Frames are enabled; I've seen 30,000+ byte frames with these options turned on.
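
A rough sketch of the capture (interface name and filename are examples):

    tcpdump -i eth0 -s 0 -w capture.pcap   # capture full frames while reproducing the stall
    # then open capture.pcap in Wireshark and sort on the frame length column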

Greg Askew
  • +1 for packet capture before changing anything. Even if you don't find any huge frames, you might see 'normal' packets fragmented somewhere. – Javier Nov 25 '09 at 17:58
  • By default OpenVPN sets the MTU of the tun device to 1500 (which is the same as the MTU on the Ethernet devices on our machines). I'm still not sure whether fragmentation of the VPN packets is a good thing or a bad thing. The answers in this thread seem to imply that it is bad, while the other references I found on the web imply that it is good. – iWerner Nov 26 '09 at 07:26
  • @iWerner: have you tried to determine the MTU size with ping? If ICMP is not disabled somewhere you can use the following on Windows: ping -f -l 1372 (keep reducing the number until it succeeds); on Linux: ping -s 1372 -M do. FYI, the OpenVPN FAQ recommends using mssfix 1200, but that does not address the root cause. Using VPN solutions to fragment always has the potential for a performance hit. If you have a large VPN setup, you would not be able to use fragmentation on the central concentrator end, only at the remote office end. – Greg Askew Nov 26 '09 at 18:51
2

Just out of curiosity, have you tried lowering the MTU of the network interface? Perhaps the satellite link handles fragmentation badly. As a counter-intuitive note, you might want to try OpenVPN over TCP for a change. I know it should decrease performance, but if you have no control over fragmentation along the path it might help.
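
If you go the TCP route, both ends have to change together; a minimal sketch, assuming a standard client/server setup:

    # in the server config:
    proto tcp-server
    # in the client config:
    proto tcp-client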

lorenzog
  • I was going to suggest the opposite :) - this high latency case is the one where the TCP-in-TCP problems show up and UDP will avoid that. – pjc50 Nov 25 '09 at 14:26
  • I was assuming he was using the default UDP port for openvpn, and thus suggested the opposite.. yes, normally I'd agree with you. But hey, we all know that sysadmin is trial-and-error, and usually 'try-doing-the-opposite-see-if-it-works' :) – lorenzog Nov 25 '09 at 14:36
  • Thanks. We're using UDP at the moment, and trying TCP never occurred to me. (If I had more rep I would've upvoted you) – iWerner Nov 25 '09 at 14:37
  • @iWerner: thanks :) also, try reducing MTU progressively on the iface, and see when it stops (if it does). – lorenzog Nov 25 '09 at 15:28
2

When you use TCP, increase the TCP window size; this will help keep more packets "in the air" on a high-latency link.
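
On Linux this roughly means raising the socket buffer limits; a sketch, with illustrative values (not tuned for BGAN specifically):

    # allow larger socket buffers so TCP can keep more data in flight
    sysctl -w net.core.rmem_max=262144
    sysctl -w net.core.wmem_max=262144
    # min / default / max buffer sizes for TCP receive and send
    sysctl -w net.ipv4.tcp_rmem="4096 87380 262144"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 262144"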

It's been a while since I've had to play with this stuff, but here is one link google found for me.

After I re-read your question I see you're running BGAN - I'd have a good look at this (or just google for: "BGAN spoofing").

As for bandwidth measurement, I've found iperf to be pretty decent so long as you're using reasonable packet sizes.

Eddy
  • This is interesting - it mentions that the TCP accelerator is available for Red Hat, while the Inmarsat people we spoke to said that it was only available for Windows and OS X (and this is indeed what the Inmarsat/BGAN website says) – iWerner Nov 26 '09 at 07:09
  • They may not have it now; I see the document date is '07. If you push hard enough and talk to enough people, you might find someone with an old copy of it. Generally when you phone in you get tier-one support. I'll try some of my contacts but no guarantees. – Eddy Nov 26 '09 at 14:30
  • I got the runaround from our satellite provider; hard to find someone who knows what they're talking about. I'll keep trying; in the meantime here is something to try: http://sourceforge.net/projects/pepsal/ From the project description it is doing pretty much what the Inmarsat software does: PEPsal is an integrated, multi-layer, transparent TCP Performance Enhancing Proxy which splits the connection into two parts, making use of Linux TCP enhancements when sending data, and largely improving performance in links with different characteristics – Eddy Nov 26 '09 at 18:43
2

I think you might be barking up the wrong tree. Any time I've had wrong-MTU issues, traffic stopped well before 192KB. I think it's more related to some "in-flight packets" window: either the TCP window, or maybe some buffers in the satellite uplink itself.

Definitely do some long packet captures (both 'inside' and 'outside' of the VPN) and see if you're getting all the ACKs.
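
A sketch of capturing both sides at once (interface names are assumptions; 1194 is OpenVPN's default UDP port):

    tcpdump -i tun0 -s 0 -w inside.pcap 'tcp port 22' &      # the scp session inside the tunnel
    tcpdump -i eth0 -s 0 -w outside.pcap 'udp port 1194' &   # the encrypted tunnel traffic outside
    # reproduce the stall, stop both captures, then compare the traces for missing ACKs and retransmissions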

Javier