
For this application I am less concerned with high availability than I am with total throughput. I have one IP address on the server end, and I want to be able to send more than 1 Gbit/s of traffic out from the server. The server has two 1-gigabit cards and is connected to a pair of switches. The application involves thousands of remote clients around the world connecting to the server (i.e. not a local network).

Currently, bonding is set up using mode 5 (balance-tlb), but the result is that the throughput for each port won't go above 500Mbit/s. How can I get past this limit? Please assume that I have no access to the switches, so I cannot implement 802.3ad.
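(For reference, the current setup is roughly the following sketch; the config file path, interface names, and addresses are placeholders rather than the exact values from our server:)

# /etc/modprobe.d/bonding.conf (path is a placeholder): select balance-tlb (mode 5)
options bonding mode=balance-tlb miimon=100

# bring up the bond with the single public IP and enslave both gigabit NICs
modprobe bonding
ifconfig bond0 192.0.2.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1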

(I was hoping to add the "bonding" tag, but I cannot add new tags, so "teaming" it is.)

Antoine Benkemoun

4 Answers


It is unlikely you will achieve 2 gigabits without cooperation at the switch level, and even then it might be hard with only a single source/destination IP combination. Most teams are set up for IP hashing, which allocates a single NIC path to each source/destination pair. As such, you'll only get 1 gigabit. There are round-robin schemes, but they often produce out-of-order packet arrival, which makes them undesirable unless both the host and the destination support that scheme.
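(To illustrate the hashing point: for the bonding modes that do hash (balance-xor, 802.3ad), the slave is chosen from packet headers, and you can at least inspect and, with caveats, adjust that on the Linux side. A rough sketch, assuming the bond is named bond0:)

# show the current mode, transmit hash policy, and slave status
cat /proc/net/bonding/bond0

# hashing on layer 3+4 (IPs and ports) spreads flows across slaves more finely than the
# default layer-2 (MAC) hash; on older kernels this may require taking the bond down first
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy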

Kevin Kuphal
  • I should have mentioned this: there's no requirement that any one TCP connection exceed 1Gbit/s, just that the aggregate go over 1Gbit/s. Interesting note about the IP hashing. –  Jun 16 '09 at 20:28

You will need port aggregation at the switch (the two access-switch ports that are wired to the two gigabit ports on your machine need to be aggregated). But once that is achieved, you should get close to a 2 Gbps path (limited by the machine's capabilities).

With port aggregation on the switch matching the logical 2 Gbps port of the bonding driver, you would have a multiplexed, redundant path with just one IP address on the machine.
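(As a sketch of the host side of that setup; the matching switch-side port-channel/LACP configuration is vendor-specific and not shown, and the interface names and address below are assumptions:)

# load the bonding driver in 802.3ad (LACP) mode so it negotiates with the aggregated switch ports
modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast

# give the logical 2 Gbps port the machine's single IP and enslave both NICs
ifconfig bond0 192.0.2.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1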

There are some interesting notes I came across while looking this up, here:

There is a dark side to this wonderful feature of the Linux bonding driver: it only works with network interfaces that allow the MAC address to be changed when the interface is open. The balance-alb mode depends on swift ARP trickery to fool the kernel into thinking the two physical interfaces are one by rewriting the MAC address on the fly. So the driver for the interface must support this, and many of them don't.

But that's not all the bonding driver can do. The mode option gives you seven choices, and you don't have to worry about interface compatibility. However you do need to consider what your switches support. The balance-rr, balance-xor and broadcast modes need switch ports grouped together. This goes by all sorts of different names, so look for "trunk grouping", "etherchannel", "port aggregation", or some such. 802.3ad requires 802.3ad support in the switch.
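(One rough way to probe the MAC-rewriting caveat above on your own hardware, assuming eth0/eth1 are the physical NICs, is something like the following; if the driver refuses the change while the link is up, balance-alb is unlikely to work with it:)

# identify which driver each NIC uses
ethtool -i eth0
ethtool -i eth1

# try changing the MAC while the interface is up (use a throwaway locally-administered
# address and restore the original afterwards); a refusal here is a bad sign for balance-alb
ip link set dev eth0 address 02:00:00:00:00:01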

nik
  • Yeah, I'm afraid we will have to go the 802.3ad route. I'm not opposed to it, but it's one of those things that hasn't been deployed here before, so the time it takes to get it all tested and out the door is increased substantially, whereas if it's just on the host level whatever driver can be enabled and then disabled if it doesn't end up working. –  Jun 16 '09 at 20:29

First, you probably know that you're never actually going to hit 2 Gb/s. TCP/IP overhead will limit you to probably 90% of the theoretical maximum.

Second, even if you use a TCP offload engine, the stack above layer 3 definitely affects where the bottleneck is. In other words, how are you transmitting the data? I could have 10 Gb/s NICs and a crossover cable between them and still not get above a few hundred Mb/s if I were using rsync over an SSH tunnel.
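(One way to separate raw network capacity from application overhead, assuming iperf can be installed on both ends; the host name and stream count are arbitrary:)

# on a remote test machine
iperf -s

# on the server: several parallel TCP streams to approximate many concurrent clients
iperf -c receiver.example.com -P 8 -t 30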

What else can you tell us about the topology? You said that the server is connected to a couple of switches, and that the remote clients are all over the world. Do you have > 500Mb/s (aggregate) of WAN connections?

Matt Simmons
  • There are a couple handfuls of gigabit WAN connections. Some might be 10GE, not sure. The server easily does 1Gb/s with ~5% idle CPU, I expect it should be able to exceed that. –  Jun 16 '09 at 20:26
  • What changed between the server getting 1Gb/s and only getting 500Mb/s? – Matt Simmons Jun 16 '09 at 21:08
  • The server in its original configuration pushes 1Gb/s out of a single port. With the bonding driver set up (mode 5) it pushes 500Mb/s max out of each port. The hope was that it would potentially max out both nics (or at least go above 500). –  Jun 16 '09 at 21:13
  • Hrm...that's interesting. Have you tried other bonding modes? Mode=0 for example? – Matt Simmons Jun 16 '09 at 21:30
  • Or mode 6, for that matter? – Matt Simmons Jun 16 '09 at 21:31
  • Haven't, although I'm wishing we had. Maybe now that the fire is out, we can try something in more of a lab setting. I'm going to "answer" my question but it's not going to truly be the answer to the question. –  Jun 18 '09 at 16:17

We didn't truly resolve this issue. What we did was set up two servers, one bound to an IP on each interface, and then follow the directions here to force traffic to go out the same port it came in on:

http://kindlund.wordpress.com/2007/11/19/configuring-multiple-default-routes-in-linux/

Slightly modified for our situation. In this example, the gateway is 192.168.0.1 and the server's IPs are 192.168.0.211 and 192.168.0.212 on eth0 and eth1 respectively:

printf "1\tuplink0\n" >> /etc/iproute2/rt_tables
printf "2\tuplink1\n" >> /etc/iproute2/rt_tables

ip route add 192.168.0.211/32 dev eth0 src 192.168.0.211 table uplink0
ip route add default via 192.168.0.1 dev eth0 table uplink0
ip rule add from 192.168.0.211/32 table uplink0
ip rule add to 192.168.0.211/32 table uplink0

ip route add 192.168.0.212/32 dev eth1 src 192.168.0.212 table uplink1
ip route add default via 192.168.0.1 dev eth1 table uplink1
ip rule add from 192.168.0.212/32 table uplink1
ip rule add to 192.168.0.212/32 table uplink1
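
To sanity-check the result (not part of the original writeup, just the obvious verification), the extra rules and per-table default routes can be inspected with:

# the two custom tables and their from/to rules should show up here
ip rule show
ip route show table uplink0
ip route show table uplink1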