On Debian Buster (kernel 5.4.51) I have two interfaces, tap0 and tap1, enslaved to a bond interface (bond0) in mode balance-xor to increase throughput. However, there is some traffic that must always be sent through tap0; for the rest I don't care which slave is used.

In theory, the bonding driver can do exactly that using tc filters and multiq, as described in the driver documentation. The tc statistics suggest that the queues are being used, but inspecting the traffic on the two interfaces shows that the queue selection is not respected.

Here's what I did:

I assigned each of the tap interfaces to a queue on the bond, set the queueing discipline on bond0 to multiq, and then used a tc filter to override the bond's queue selection so that traffic to 192.168.1.100 (as an example) always takes tap0.

# echo "tap0:1" > /sys/class/net/bond0/bonding/queue_id
# echo "tap1:2" > /sys/class/net/bond0/bonding/queue_id

# tc qdisc add dev bond0 handle 1 root multiq

# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
    192.168.1.100 action skbedit queue_mapping 1

In the tc stats, you can see that the different queues are actually used:

# tc -s class show dev bond0
class multiq 1:1 parent 1: 
 Sent 377256252 bytes 2526104 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
class multiq 1:2 parent 1: 
 Sent 21031 bytes 2982 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
class multiq 1:3 parent 1: 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0

Most traffic takes the generic queue (1:1), while the special traffic takes the first of the two interface-specific queues (1:2). If I delete the tc filter again, the packet counter on queue 1:2 stops increasing.

(Note that the queue numbers between the bonding driver and tc are offset by 1, so the queue 1:1 means "let the driver decide", queue 1:2 means "always go through tap0", queue 1:3 means "always go through tap1")
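
Another way to confirm that the filter matches (just a sanity check; the exact statistics output depends on the iproute2 version) is to look at the filter and action counters directly:

# tc -s filter show dev bond0 parent 1: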

The queues are also mapped to the interfaces:

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (xor)
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: tap0
MII Status: up
Speed: 10 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: xx:xx:xx:xx:xx:89
Slave queue ID: 1

Slave Interface: tap1
MII Status: up
Speed: 10 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: xx:xx:xx:xx:xx:d6
Slave queue ID: 2

If I tcpdump the two taps individually on the receiving end, however, I can clearly see that no matter which queue is used, the special traffic still ends up on either interface according to the balance-xor rule instead of being pinned to tap0. So, what am I missing?
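
For reference, this is roughly how I check it on the receiving end (the tcpdump host filter is only an example; adjust it to whatever traffic the filter marks):

# tcpdump -ni tap0 host 192.168.1.100
# tcpdump -ni tap1 host 192.168.1.100

With the tc filter in place I would expect that traffic only on tap0, but it keeps showing up on both taps.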

1 Answer

Okay, digging deeper, there is this remark in the documentation:

This feature first appeared in bonding driver version 3.7.0 and support for output slave selection was limited to round-robin and active-backup modes.

Debian Buster ships bonding driver 3.7.1, and apparently output slave selection is still limited to those two modes, so what I'm trying to do is currently impossible. If you set the mode to active-backup, the queue assignments are respected immediately, but that of course defeats the purpose of load balancing. I had hoped that using any other mode would at least produce some kind of warning, but it doesn't: the driver just silently ignores the queue mapping.
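
For completeness, switching the mode looks roughly like this (a sketch only; the mode can only be changed while the bond is down, and if your bond is managed by ifupdown or systemd-networkd you should change it there instead):

# ip link set bond0 down
# echo active-backup > /sys/class/net/bond0/bonding/mode
# ip link set bond0 up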

The only thing you might need to do is allow packet input on the inactive interface, since otherwise the packet that you redirect with tc will be discarded:

# echo 1 > /sys/class/net/bond0/bonding/all_slaves_active

I assume that, if one really wanted to, one could now use some tc magic to essentially re-implement the balance-xor logic and override the target queue for every packet. That way the mode could stay active-backup while still getting load balancing. Or one could just implement this feature in the bonding driver.
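
Just to illustrate the idea (an untested sketch: it hashes only on the lowest bit of the destination IP, which is far cruder than the driver's layer3+4 policy), one could add two catch-all filters at a lower priority than the special-traffic filter, steering half of the destinations to each slave queue:

# tc filter add dev bond0 protocol ip parent 1: prio 10 u32 \
    match u32 0x00000001 0x00000001 at 16 action skbedit queue_mapping 1
# tc filter add dev bond0 protocol ip parent 1: prio 10 u32 \
    match u32 0x00000000 0x00000001 at 16 action skbedit queue_mapping 2

Offset 16 in the IP header is the destination address, so these two filters split destinations by the last bit of the address between queue 1 (tap0) and queue 2 (tap1). Since the two filters together match every IP packet, nothing falls through to "let the driver decide", which is exactly what you want while the mode is active-backup.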
