
Linux kernels before 3.6 used route caching for IPv4 multipath routing, which made routing between two separate lines/ISPs quite easy. From 3.6 the algorithm became per-packet, so some route table/rule/iptables marking tricks were required to use two lines/ISPs.

However, if you had two lines with the same ISP who could route a single IP down both lines on a per-packet basis in a balanced/failover fashion, then from 3.6 you could easily achieve line bonding (at the IP level) because of the per-packet routing in both directions.

From 4.4, the kernel changed again to flow-based load balancing based on a hash over the source and destination addresses.

I am currently running kernel 4.4.36 and using multipath routing over PPPoE connections. My downstream traffic from the ISP is routed across the two separate lines on a per-packet basis (one IP routed down both lines). This gives me a download speed faster than a single line, nearly the speed of both lines added together. It works really well: Skype video, VoIP (UDP), YouTube etc. all work great.

Having had such a good downstream experience, I want to try the same upstream. However, my upstream traffic is routed across both ppp devices (which have the same IP address) according to the newer flow-based algorithm, so a single flow always leaves on the same line. This means that I cannot achieve an upload speed that is faster than the speed of a single line.
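
To illustrate the difference, here is a toy sketch. It is not the kernel's actual hash function; the function names and the use of cksum are assumptions made purely for illustration. With flow-based selection the link depends only on the address pair, so every packet of one upload takes the same line; with per-packet selection consecutive packets alternate between lines.

```shell
#!/bin/sh
# Illustration only: this is NOT the hash the kernel uses.

# Flow-based: the link index is a function of (source, destination),
# so all packets of one flow are pinned to the same link.
pick_link_flow() {
    # $1 = source address, $2 = destination address, $3 = number of links
    sum=$(printf '%s %s' "$1" "$2" | cksum | cut -d' ' -f1)
    echo $(( sum % $3 ))
}

# Per-packet: the link index depends only on the packet sequence number,
# so consecutive packets are spread round-robin across the links.
pick_link_packet() {
    # $1 = packet sequence number, $2 = number of links
    echo $(( $1 % $2 ))
}
```

Calling pick_link_flow repeatedly with the same two addresses always gives the same link, whereas pick_link_packet 0 2, pick_link_packet 1 2, ... alternates between the two.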

Is there a way to configure the current Kernel to use the per-packet algorithm? Or some other method to achieve per-packet multipath routing? Would I need to revert to an older Kernel (which I don't want to do for various other reasons)?

My ISP does not support multi-link ppp.

In case it is relevant, I am currently running Arch Linux ARMv7 on a Raspberry Pi 3.

bao7uo
  • This is a very bad idea. Per-packet balancing at L2 (i.e. MLPPP) includes enough logic to reassemble packets in-order. Running this over IP leaves open a tremendous opportunity (if not a near-certainty) for out-of-order delivery. This is going to cause a huge number of problems with slow TCP sessions, altogether broken UDP, issues with any kind of real-time streaming, etc. The other issue is that even if you're sending packets round-robin to your ISP there's absolutely no suggestion that they will be similarly balancing toward you. – rnxrx Dec 12 '16 at 21:02
  • @rnxrx thanks for your comment - have edited the question to provide extra details. From my question: "downstream traffic from the ISP is routed across the two separate lines on a per-packet basis". The ISP provide a control panel - when I choose one IP to be routed over both lines then they route it perfectly balanced round-robin on a per-packet basis. Works well, at about 90% of the total of the speeds of both lines added together, and provides instant failover. Skype video, VOIP calls, YouTube, BBC streaming etc. All great - Such a great downstream experience makes me want to try upstream – bao7uo Dec 12 '16 at 22:06
  • Ahh - got it... So at present do you have two unique IPs (one per connection) or are they routing to a single IP (or a subnet) on your side via two parallel paths? If you're running any kind of NAT it's obviously crucial that it occur before this balancing. Anyhow - have you had a look at http://support.aa.net.uk/Router_-_Linux_upload_bonding_using_policy_routing ? It's using iptables extensions to accomplish basically what you're describing and as such should be fairly consistent across reasonably modern kernel versions. – rnxrx Dec 13 '16 at 04:30
  • Thanks @rnxrx - yes I can do either option (two unique IPs or a single IP via the parallel paths). I have preferred the single IP option as it seemed to make more sense. – bao7uo Dec 13 '16 at 11:53

1 Answer


Ok, so after having had more time to investigate this I found a way to do it using Linux TEQL (True Link Equalizer). Here is a link I loosely followed, but with some tweaks.

http://lartc.org/howto/lartc.loadshare.html

This is how I got it working on Arch Linux ARMv7 (Raspberry Pi 3)

On boot:

The following command should be run on boot to load the appropriate Kernel module.

modprobe sch_teql

The following commands should also be run on boot, assuming you want to NAT from a local network on eth0.

sysctl -w net.ipv4.ip_forward=1
iptables -A INPUT -i ppp+ -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i ppp+ -o eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A POSTROUTING -t nat -o teql+ -j MASQUERADE

The FORWARD return traffic is on ppp+, and the POSTROUTING MASQUERADE on teql+ because the outgoing traffic goes out on teql and the return traffic comes back on ppp.
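
One caveat raised in the comments: if your firewall sets the FORWARD policy to DROP (e.g. 'iptables -P FORWARD DROP'), LAN traffic also needs an explicit rule allowing it out towards teql. A minimal sketch, assuming the LAN is on eth0 as above; the helper name lan_forward_rules is made up for this example, and it only prints the rule so you can review it before applying it.

```shell
#!/bin/sh
# Print (do not apply) the extra FORWARD rule needed when the FORWARD
# chain policy is DROP. lan_forward_rules is a hypothetical helper name.
lan_forward_rules() {
    # $1 = LAN-facing interface (defaults to eth0, as in the rules above)
    lan_if="${1:-eth0}"
    # Outgoing LAN traffic leaves via the teql device, hence -o teql+
    echo "iptables -A FORWARD -i ${lan_if} -o teql+ -j ACCEPT"
}
```

To apply it, run the printed command as root, or pipe it: lan_forward_rules eth0 | sh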

When ppp links come up:

Assuming the links to be load-balanced are ppp links, the following commands should be run from a script in /etc/ppp/ip-up.d/.

sysctl -w net.ipv4.conf.ppp1.rp_filter=2
sysctl -w net.ipv4.conf.ppp2.rp_filter=2
tc qdisc add dev ppp1 root teql0
tc qdisc add dev ppp2 root teql0
ip address add 1.1.1.1/32 dev teql0
# you can add additional public IP addresses to teql0 if you need to
ip link set teql0 up
ip route replace default scope global dev teql0

Where 1.1.1.1 is your ISP-facing public IP address. Additional public IPs can be assigned to the teql0 device, but don't need to be assigned to the ppp devices. In my setup the two ppp links share the same IP (negotiated by PPPoE etc.). The teql link is manually assigned as shown above. The ISP needs to send traffic for the IP equally down both links.

The reverse path filter (rp_filter) is set to 2 (loose) on both ppp interfaces in the script above so that return packets are not dropped for arriving on the ppp interfaces rather than on teql0.
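
Since pppd calls ip-up once per link and passes the interface name as the first argument, the same steps can also be written per link. The following is only a sketch of that variant, not the exact script I run: the names teql_link_up, run, PUBLIC_IP and DRY_RUN are assumptions for this example, and it uses the "replace" forms of tc and ip so that a second invocation should not fail on objects that already exist.

```shell
#!/bin/sh
# Sketch of a per-link variant for /etc/ppp/ip-up.d/.
# PUBLIC_IP and DRY_RUN are hypothetical knobs for this example.
PUBLIC_IP="${PUBLIC_IP:-1.1.1.1}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teql_link_up() {
    # $1 = ppp interface that has just come up (e.g. ppp1)
    iface="$1"
    # Loose reverse-path filtering: replies arrive on ppp, not on teql0
    run sysctl -w "net.ipv4.conf.${iface}.rp_filter=2"
    # Attach this link to the teql0 equalizer
    run tc qdisc replace dev "$iface" root teql0
    # Assign the public address to teql0 and route everything through it
    run ip address replace "${PUBLIC_IP}/32" dev teql0
    run ip link set teql0 up
    run ip route replace default scope global dev teql0
}
```

In a real script you would end with a call such as teql_link_up "$1". Setting DRY_RUN=1 first lets you check the generated commands without touching the system.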

I have set it up that way, and it works perfectly. Very easy! When the links fail, there is seamless failover. When they come up, they just start working again. Seems like there is no packet loss or delay when it fails over, and none when it comes back up either.
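
If you want to check that upstream traffic really is being spread over both links, one rough way is to compare the transmit byte counters that the kernel exposes under /sys/class/net. This is a sketch only; the helper names tx_bytes and tx_share and the SYSFS_NET override are assumptions for this example. The counters are cumulative since each interface came up, so the figure is only meaningful if both links have been up for a similar time.

```shell
#!/bin/sh
# SYSFS_NET can be overridden (e.g. for testing); default is the real sysfs.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

tx_bytes() {
    # $1 = interface name; prints its cumulative transmitted byte count
    cat "${SYSFS_NET}/$1/statistics/tx_bytes"
}

tx_share() {
    # $1, $2 = the two links; prints the percentage of their combined
    # transmitted bytes that went out on $1 (integer, rounded down)
    a=$(tx_bytes "$1")
    b=$(tx_bytes "$2")
    total=$(( a + b ))
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( 100 * a / total ))
}
```

With TEQL balancing the upload, tx_share ppp1 ppp2 should come out somewhere near 50 after a large upload, assuming both links carried traffic for the whole transfer.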

Also, one of the commenters suggested the link below, which uses policy routing with iptables to mark every other packet. I will try it in a few days to see whether it works any better than the above, and provide feedback here accordingly.

http://support.aa.net.uk/Router_-_Linux_upload_bonding_using_policy_routing

bao7uo
  • I didn't ever try the policy routing because the TEQL worked so well. If it ain't broke.... – bao7uo Nov 11 '17 at 10:58
  • I'm trying to get this to work. I have the bonding working, I can use the bonded interface from the router. I can't get the NAT working though, traffic from my LAN is not going down the bonded link :( – andynormancx Feb 26 '18 at 18:27
  • if you post a new question on server fault, and link to it from a comment i will try to figure it out for you. include as much info as possible like interface/ip configuration, routing table, iptables rules, etc. unless you can fit all the info here in the comments? – bao7uo Feb 27 '18 at 19:51
  • PS. just noticed a mistake in my config. it said `sysctl -w net.ipv4.ip_forward` but should say `sysctl -w net.ipv4.ip_forward=1` so I have corrected above. That would certainly prevent the traffic from the LAN going down the bonded link. – bao7uo Feb 27 '18 at 19:54
  • I don't think that was what stopped it working for me, I had forwarding enabled in sysctl. I'm now trying to work out whether the massive number of out of order packets that I'm seeing is expected. – andynormancx Feb 28 '18 at 12:54
  • ah great, you fixed it? – bao7uo Mar 01 '18 at 13:40
  • I did, I assume it was because I have some default drop rules at the end of my firewall script. For example I have 'iptables -P FORWARD DROP', so I needed an explicit forward for the lan->teql. I am getting loads of TCP retransmissions and out-of-order though, I'm going to give the policy based routing a go instead to see if that works better. – andynormancx Mar 02 '18 at 15:57
  • please share the results, that would be interesting to hear how you get on :-) – bao7uo Mar 10 '18 at 20:32