
I have a few Linux test boxes on Scaleway, each with two NICs that are connected to the same network (10.0.0.0/8), but each NIC has its own gateway.

I want to be able to use both NICs (eth0/eth1) and their IPs for communication: if an application is bound to IP .187, then eth0 should be used; if it is bound to IP .189, then eth1 should be used.

Right now only interface eth0 with IP .187 is responding to requests — any requests (that's why I use ping and SSH for testing). However, if I change the default route from eth0 to eth1 (IP .189), then outgoing traffic is routed through eth1 correctly, but in that case eth0 becomes unusable.

How can I configure the box so that both interfaces are usable?

Given

Box 1:
eth0_ip = 10.5.68.187/31
eth0_gw = 10.5.68.186

eth1_ip = 10.5.68.189/31
eth1_gw = 10.5.68.188

Approach

Based on my research, I created a bash script that should add static routes with separate routing tables so that both NICs can be used.

#!/bin/bash
# IP and gateway for eth0 (gateway parsed from the last line of the
# route listing; works for this /31 setup but is order-dependent)
eth0_ip=$(ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)
eth0_gw=$(ip route list dev eth0 | awk '{print $1}' | tail -1 | cut -d'/' -f1)

# same for eth1
eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)

#ip route add 10.0.0.0/8 dev eth0 table 1 priority 100
#ip route add ${eth0_ip} dev eth0 table 1
ip route add default via ${eth0_gw} dev eth0 table 1
ip rule add from ${eth0_ip}/32 table 1

#ip route add 10.0.0.0/8 dev eth1 table 2 priority 110
#ip route add ${eth1_ip} dev eth1 table 2
ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip}/32 table 2

ip route flush cache

I tried several variations of the script, but none of them worked.
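As an aside, the gateway extraction in the script is fragile: it takes the first field of the last line of `ip route list dev eth0`, which depends on route ordering. A sketch of a more targeted parse, fed with sample output copied from the question's `ip route` listing (the variable names are mine):

```shell
#!/bin/sh
# For these /31 links the gateway is the base address of the
# kernel-created connected route, e.g. "10.1.229.186/31 ..." -> 10.1.229.186.
# In real use, replace the literal with: routes=$(ip route list dev eth0)
routes='default via 10.1.229.186 dev eth0
10.1.229.186/31 dev eth0 proto kernel scope link src 10.1.229.187'

# pick the "proto kernel" connected route and strip the prefix length
eth0_gw=$(printf '%s\n' "$routes" \
  | awk '/proto kernel/ {sub(/\/.*/, "", $1); print $1; exit}')
echo "$eth0_gw"   # 10.1.229.186
```

This keys on the `proto kernel` marker instead of line position, so an added or reordered route no longer changes the result.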

Output

[node]# ip route
default via 10.1.229.186 dev eth0 
10.1.229.186/31 dev eth0 proto kernel scope link src 10.1.229.187 
10.1.229.188/31 dev eth1 proto kernel scope link src 10.1.229.189 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1 

[node]# ip route show table 1
10.1.229.187 dev eth0 scope link 

[node]# ip route show table 2
10.1.229.189 dev eth1 scope link 

Testing

[node]# ip route get 10.5.68.187 from 10.1.229.187
10.5.68.187 from 10.1.229.187 via 10.1.229.186 dev eth0 
    cache 
[node]# ip route get 10.5.68.187 from 10.1.229.189
10.5.68.187 from 10.1.229.189 via 10.1.229.188 dev eth1 
    cache 

From another machine.

ping 10.1.229.187   # OK
ping 10.1.229.189   # NOK

nmap 10.1.229.187 -p 22   # OK
nmap 10.1.229.189 -p 22   # NOK

So how can I set up routing so that both .187 and .189 are reachable at the same time?

Update 2:

With this setup I had some measure of success.

eth0_ip=$(ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)
eth0_gw=$(ip route list dev eth0 | awk '{print $1}' | tail -1 | cut -d'/' -f1)

eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)

ip route add default via ${eth0_gw} dev eth0 table 1
ip rule add from ${eth0_ip} table 1

ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip} table 2

After applying the above script, I modified the default route: switched it to eth1 and then back to eth0. After that I was able to ping both .187 and .189. (In another attempt I also removed the default route completely.) I'm not sure what the problem is here.

# switch the default route to eth1 and then back to eth0
ip route change default via ${eth1_gw} dev eth1
ip route change default via ${eth0_gw} dev eth0

ip route flush cache

Update 3:

From various attempts, it seems to me that table 2 is completely ignored. Since the hosting provider runs a custom kernel: is it possible that multiple routing tables are disabled in the kernel? How can I test this?
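For reference, multiple routing tables require the CONFIG_IP_ADVANCED_ROUTER kernel option (this comes up again in the comments below). A way to check the running kernel and to functionally probe table support, assuming a readable kernel config and using table 100 plus a TEST-NET-1 address as throwaway values:

```shell
# Is policy routing compiled in? (config location varies by distro)
grep CONFIG_IP_ADVANCED_ROUTER /boot/config-"$(uname -r)" 2>/dev/null \
  || zcat /proc/config.gz 2>/dev/null | grep CONFIG_IP_ADVANCED_ROUTER

# Functional probe: install a throwaway route in table 100 and read it back
ip route add blackhole 192.0.2.1/32 table 100
ip route show table 100            # should list the blackhole route
ip rule add to 192.0.2.1 lookup 100
ip rule list                       # should show the new rule

# clean up
ip rule del to 192.0.2.1 lookup 100
ip route del blackhole 192.0.2.1/32 table 100
```

If the probe route shows up in `ip route show table 100`, extra tables work and the problem lies elsewhere.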

Update 4:

Once again I made a little progress, but I am still far from a working solution. Experimenting with different options, I stumbled across a strange situation: in order to get eth1 working, I first need to use the interface once.

I need to ping from IP .189 (Node 1) to another node on the network. Example (Node 1 -> Node 2): `ping -I 10.1.229.189 10.5.68.187`. This works, and then suddenly the reverse ping from Node 2 -> Node 1 (`ping 10.1.229.189`) works as well. If I don't do that initial ping from Node 1 -> Node 2, then Node 2 -> Node 1 doesn't work.

The problem, however, is that if I restart the machine or wait some time (10-60 minutes), it goes back to the initial state.
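The decay pattern described here (reachable only after outbound traffic, reverting after some minutes) is what a stale ARP entry on the upstream router would look like: the outbound ping refreshes the router's entry for .189, which then ages out again. A diagnostic sketch, not a fix; `arping -U` (unsolicited/gratuitous ARP) is from the iputils tools:

```shell
# While the issue reproduces, watch whether anything ARPs for .189 on eth1
tcpdump -ni eth1 arp

# Announce .189 on the segment without sending any ping; if reachability
# returns afterwards, the router's ARP cache is the moving part
arping -c 3 -U -I eth1 10.1.229.189
```

If the gratuitous ARP alone restores reachability, the routing tables are fine and the question becomes why the router never re-ARPs for .189 on its own.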

This is the minimal setup that partly works (I subsequently removed everything that didn't make a difference):

eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)

ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip} lookup 2

This is the output requested by @Anton Danilov:

[root@cluser-node-1 ~]# ip -4 r ls table all
default via 10.1.229.188 dev eth1 table 2 
default via 10.1.229.186 dev eth0 
10.1.229.186/31 dev eth0 proto kernel scope link src 10.1.229.187 
10.1.229.188/31 dev eth1 proto kernel scope link src 10.1.229.189 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1 
local 10.1.229.187 dev eth0 table local proto kernel scope host src 10.1.229.187 
broadcast 10.1.229.187 dev eth0 table local proto kernel scope link src 10.1.229.187 
local 10.1.229.189 dev eth1 table local proto kernel scope host src 10.1.229.189 
broadcast 10.1.229.189 dev eth1 table local proto kernel scope link src 10.1.229.189 
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
broadcast 172.17.0.0 dev docker0 table local proto kernel scope link src 172.17.0.1 
local 172.17.0.1 dev docker0 table local proto kernel scope host src 172.17.0.1 
broadcast 172.17.255.255 dev docker0 table local proto kernel scope link src 172.17.0.1 
broadcast 172.18.0.0 dev docker_gwbridge table local proto kernel scope link src 172.18.0.1 
local 172.18.0.1 dev docker_gwbridge table local proto kernel scope host src 172.18.0.1 
broadcast 172.18.255.255 dev docker_gwbridge table local proto kernel scope link src 172.18.0.1 



[root@cluser-node-1 ~]# ip rule list
0:  from all lookup local 
32765:  from 10.1.229.189 lookup 2 
32766:  from all lookup main 
32767:  from all lookup default 

[root@cluser-node-1 ~]# ip n ls dev eth1
10.1.229.188 lladdr 00:07:cb:0b:0d:93 REACHABLE

[root@cluser-node-1 ~]# tcpdump -ni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:36:17.237182 ARP, Request who-has 10.1.229.188 tell 10.1.229.189, length 28
16:36:17.237369 ARP, Reply 10.1.229.188 is-at 00:07:cb:0b:0d:93, length 46

2 packets captured
4 packets received by filter
0 packets dropped by kernel

This is the other output, after the system is restarted or after the 15-30 minute timeout.

[root@cluser-node-1 ~]# tcpdump -ni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

[root@cluser-node-1 ~]# ip n ls dev eth1
10.1.229.188 lladdr 00:07:cb:0b:0d:93 REACHABLE
Vad1mo
  • I am a little confused by the ping and nmap you are using from other machines, as those look like the addresses of gateways. Is this a typo? – J.M. Robles Nov 12 '17 at 08:22
  • .187, .189 are the IPs. GWs are .186 and .188 – Vad1mo Nov 12 '17 at 20:20
  • Forgive me, it is just surprising to me, because all the boxes I use in test environments (kitchen-vagrant) have at least 2 interfaces (in fact a radio simulator has 8), and the processes inside are bound to one of them without problems (radio1 listens on eth1, radio2 on eth2, and so on). The only difference is that I do not specify different gateways in the same segment. – J.M. Robles Nov 13 '17 at 05:03
  • @J.M.Robles, The hosting provider has an odd setup. – Vad1mo Nov 13 '17 at 16:01
  • You may be overthinking this. It might be easier if people knew what applications X and Y are; it looks like one of them is SSH, since you are testing TCP port 22. It may be easier to route based on the port(s) used. – BeowulfNode42 Nov 27 '17 at 08:06
  • I don't know what application is running; it should be a node in a cluster. – Vad1mo Nov 28 '17 at 07:29

1 Answer


Check whether there are replies (maybe the replies are going out through the other interface) or whether the replies are missing entirely.

Check the settings of the reverse path filter (check the counters in the output of `nstat -az` or `netstat -s`; there is a TcpExtIPReversePathFilter counter for packets dropped by rp_filter). Disable it or set it to loose mode (see the sysctl settings description). Look up the reverse route for incoming packets to confirm the assumption.

I think you should add routes for the directly connected networks into the routing tables, because they are required for ARP resolution of the corresponding gateways and for communication with other hosts on the directly connected networks. These settings should be enough to solve your case:

ip route add 10.5.68.186/31 dev eth0 table 1
ip route add 0/0 via 10.5.68.186 dev eth0 table 1

ip route add 10.5.68.188/31 dev eth1 table 2
ip route add 0/0 via 10.5.68.188 dev eth1 table 2

ip rule add from 10.5.68.187 lookup 1
ip rule add from 10.5.68.189 lookup 2
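Once these are in place, the lookups can be sanity-checked without sending any traffic; each source address should now resolve via its own gateway (10.5.68.1 below is just an arbitrary far-end address on the /8):

```shell
ip route get 10.5.68.1 from 10.5.68.187   # expect: via 10.5.68.186 dev eth0
ip route get 10.5.68.1 from 10.5.68.189   # expect: via 10.5.68.188 dev eth1
```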

Also, you should know that this setup only works for the case where the IP addresses on these interfaces differ, despite the overlapping addressing. Otherwise you need a more complex scheme with CONNMARK and policy-based routing by firewall marks.

If you're trying to ping the host from the host itself, you should use these commands:

ip route add local 10.5.68.187 dev eth0 table 1
ip route add 10.5.68.186/31 dev eth0 table 1
ip route add 0/0 via 10.5.68.186 dev eth0 table 1

ip route add local 10.5.68.189 dev eth1 table 2
ip route add 10.5.68.188/31 dev eth1 table 2
ip route add 0/0 via 10.5.68.188 dev eth1 table 2

ip rule add iif eth0 lookup 1 pref 101
ip rule add iif eth1 lookup 2 pref 102

ip rule add from 10.5.68.187 lookup 1 pref 201
ip rule add from 10.5.68.189 lookup 2 pref 202

ip rule add from all lookup local pref 300
ip rule del pref 0
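One operational footnote when iterating on scripts like these: `ip rule add` inserts a duplicate rule on every run, and none of this survives a reboot, so a reverted state after restart is expected unless the commands are replayed at boot. A guard of this shape keeps re-runs idempotent (table numbers as in the answer above):

```shell
# add each rule only if an identical one is not already installed
ip rule list | grep -q 'from 10.5.68.187 lookup 1' \
  || ip rule add from 10.5.68.187 lookup 1
ip rule list | grep -q 'from 10.5.68.189 lookup 2' \
  || ip rule add from 10.5.68.189 lookup 2
```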
Anton Danilov
  • `nstat -az | grep TcpExtIPReversePathFilter` gives `TcpExtIPReversePathFilter 0 0.0`. I also ran `tcpdump -i eth1` and there is nothing. – Vad1mo Nov 22 '17 at 22:43
  • Do ip routing tables need to be enabled in the kernel specifically? In older docs I saw something about it, but I'm not sure how it is in up-to-date versions. And yes, what is the kernel module? – Vad1mo Nov 22 '17 at 22:45
  • From the node under test I pinged another host on the network via eth1 (`ping -I eth1 10.5.68.187`), and it's quiet here as well. – Vad1mo Nov 22 '17 at 23:58
  • The multiple routing tables feature can be enabled with the CONFIG_IP_ADVANCED_ROUTER=y kernel option. It is enabled in all modern distros and the vanilla kernel. – Anton Danilov Nov 23 '17 at 13:22
  • Also, the older ping tool has some strange behaviour: if you specify an interface name in the -I option, ping uses a raw socket, builds the packet at a lower level, and sends it through the interface, skipping the routing subsystem. Try specifying the interface IP address in -I, not the interface name. Maybe this has been fixed, but I don't know. – Anton Danilov Nov 23 '17 at 13:54
  • CONFIG_IP_ADVANCED_ROUTER is set to y in my case, so this feature is enabled. `ping -I 10.1.229.189 10.5.68.187` gives `PING 10.5.68.187 (10.5.68.187) from 10.1.229.189 : 56(84) bytes of data.` then `From 10.1.229.189 icmp_seq=1 Destination Host Unreachable`, so I can't reach other machines and other machines can't reach my service. – Vad1mo Nov 23 '17 at 17:39
  • If I've understood correctly, these packets should be sent through the eth1 interface. Show the uncut output of `ip -4 r ls table all` and `ip rule list`. This error message means that either no suitable route for the destination was found or MAC address resolution failed. Check the ARP table on the eth1 interface with `ip n ls dev eth1` (in the normal case you should see a REACHABLE ARP entry), then check the ARP packets with `tcpdump -ni eth1 arp`. – Anton Danilov Nov 23 '17 at 18:26
  • I made a little progress and documented it in Update 4 in the question, as it's more practical to do it there than in a comment here. – Vad1mo Nov 24 '17 at 16:43
  • I've written the solution (see the routes and rules). Use it exactly as written. If you're trying to ping the host itself from the same host through the other interface, then you don't know what you are doing; it requires a more complex scheme. Try to check reachability from another host, not from the host itself! – Anton Danilov Nov 27 '17 at 07:47
  • I am not trying to ping the host from itself on the other interface. It's always node1<->node2 communication, or more precisely node1->node2.eth1 or node2->node1.eth1. However, your first recommendation already worked, but only under the restriction pointed out in update 4. – Vad1mo Nov 28 '17 at 07:13
  • I've set up a testing stand and checked the rules in the answer (I've edited it). It works as expected. Try it. – Anton Danilov Nov 28 '17 at 07:48
  • Unfortunately, neither of the solutions works; after some time eth1 isn't reachable anymore until I repeat what is described in update 4. – Vad1mo Nov 30 '17 at 17:21
  • Can you check the ARP table and the ARP traffic on Node 2 when the issue repeats? What kind of error do you see on Node 2? – Anton Danilov Nov 30 '17 at 19:36
  • What I see is that the eth1 neighbour entry becomes STALE and no ARP packets are received. – Vad1mo Dec 05 '17 at 14:47
  • Can you dump the traffic on both ends of the affected link? It seems like there is some other problem, not related to the original question. – Anton Danilov Dec 06 '17 at 15:54