I have a few hosts connected to the same switch, which are all on the same subnet (10.0.0.0/16). Two of these hosts have faster network interfaces so I have connected them together, meaning these two machines now have a direct link with each other without going through a switch.

I now need to set up the routing such that when these two machines try to talk to each other, the packets go over this faster direct link in preference to the slower link via the switch.

The easiest way would be to put the direct link on a different subnet. However, I would then need to use different IPs or hostnames depending on which interface to use. I would like to deploy standard configs to all machines (e.g. NFS mounts using hostnames) without maintaining custom IP overrides in /etc/hosts, so I feel that with this solution it would be too easy to get a hostname wrong and send traffic over the wrong interface.

What I am looking for is a way to tell the two Linux machines that, although eth0 handles 10.0.0.0/16 and 10.0.0.5 is in that subnet, packets for 10.0.0.5 should be sent through eth1 instead.

I tried adding a host routing rule with route add -host 10.0.0.5 dev eth1 which does send the packet out on the correct interface, however it comes from the wrong IP address (the direct link's subnet rather than the original subnet.)

I guess the only way to fix this is to set the same IP address on both interfaces, but can a machine have the same IP on multiple NICs without causing problems? I'm assuming I'll need to set routing metrics properly so that the NIC connected to the switch is given priority (to avoid all traffic for the subnet being sent to the other host by mistake), but is there anything else I need to be aware of with this setup? Can it lead to any other issues or difficult-to-resolve problems?

Or is there a better, more robust way to achieve this?

EDIT: Here is the extra info requested by @A.B:

$ ip -br link
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth0             UP             ec:f4:xx:xx:xx:a4 <BROADCAST,MULTICAST,UP,LOWER_UP> 
eth1             UP             ec:f4:xx:xx:xx:a5 <BROADCAST,MULTICAST,UP,LOWER_UP> 

$ ip -br address
lo               UNKNOWN        127.0.0.1/8
eth0             UP             10.0.0.4/16
eth1             UP             10.0.99.4/24

$ ip route
default via 10.0.0.1 dev eth0 proto static 
10.0.0.0/16 dev eth0 proto kernel scope link src 10.0.0.4
10.0.0.5 dev eth1 scope link 
10.0.99.0/24 dev eth1 proto kernel scope link src 10.0.99.4

I set up the direct link (eth1) on a separate subnet and then tried the route command in my original post, and this is where things are now. It looks like perhaps I need to get the src attribute set for my direct route.

Malvineous
  • You are trying to do something that breaks a lot of rules. If it is doable, it is likely a lot more work than a couple of entries in a hosts file. I expect having the same IP on 2 interfaces will break a lot of stuff (confused ARP table). You might be able to bodge something together using iptables NAT on both devices, but the cure is worse than the problem. – davidgo May 23 '20 at 20:55
  • Could you give more information on these hosts with the current (not quite working) configuration? Could you add the output of `ip -br link; ip -br address; ip route`? Note that on Linux `route` is obsolete, and some features require the use of `ip route` because `route` can't express them. – A.B May 23 '20 at 21:24

2 Answers

Because 10.0.99.4 falls inside 10.0.0.0/16, this address should not be used on eth1. It would conflict with any real host at 10.0.99.4 on the LAN, and the conflict would show up on eth0 too: Linux uses the Weak Host Model and by default answers ARP requests for this address on eth0 as well, creating ARP conflicts. Don't use conflicting IP addresses.

cleanup:

ip route flush dev eth1
ip address flush dev eth1

The standard method with glue IP addresses

Let's choose two unrelated addresses for the two hosts. They must not clash with anything else in use on your network, but since they are point-to-point /32 addresses, almost anything will do: they won't be part of a LAN, only peer addresses on the direct link. I'll arbitrarily use 192.168.100.4/32 and 192.168.101.5/32. If more than two of these hosts are later connected together through a separate, faster switch, this setup can be amended slightly, and at that point choosing related addresses from the same block becomes the easier option again.

configuration for host 10.0.0.4:

# /32 : no route gets created (beside the hidden *local* routing table)
ip address add 192.168.100.4/32 dev eth1 
# add the peer (point to point) route on the same link
ip route add 192.168.101.5/32 dev eth1

Actually the two commands above have a shortcut: you can replace both of them with the single command below:

ip address add 192.168.100.4 peer 192.168.101.5/32 dev eth1

Now tell the host that to reach 10.0.0.5/32 (which is more specific than 10.0.0.0/16) there's a route using the peer IP address, but preferring a different source IP address than what would be chosen by default (the obsolete route command can't do this):

ip route add 10.0.0.5/32 via 192.168.101.5 dev eth1 src 10.0.0.4

With this in place you get:

# ip route get 10.0.0.5
10.0.0.5 via 192.168.101.5 dev eth1 src 10.0.0.4 uid 0 
    cache 
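To check this from a script rather than by eye, the relevant fields can be pulled out of that output. The following is only a sketch: `route_field` is a hypothetical helper name, not part of iproute2, and it assumes the single-line `keyword value` layout shown above.

```shell
#!/bin/sh
# route_field KEY LINE: print the word that follows KEY in one line of
# `ip route get` output. Hypothetical helper, not part of iproute2.
route_field() {
    key=$1
    shift
    # Intentionally unquoted: split the line into separate words.
    set -- $*
    while [ "$#" -gt 1 ]; do
        if [ "$1" = "$key" ]; then
            printf '%s\n' "$2"
            return 0
        fi
        shift
    done
    return 1
}

# Sample line copied from the output above.
line='10.0.0.5 via 192.168.101.5 dev eth1 src 10.0.0.4 uid 0'
route_field dev "$line"
route_field src "$line"
```

On a live host the sample line would be replaced by something like `"$(ip route get 10.0.0.5 | head -n 1)"`; the route is correct when `dev` is eth1 and `src` is the host's eth0 address.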

There's one minor drawback: IP broadcasts are still sent over eth0. If Strict Reverse Path Forwarding is active (either sysctl net.ipv4.conf.eth0.rp_filter or sysctl net.ipv4.conf.all.rp_filter returns 1 rather than 0 or 2), broadcasts sent by the peer will be ignored, because they are received on what is now the wrong interface. You can test this by running something like echo test | socat udp4-datagram:10.0.255.255:5555,broadcast - on peer host 10.0.0.5. So if you use protocols that rely on broadcasts and already apply Strict Reverse Path Forwarding, switch eth0 to Loose mode if needed:

sysctl -w net.ipv4.conf.eth0.rp_filter=2
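Setting only the eth0 value is enough because, per the kernel's ip-sysctl documentation, the larger of the `all` and per-interface `rp_filter` values is the one applied to an interface. A small sketch of that rule follows; the function names are made up for illustration.

```shell
#!/bin/sh
# effective_rp_filter ALL_VALUE IFACE_VALUE: print the larger of the two,
# which is the value the kernel applies to that interface.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# rp_filter_mode VALUE: translate the number into its usual name.
rp_filter_mode() {
    case $1 in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Example with literal values: all=1 and eth0=2 gives loose mode on eth0.
rp_filter_mode "$(effective_rp_filter 1 2)"
```

On a real host the two arguments would come from `cat /proc/sys/net/ipv4/conf/all/rp_filter` and `cat /proc/sys/net/ipv4/conf/eth0/rp_filter`.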

The equivalent configuration for host 10.0.0.5:

ip address add 192.168.101.5 peer 192.168.100.4/32 dev eth1
ip route add 10.0.0.4/32 via 192.168.100.4 dev eth1 src 10.0.0.5
sysctl -w net.ipv4.conf.eth0.rp_filter=2
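Since the two hosts differ only in which address is "self" and which is "peer", the commands can be generated from one template. This is a sketch under that assumption; `glue_commands` is a hypothetical name, and it only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# glue_commands SELF_IP PEER_IP SELF_GLUE PEER_GLUE [DEV]
# Print (do not run) the commands for one end of the direct link.
# DEV defaults to eth1; eth0 is assumed to be the switch-facing NIC.
glue_commands() {
    self_ip=$1 peer_ip=$2 self_glue=$3 peer_glue=$4 dev=${5:-eth1}
    cat <<EOF
ip address add $self_glue peer $peer_glue/32 dev $dev
ip route add $peer_ip/32 via $peer_glue dev $dev src $self_ip
sysctl -w net.ipv4.conf.eth0.rp_filter=2
EOF
}

# Host 10.0.0.4, then host 10.0.0.5.
glue_commands 10.0.0.4 10.0.0.5 192.168.100.4 192.168.101.5
glue_commands 10.0.0.5 10.0.0.4 192.168.101.5 192.168.100.4
```

Generating both sides from the same arguments, just swapped, makes it harder to end up with mismatched glue addresses on the two hosts.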

For example, in Debian-like ifupdown interface configuration files you can use the pointopoint keyword, plus a few additional up commands for anything that doesn't have a direct configuration equivalent (the sysctl setting would rather go in /etc/sysctl.d).
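As a rough illustration of the ifupdown variant for host 10.0.0.4, the snippet below prints a candidate stanza for /etc/network/interfaces. It is an untested sketch: the layout is my assumption of how the pointopoint keyword and an up line would be combined, and the up command assumes eth0 already holds 10.0.0.4 when eth1 comes up, since the kernel may otherwise reject the src address.

```shell
#!/bin/sh
# Print a candidate /etc/network/interfaces stanza for host 10.0.0.4.
# Nothing is written to disk; redirect the output yourself after review.
print_eth1_stanza() {
    cat <<'EOF'
auto eth1
iface eth1 inet static
    address 192.168.100.4
    netmask 255.255.255.255
    pointopoint 192.168.101.5
    up ip route add 10.0.0.5/32 via 192.168.101.5 dev eth1 src 10.0.0.4
EOF
}

print_eth1_stanza
```

The rp_filter setting is left out on purpose, as it belongs in a file under /etc/sysctl.d rather than in the interface stanza.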

Simplified method without additional (nor duplicate) IP addresses

Actually, the only role of 192.168.100.4 and 192.168.101.5 is to resolve link-layer addresses for the routes to 10.0.0.4 and 10.0.0.5: they are used as a kind of glue and play no other role. These IP addresses are completely invisible: no IP packet will ever carry 192.168.100.4 or 192.168.101.5 (unless something uses them explicitly); only ARP requests and replies will. There's no need to use such glue IP addresses at all.

For example, the hosting provider Hetzner gives this command:

ip route add 203.0.113.40/32 dev tap0

to reach an IP address on an interface without configuring an IP address on this interface (nor having this interface used as bridge port). In this example the peer on tap0 (which is a tun/tap device in Ethernet mode linked to a VM on the other side) has to answer ARP requests to resolve link-layer addresses.

And for reasons of symmetry, the peer doesn't need an IP address configured on that interface either: if the address is already configured elsewhere on the host, it will still properly answer an ARP request arriving through eth1. That's again part of Linux's implementation of the Weak Host Model.

So this can simply be used for host 10.0.0.4, without involving any extra IP address using only a single command:

ip route add 10.0.0.5/32 dev eth1

Or to specify the source (to avoid ambiguity in case the host has more than one):

ip route add 10.0.0.5/32 dev eth1 src 10.0.0.4

And for host 10.0.0.5:

ip route add 10.0.0.4/32 dev eth1 src 10.0.0.5

To accept "slow" broadcasts from the peer on eth0, both hosts still need, as before:

sysctl -w net.ipv4.conf.eth0.rp_filter=2

ARP requests for these IP addresses can be answered on both interfaces (as linked above, Linux does this by default). Here, however, ARP resolution or existing entries on the usual (old) eth0 side, if any (e.g. from before these settings were put in place), won't trigger effects such as ARP flux: both peers are configured together to use eth1, which leaves no other possible interpretation of the routes.


Choose whichever method you prefer. The first is more classical; the second has a simpler setup (but you might get a few "this can't work" from your peers). Remember that manually added routes are lost when an interface is administratively brought down and then up, so these settings must go into the appropriate network configuration to stay in effect.
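Because the route can disappear when eth1 is cycled, whatever hook re-applies it should be safe to run repeatedly. One way to sketch that for the simplified method is with `ip route replace`, which adds the route or overwrites an existing one. The function name and the DRY_RUN variable are assumptions for illustration; by default the command is only printed.

```shell
#!/bin/sh
# ensure_peer_route PEER_IP SELF_IP [DEV]
# Re-assert the host route towards the peer over the direct link.
# With DRY_RUN=1 (the default here) the command is printed, not run.
ensure_peer_route() {
    peer_ip=$1 self_ip=$2 dev=${3:-eth1}
    set -- ip route replace "$peer_ip/32" dev "$dev" src "$self_ip"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# On host 10.0.0.4:
ensure_peer_route 10.0.0.5 10.0.0.4
```

Running it with `DRY_RUN=0` as root applies the route. Where to call it from (an ifupdown up line, a systemd unit, or similar) depends on the distribution.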

A.B

Using A.B's excellent explanation, this is what I ended up doing to get it to work, for the benefit of anyone else using systemd:

# /etc/systemd/network/50-eth1.network
[Match]
# NIC with the peer on it, to take priority over the normal NIC
Name=eth1

[Network]
# For IPv6 it only appears to work when setting the same address on both interfaces
# Same address as on the main NIC
Address=10::5/64
DefaultRouteOnDevice=false

[Route]
Scope=link
# Host IP address from the main NIC
PreferredSource=10.0.0.5
# Peer's IP address on the other end of the PtP link
Destination=10.0.0.4/32

[Route]
Scope=link
# Host IPv6 address from the main NIC
PreferredSource=10::5
# Peer's IPv6 address
Destination=10::4/128

Of course the source and destination IP addresses (but not the netmasks) are flipped for the other host.

Malvineous