Besides various minor issues, the main problem is about policy routing: one must distinguish the route for an egress packet from the route for an ingress packet with exactly the same content, because they use two different paths.
Let's rewrite it completely (using numbers for table values, so it's easily testable without changing any files).
Enable arp_filter, needed so that ARP handling follows policy routing and ARP traffic is tied to the correct interface:
sysctl -w net.ipv4.conf.eth0.arp_filter=1
sysctl -w net.ipv4.conf.eth1.arp_filter=1
sysctl -w net.ipv4.conf.eth2.arp_filter=1
Add addresses, following OP's way:
ip address add 192.168.1.124/32 dev eth0
ip address add 192.168.1.125/32 dev eth1
ip address add 192.168.1.126/32 dev eth2
Add LAN routes, a different one per table (OP used the same route in each). There's no need to hint the source address (which would differ for each table): it's selected by the routing rules that will be added later.
ip route add 192.168.1.0/24 dev eth0 table 124
ip route add 192.168.1.0/24 dev eth1 table 125
ip route add 192.168.1.0/24 dev eth2 table 126
Add gateway routes (the gateway is now reachable within each table):
ip route add default via 192.168.1.11 table 124
ip route add default via 192.168.1.11 table 125
ip route add default via 192.168.1.11 table 126
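The sysctl, address and route steps so far repeat the same pattern for each interface. As a minimal sketch only (the helper names run, setup_iface and setup_all are hypothetical, and the DRY_RUN variable is an assumed convention, none of them come from OP's setup), the following prints the same commands grouped per interface, and executes them only when DRY_RUN=0:

```shell
#!/bin/sh
# Hypothetical helper: print the command while DRY_RUN is unset or 1,
# execute it otherwise (execution requires root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# setup_iface DEV ADDRESS TABLE GATEWAY
# Same commands as above, grouped per interface instead of per step.
setup_iface() {
    dev=$1 addr=$2 table=$3 gw=$4
    run sysctl -w "net.ipv4.conf.$dev.arp_filter=1"
    run ip address add "$addr/32" dev "$dev"
    # The LAN route must exist in the table before the gateway route.
    run ip route add 192.168.1.0/24 dev "$dev" table "$table"
    run ip route add default via "$gw" table "$table"
}

setup_all() {
    setup_iface eth0 192.168.1.124 124 192.168.1.11
    setup_iface eth1 192.168.1.125 125 192.168.1.11
    setup_iface eth2 192.168.1.126 126 192.168.1.11
}
```

Calling setup_all prints the twelve commands for review; exporting DRY_RUN=0 before calling it runs them instead.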
Policy routing: one must treat the two directions of a packet from 192.168.1.124 to 192.168.1.125 differently: when it's emitted through eth0, and when the very same packet is received through eth1. iif lo is the special syntax that makes a policy routing rule apply only to locally emitted packets (rather than to every case, including packets received from an interface). Use a specific preference for each rule (this will be useful for the following step):
ip rule add pref 124 iif lo from 192.168.1.124 lookup 124
ip rule add pref 125 iif lo from 192.168.1.125 lookup 125
ip rule add pref 126 iif lo from 192.168.1.126 lookup 126
Alas, the local table, looked up by the policy routing rule with preference 0, prevents these rules from applying. For example, nothing has changed here:
$ ip route get from 192.168.1.124 to 192.168.1.125
local 192.168.1.125 from 192.168.1.124 dev lo table local uid 1000
cache <local>
Just move the local table lookup after these rules:
ip rule add pref 200 lookup local
ip rule delete pref 0 lookup local
This now gives two paths for the same packet, one for egress and one for ingress (the moved local table still resolves the ingress path):
$ ip route get from 192.168.1.124 to 192.168.1.125
192.168.1.125 from 192.168.1.124 dev eth0 table 124 uid 1000
cache
$ ip route get from 192.168.1.124 iif eth1 to 192.168.1.125
local 192.168.1.125 from 192.168.1.124 dev lo table local
cache <local> iif eth1
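Since the whole setup depends on the local table being consulted after the per-source rules, it can be worth checking the rule order before testing further. A small sketch, assuming the usual "PREF: selector lookup TABLE" layout of ip rule show output (the function names local_rule_pref and local_after are hypothetical):

```shell
#!/bin/sh
# Print the preference of the first rule that looks up the local table,
# reading `ip rule show` style output on standard input.
local_rule_pref() {
    awk -F: '/lookup local/ { print $1; exit }'
}

# local_after PREF
# Succeeds when the first local-table lookup has a preference greater
# than PREF, i.e. it is evaluated after the rules up to PREF.
local_after() {
    pref=$(local_rule_pref)
    [ -n "$pref" ] && [ "$pref" -gt "$1" ]
}
```

For example, ip rule show | local_after 126 && echo ok should print ok once the lookup has been moved to preference 200. It fails while the original rule with preference 0 is still present, because that rule is listed first.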
Note: arp_filter ensures the correct interface is selected for the reply (i.e. in the example above it won't be eth2, but always eth1), because only the correct interface will reply to ARP queries.
This can optionally be further enforced with Strict Reverse Path Forwarding using rp_filter:
Before (using an unexpected path):
$ ip route get from 192.168.1.124 iif eth2 to 192.168.1.125
local 192.168.1.125 from 192.168.1.124 dev lo table local
cache <local> iif eth2
sysctl -w net.ipv4.conf.eth0.rp_filter=1
sysctl -w net.ipv4.conf.eth1.rp_filter=1
sysctl -w net.ipv4.conf.eth2.rp_filter=1
After:
$ ip route get from 192.168.1.124 iif eth2 to 192.168.1.125
RTNETLINK answers: Invalid cross-device link
Pinging an IP address from the very same IP address makes no sense over the wire, because it would use the same interface twice: from 192.168.1.124 to 192.168.1.124 should use eth0, but by default a switch (or even a hub) never sends a frame back out of the port it came from (that would be hairpin mode). So even the ARP request to resolve the destination will fail whatever the configuration, before the IP packet is ever sent or received back. This case should be handed back to the local routing table over the lo device, as initially. Either add 3 more policy rules, or add a "hole" in each routing table using throw, so that rule evaluation continues with the following rules and thus reaches the local routing table:
Before:
$ ip route get from 192.168.1.124 to 192.168.1.124
192.168.1.124 from 192.168.1.124 dev eth0 table 124 uid 1000
cache
ip route add throw 192.168.1.124/32 table 124
ip route add throw 192.168.1.125/32 table 125
ip route add throw 192.168.1.126/32 table 126
After:
$ ip route get from 192.168.1.124 to 192.168.1.124
local 192.168.1.124 from 192.168.1.124 dev lo table local uid 1000
cache <local>
Note that the system is unable to emit a packet without binding the source address first, because there's no route for it outside tables 124 to 126, but that was OP's choice when /32 addresses were used:
$ ip route get 192.168.1.127
RTNETLINK answers: Network is unreachable
$ ip route get from 192.168.1.126 to 192.168.1.127
192.168.1.127 from 192.168.1.126 dev eth2 table 126 uid 1000
cache
One can still add default choices if needed, for example in the main routing table. For instance, if 192.168.1.125 is to be the default:
# ip route add 192.168.1.0/24 dev eth1
# ip route get 192.168.1.127
192.168.1.127 dev eth1 src 192.168.1.125 uid 0
cache
The actual default route must also be specified again in the main table (leaving the device implicit is fine: eth1 is resolved from the previously added entry):
ip route add default via 192.168.1.11
With these settings, the host can reach other systems anywhere from any of its interfaces, as long as the source is specified (otherwise it defaults to 192.168.1.125 on eth1). It can also, which was OP's goal, ping itself over the wire from one of its addresses to a different one. Simulated using network namespaces:
# ip neigh flush all
# ping -c3 -I 192.168.1.124 192.168.1.125
PING 192.168.1.125 (192.168.1.125) from 192.168.1.124 : 56(84) bytes of data.
64 bytes from 192.168.1.125: icmp_seq=1 ttl=64 time=0.087 ms
64 bytes from 192.168.1.125: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 192.168.1.125: icmp_seq=3 ttl=64 time=0.063 ms
--- 192.168.1.125 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2027ms
rtt min/avg/max/mdev = 0.059/0.069/0.087/0.012 ms
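The exact namespace setup behind this test isn't shown. One possible way to build such a lab, given purely as an assumption, is a namespace for the multi-homed host and another holding a bridge that plays the role of the switch and gateway. The namespace names host and lan, the bridge br0, the port names and the helpers run and lab_up are all hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the command while DRY_RUN is unset or 1,
# execute it otherwise (execution requires root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

lab_up() {
    run ip netns add host    # plays the host with eth0, eth1 and eth2
    run ip netns add lan     # plays the switch and the gateway
    run ip -n host link set lo up
    run ip -n lan link add br0 type bridge
    run ip -n lan link set br0 up
    # The gateway address used in the routes above, placed on the bridge.
    run ip -n lan address add 192.168.1.11/24 dev br0
    for i in 0 1 2; do
        # One veth pair per interface: ethN in host, portN on the bridge.
        run ip link add "eth$i" netns host type veth peer name "port$i" netns lan
        run ip -n lan link set "port$i" master br0
        run ip -n lan link set "port$i" up
        run ip -n host link set "eth$i" up
    done
}
```

With such a lab, every command of this answer would be run inside the host namespace, for instance from a shell started with ip netns exec host. The sysctl settings under net.ipv4.conf are per namespace, so they have to be applied there as well.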
Here the first ping took longer because of the ARP request that was needed first. Of course tcpdump can confirm it (here, captures on the two involved interfaces):
# tcpdump -ttttt -l -e -n -s0 -p -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
00:00:00.000000 fe:c7:45:36:a1:84 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.125 tell 192.168.1.124, length 28
00:00:00.000042 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype ARP (0x0806), length 42: Reply 192.168.1.125 is-at ce:59:ff:a1:96:ff, length 28
00:00:00.000045 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 1, length 64
00:00:00.000061 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 1, length 64
00:00:01.003101 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 2, length 64
00:00:01.003135 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 2, length 64
00:00:02.027155 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 3, length 64
00:00:02.027190 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 3, length 64
00:00:05.099215 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.124 tell 192.168.1.125, length 28
00:00:05.099248 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype ARP (0x0806), length 42: Reply 192.168.1.124 is-at fe:c7:45:36:a1:84, length 28
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
# tcpdump -ttttt -l -e -n -s0 -p -i eth1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
00:00:00.000000 fe:c7:45:36:a1:84 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.125 tell 192.168.1.124, length 28
00:00:00.000022 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype ARP (0x0806), length 42: Reply 192.168.1.125 is-at ce:59:ff:a1:96:ff, length 28
00:00:00.000031 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 1, length 64
00:00:00.000041 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 1, length 64
00:00:01.003098 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 2, length 64
00:00:01.003113 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 2, length 64
00:00:02.027154 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype IPv4 (0x0800), length 98: 192.168.1.124 > 192.168.1.125: ICMP echo request, id 59125, seq 3, length 64
00:00:02.027170 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype IPv4 (0x0800), length 98: 192.168.1.125 > 192.168.1.124: ICMP echo reply, id 59125, seq 3, length 64
00:00:05.099052 ce:59:ff:a1:96:ff > fe:c7:45:36:a1:84, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.124 tell 192.168.1.125, length 28
00:00:05.099241 fe:c7:45:36:a1:84 > ce:59:ff:a1:96:ff, ethertype ARP (0x0806), length 42: Reply 192.168.1.124 is-at fe:c7:45:36:a1:84, length 28
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel