I simulated your scenario with three VMs and two (independent) bridges on a VM host, then worked out and tested a solution for it (the one I mentioned in my comment).
The VM host acts as the web server, two of the VMs act as the routers, and the third VM acts as a web client from "the Internet".
Configurations on the VM host (web server):
$ ip a show dev bridge1
4: bridge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 3a:f6:7b:90:aa:bd brd ff:ff:ff:ff:ff:ff
inet 192.168.254.3/24 scope global bridge1
valid_lft forever preferred_lft forever
inet6 fe80::38f6:7bff:fe90:aabd/64 scope link
valid_lft forever preferred_lft forever
$ ip a show dev bridge2
5: bridge2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1a:6f:58:86:72:55 brd ff:ff:ff:ff:ff:ff
inet6 fe80::186f:58ff:fe86:7255/64 scope link
valid_lft forever preferred_lft forever
(`bridge1` is used for simulating the LAN, and `bridge2` is used for simulating "the Internet", so the latter isn't assigned an IPv4 address.)
$ ip rule
0: from all lookup local
32765: from all fwmark 0xb iif lo lookup 11
32766: from all lookup main
32767: from all lookup default
$ ip r show table main dev bridge1
10.10.10.0/24 via 192.168.254.1
192.168.254.0/24 proto kernel scope link src 192.168.254.3
$ ip r show table 11
10.10.10.0/24 via 192.168.254.2 dev bridge1
(Here `192.168.254.1` is assumed to be the "primary" default gateway. `iif lo` is a refinement that makes the rule apply only to traffic originating from the host itself; it is probably unnecessary unless the web server host also acts as some sort of router.)
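For reference, the rule and routes shown above could be created roughly like this. It is only a sketch: the addresses, the mark and the table number are the ones from my test setup, and the `run` helper and the `APPLY` variable are just something I made up here so that the script prints the commands by default instead of changing your routing.

```shell
#!/bin/sh
# Sketch only: all values below are the ones from this example setup.
MARK=0xb            # must match the mark set by the nftables ruleset
TABLE=11            # arbitrary extra routing table
DEV=bridge1         # LAN-facing interface of the web server
GW1=192.168.254.1   # "primary" router (used by table main)
GW2=192.168.254.2   # second router (used by the extra table)
NET=10.10.10.0/24   # simulated "Internet"

# Hypothetical helper: print each command, execute it only when APPLY=1 (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

# Normal route via the primary router (table main).
run ip route add "$NET" via "$GW1" dev "$DEV"
# Locally generated packets carrying the mark consult the extra table first.
run ip rule add fwmark "$MARK" iif lo lookup "$TABLE"
# Same destination, different next hop, in the extra table.
run ip route add "$NET" via "$GW2" dev "$DEV" table "$TABLE"

# Ask the kernel which next hop a locally generated, marked packet would use.
run ip route get 10.10.10.3 mark "$MARK"
```

Run it once as is to review the commands, then with `APPLY=1` as root to apply them. If the setup is right, the last command should report the second router as the next hop.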
$ sudo nft list ruleset
table ip mangle {
chain input {
type filter hook input priority mangle; policy accept;
ether saddr 52:54:00:bb:bb:bb ip saddr != 192.168.254.2 ct mark set 0x0000000b
}
chain output {
type route hook output priority mangle; policy accept;
ct mark 0x0000000b meta mark set ct mark
}
}
(Apparently `type` must be `route` in the `hook output` chain for this to work; a chain of that type triggers a new route lookup when the packet mark changes. Also, traffic originating from the routers themselves, unlike traffic coming from "the Internet" through them, can be told apart by its source IP address, so `ip saddr != 192.168.254.2` is specified to exclude it; in reality it's probably an unnecessary refinement.)
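If you want to recreate the ruleset, one way is to write it to a file and load it with `nft -f`. This is a sketch under the assumptions of my test setup: the MAC address is that of the second router VM, and the file name is just a placeholder I picked.

```shell
#!/bin/sh
# Sketch: write the ruleset to a file so it can be reviewed before loading.
RULESET=${RULESET:-./mangle-ctmark.nft}   # hypothetical file name

cat > "$RULESET" <<'EOF'
table ip mangle {
    chain input {
        type filter hook input priority mangle; policy accept;
        # Mark connections that arrive through the second router
        # (MAC and IP address are the ones from the example).
        ether saddr 52:54:00:bb:bb:bb ip saddr != 192.168.254.2 ct mark set 0xb
    }
    chain output {
        type route hook output priority mangle; policy accept;
        # Copy the connection mark onto outgoing packets of marked
        # connections so that the fwmark rule can match them.
        ct mark 0xb meta mark set ct mark
    }
}
EOF

echo "wrote $RULESET; load it with: sudo nft -f $RULESET"
```

Note that loading the file a second time would append the same rules again, so delete or flush the table first if you need to reload it.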
Here's the `tcpdump` capture on the VM host / web server of the two `curl` runs done on the web client VM:
$ sudo tcpdump -eni bridge1 tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bridge1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:54:43.602105 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 74: 10.10.10.3.33132 > 192.168.254.3.80: Flags [S], seq 2320058647, win 64240, options [mss 1460,sackOK,TS val 3412464375 ecr 0,nop,wscale 7], length 0
16:54:43.602185 3a:f6:7b:90:aa:bd > 52:54:00:aa:aa:aa, ethertype IPv4 (0x0800), length 74: 192.168.254.3.80 > 10.10.10.3.33132: Flags [S.], seq 3987984937, ack 2320058648, win 65160, options [mss 1460,sackOK,TS val 3768307023 ecr 3412464375,nop,wscale 7], length 0
16:54:43.603460 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.33132 > 192.168.254.3.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 3412464377 ecr 3768307023], length 0
16:54:43.604003 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 140: 10.10.10.3.33132 > 192.168.254.3.80: Flags [P.], seq 1:75, ack 1, win 502, options [nop,nop,TS val 3412464377 ecr 3768307023], length 74: HTTP: GET / HTTP/1.1
16:54:43.604054 3a:f6:7b:90:aa:bd > 52:54:00:aa:aa:aa, ethertype IPv4 (0x0800), length 66: 192.168.254.3.80 > 10.10.10.3.33132: Flags [.], ack 75, win 509, options [nop,nop,TS val 3768307025 ecr 3412464377], length 0
16:54:43.604238 3a:f6:7b:90:aa:bd > 52:54:00:aa:aa:aa, ethertype IPv4 (0x0800), length 329: 192.168.254.3.80 > 10.10.10.3.33132: Flags [P.], seq 1:264, ack 75, win 509, options [nop,nop,TS val 3768307025 ecr 3412464377], length 263: HTTP: HTTP/1.1 200 OK
16:54:43.604636 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.33132 > 192.168.254.3.80: Flags [.], ack 264, win 501, options [nop,nop,TS val 3412464378 ecr 3768307025], length 0
16:54:43.605112 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.33132 > 192.168.254.3.80: Flags [F.], seq 75, ack 264, win 501, options [nop,nop,TS val 3412464379 ecr 3768307025], length 0
16:54:43.605133 3a:f6:7b:90:aa:bd > 52:54:00:aa:aa:aa, ethertype IPv4 (0x0800), length 66: 192.168.254.3.80 > 10.10.10.3.33132: Flags [F.], seq 264, ack 76, win 509, options [nop,nop,TS val 3768307026 ecr 3412464379], length 0
16:54:43.605270 52:54:00:aa:aa:aa > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.33132 > 192.168.254.3.80: Flags [.], ack 265, win 501, options [nop,nop,TS val 3412464379 ecr 3768307026], length 0
16:54:47.528893 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 74: 10.10.10.3.49270 > 192.168.254.3.80: Flags [S], seq 1866708541, win 64240, options [mss 1460,sackOK,TS val 1196345946 ecr 0,nop,wscale 7], length 0
16:54:47.528977 3a:f6:7b:90:aa:bd > 52:54:00:bb:bb:bb, ethertype IPv4 (0x0800), length 74: 192.168.254.3.80 > 10.10.10.3.49270: Flags [S.], seq 1756841838, ack 1866708542, win 65160, options [mss 1460,sackOK,TS val 3768310949 ecr 1196345946,nop,wscale 7], length 0
16:54:47.530210 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.49270 > 192.168.254.3.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1196345947 ecr 3768310949], length 0
16:54:47.530535 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 140: 10.10.10.3.49270 > 192.168.254.3.80: Flags [P.], seq 1:75, ack 1, win 502, options [nop,nop,TS val 1196345948 ecr 3768310949], length 74: HTTP: GET / HTTP/1.1
16:54:47.530588 3a:f6:7b:90:aa:bd > 52:54:00:bb:bb:bb, ethertype IPv4 (0x0800), length 66: 192.168.254.3.80 > 10.10.10.3.49270: Flags [.], ack 75, win 509, options [nop,nop,TS val 3768310951 ecr 1196345948], length 0
16:54:47.530744 3a:f6:7b:90:aa:bd > 52:54:00:bb:bb:bb, ethertype IPv4 (0x0800), length 329: 192.168.254.3.80 > 10.10.10.3.49270: Flags [P.], seq 1:264, ack 75, win 509, options [nop,nop,TS val 3768310951 ecr 1196345948], length 263: HTTP: HTTP/1.1 200 OK
16:54:47.531434 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.49270 > 192.168.254.3.80: Flags [.], ack 264, win 501, options [nop,nop,TS val 1196345949 ecr 3768310951], length 0
16:54:47.532994 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.49270 > 192.168.254.3.80: Flags [F.], seq 75, ack 264, win 501, options [nop,nop,TS val 1196345951 ecr 3768310951], length 0
16:54:47.533092 3a:f6:7b:90:aa:bd > 52:54:00:bb:bb:bb, ethertype IPv4 (0x0800), length 66: 192.168.254.3.80 > 10.10.10.3.49270: Flags [F.], seq 264, ack 76, win 509, options [nop,nop,TS val 3768310954 ecr 1196345951], length 0
16:54:47.533925 52:54:00:bb:bb:bb > 3a:f6:7b:90:aa:bd, ethertype IPv4 (0x0800), length 66: 10.10.10.3.49270 > 192.168.254.3.80: Flags [.], ack 265, win 501, options [nop,nop,TS val 1196345951 ecr 3768310954], length 0
^C
20 packets captured
20 packets received by filter
0 packets dropped by kernel
As you can see, the destination MAC addresses of the replies match the source MAC addresses of the corresponding original traffic, which means the replies were sent to the router that the respective original traffic came from, even though the IP addresses are identical. (Also, as shown in the screenshot, both runs successfully fetched the target web page.)
Rationale of the nftables ruleset
The `ct mark` setting in the `hook input` chain causes the mark to be set for all traffic of the same "connection". (I can't really go deep into that, but if you want to know more, look up "conntrack".) Therefore, in the `hook output` chain you can "select" the corresponding replies by matching on `ct mark` and perform `meta mark set ct mark` on them, which sets a `meta mark` on the replies with the same value as the `ct mark` (i.e. `0xb`, which is an arbitrary value, by the way). (You can set it to a different value instead, too.)
`meta mark` corresponds to `fwmark` in the `ip rule` output, so an extra route table (`11` in the example, which is also an arbitrary value) is looked up for traffic whose `meta mark` equals the `fwmark` in the rule. Because that rule has a lower priority value, this happens before the route table `main` is looked up.
Since route table `11` has a route for `10.10.10.0/24` with a different nexthop (i.e. `via`) from the one in route table `main`, the selected replies are sent to the "correct" router. (No further lookup is performed once a route that "covers" the destination address is found.)
Although `10.10.10.0/24` is used instead of `default` (a.k.a. `0.0.0.0/0`), and the "routers" are connected to the same bridge alongside the web client host to simulate the real Internet, this shouldn't prevent the approach from working in a real deployment.
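On a real network the extra table would presumably hold a default route through the second router rather than a route for `10.10.10.0/24`. A minimal sketch, assuming the same table number as above; the gateway address and interface name are placeholders to replace with your own, and `run` / `APPLY` are again a made-up dry-run helper:

```shell
#!/bin/sh
# Sketch only: adjust these assumed values to your network.
TABLE=11            # same extra table as used by the fwmark rule
GW2=192.168.254.2   # LAN address of the second router
DEV=bridge1         # LAN-facing interface of the web server

# Hypothetical helper: print each command, execute it only when APPLY=1 (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

# Marked replies leave through the second router, whatever their destination.
run ip route add default via "$GW2" dev "$DEV" table "$TABLE"
```

The `ip rule` and the nftables ruleset stay the same; only the route in the extra table changes.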