I have a nasty problem with SSH connections between hosts that are connected by more than one path (route). To explain this in detail ...
As you can see, there are two possible paths the packets can take between the hosts (green and red lines). And when I say they can take them, they can! ;-) There are no firewall rules (or NAT) in place on the router, just plain packet forwarding.
What happens is this: when I establish an SSH connection from host A to host B through the router (or vice versa), which is the intended path (not the direct connection on the shared network; the SSH server listens only on the other interface), the connection dies within a few seconds, but only while it is idle. I tried the various keepalive options on the SSH server and client, but I can now say that they are neither the cause nor a solution.
When I dug a little deeper, I realized that the problem must have something to do with the multiple interfaces and routes on both hosts. This is the only setup in which the phenomenon occurs, and it is reproducible on other systems as well (if they share the same interface setup).
So I took a few traces and saw some SSH traffic on both hosts travelling over the interfaces on the shared network (not through the router, as intended).
I also see that if I SSH from host A to B (remember, the only interface sshd listens on is the one connected to the router) and take down the interface on the shared network, the SSH connection dies immediately!
My assumption is that the later SSH traffic takes a different path than the initial connection. Maybe both SSH instances (client and server) "see" that there is a common network between them, so why not use it? (Of course, this "direct" connection has a much higher preference in the routing table.)
I tried to block the SSH traffic directly on the hosts with packet filtering, but I ran into the same timeouts. The only thing that works is taking down the interface on the shared network; that helps immediately, and the connection then stays up idle for a long time.
Anyone with a good idea?!
Thanks a lot! :-)
-- ADDITIONAL INFORMATION AS REQUESTED IN COMMENTS --
All of the following output was generated at "host B" (the ssh "target").
"host A" is on the "192.168.110.0/24"-subnet!
"ifconfig -a" (irrelevant interfaces stripped):
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:00:00:00:00:00
	inet 192.168.100.5 netmask 0xffffff00 broadcast 192.168.100.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:00:00:00:00:00
	inet 192.168.110.5 netmask 0xffffff00 broadcast 192.168.110.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
"netstat -rna" (irrelevant routes stripped (interfaces)):
Routing tables
Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.100.1      UGS         0      807    em0
127.0.0.1          link#9             UH          0        0    lo0
192.168.100.0/24   link#1             U           0   113430    em0
192.168.100.5      link#1             UHS         0    10437    lo0
192.168.110.0/24   link#2             U           0      319    em1
192.168.110.5      link#2             UHS         0        0    lo0
(...)
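To make my assumption about the return path concrete, here is a minimal Python sketch of a longest-prefix-match lookup over the routes listed in this netstat output. It is only an illustration, not how the kernel actually implements the lookup. The address 192.168.110.10 for host A is an assumption (I only gave its subnet), and the name `egress_interface` is made up for this example.

```python
import ipaddress

# Routes taken from the "netstat -rna" output of host B.
# The host routes pointing at lo0 are left out for brevity.
ROUTES = [
    ("0.0.0.0/0", "em0"),          # default via 192.168.100.1
    ("192.168.100.0/24", "em0"),   # network sshd listens on
    ("192.168.110.0/24", "em1"),   # network shared with host A
]


def egress_interface(destination):
    """Return the interface of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, netif in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, netif))
    # The route with the longest prefix wins; the default route (/0) always
    # matches, so the list is never empty.
    return max(matches)[1]


HOST_A = "192.168.110.10"  # hypothetical address of host A

print("reply to host A leaves via", egress_interface(HOST_A))
print("traffic to the router leaves via", egress_interface("192.168.100.1"))
```

With this table, a reply addressed to anything in 192.168.110.0/24 is sent out em1, even if the request arrived on em0 for 192.168.100.5. That would match the traces showing SSH traffic on the shared network.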
"sockstat -l" (kept other processes for completeness):
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
dhcpd    dhcpd      1416  10 udp4   *:67                  *:*
dhcpd    dhcpd      1416  20 udp4   *:58917               *:*
dhcpd    dhcpd      1416  21 udp6   *:33125               *:*
mysql    mysqld     1629  10 tcp4   192.168.100.5:3306    *:*
root     apcupsd    1353  4  udp4   *:18755               *:*
root     apcupsd    1353  5  udp4   *:162                 *:*
root     apcupsd    1353  7  tcp4   192.168.100.5:3551    *:*
root     collectd   1635  10 udp4   *:65262               *:*
root     collectd   1635  11 udp4   *:49993               *:*
root     collectd   1635  12 udp4   *:51224               *:*
root     collectd   1635  13 udp4   *:58446               *:*
root     collectd   1635  4  udp4   192.168.100.5:25826   *:*
root     collectd   1635  7  udp4   *:16430               *:*
root     collectd   1635  8  udp4   *:12406               *:*
root     collectd   1635  9  udp4   *:16113               *:*
root     inetd      1676  5  udp4   *:69                  *:*
root     monit      1358  7  tcp4   127.0.0.1:2812        *:*
root     sshd       1656  3  tcp4   192.168.100.5:22      *:*
root     syslog-ng  1295  10 dgram  /var/run/logpriv
root     syslog-ng  1295  12 tcp4   192.168.100.5:514     *:*
root     syslog-ng  1295  13 udp4   192.168.100.5:514     *:*
root     syslog-ng  1295  14 tcp4   192.168.100.5:601     *:*
root     syslog-ng  1295  9  dgram  /var/run/log
_ntp     ntpd       1425  6  udp4   192.168.100.5:123     *:*
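For anyone who wants to cross-check the listener list on their own system, here is a small Python sketch that extracts the local addresses a given command is bound to from "sockstat -l" style output. Splitting on whitespace is an assumption about the output layout, the sample is a shortened copy of the listing above, and the name `listen_addresses` is made up for this example.

```python
# Shortened copy of the "sockstat -l" listing above.
SOCKSTAT_SAMPLE = """\
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
root inetd 1676 5 udp4 *:69 *:*
root monit 1358 7 tcp4 127.0.0.1:2812 *:*
root sshd 1656 3 tcp4 192.168.100.5:22 *:*
root syslog-ng 1295 10 dgram /var/run/logpriv
"""


def listen_addresses(output, command):
    """Return the local IP addresses ("*" for wildcard) a command listens on."""
    addresses = []
    for line in output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6 or fields[1] != command:
            continue
        proto, local = fields[4], fields[5]
        # Only TCP/UDP sockets have an address:port pair; skip UNIX sockets.
        if not proto.startswith(("tcp", "udp")):
            continue
        host, _, _port = local.rpartition(":")
        addresses.append(host)
    return addresses


print(listen_addresses(SOCKSTAT_SAMPLE, "sshd"))
```

For the listing above this reports a single address for sshd, 192.168.100.5, which is the em0 address and not the one on the shared network.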