
I have a nasty problem with ssh connections between hosts that are connected via multiple paths (routes). To explain this in detail ...

<<< figure of network and ssh connection >>>

As you can see, there are two possible paths the packets can travel between the hosts (green and red lines). And if I say they can travel, they can! ;-) There are no firewall rules (or NAT) in place on the router, just plain packet forwarding.

What happens now is this: if I establish an ssh connection from host A to host B through the router (or vice versa), which is the intended path (not the direct connection on the shared network; the ssh server listens only on the other interface), the connection dies after a few seconds, but only while it is idle. I tried the various keepalive options on the ssh server (and client), but I can now say that they are neither the problem nor a solution.

As I dug a little deeper, I realized that this problem must have something to do with the multiple interfaces and routes on both hosts. It's the only situation in which this phenomenon occurs, but it is reproducible on other systems as well (if they share the same interface setup).

So I took a few traces and saw some ssh traffic on both hosts traveling through the interfaces that share the same network (not through the router, as intended).

I'm also seeing that if I ssh from host A to B (remember, the only interface on which ssh listens is the one connected to the router) and take down the interface on the shared network, the ssh connection dies immediately!

My assumption is that the later ssh traffic takes a different path than the initial connect. Maybe both ssh instances (client/server) "see" that there is a common network between them, so why not use it (of course this "direct" connection has a much higher preference in the routing table)?!

I tried to block the ssh traffic on the hosts directly with packet filtering, but was faced with the same timeouts. The only solution that works is to take down the interface to the shared network; that helps immediately, and the connection "idles" for a long time.

Anyone with a good idea?!

Thanks a lot! :-)

-- ADDITIONAL INFORMATION AS REQUESTED IN COMMENTS --

All of the following output was generated at "host B" (the ssh "target").

"host A" is on the "192.168.110.0/24"-subnet!

"ifconfig -a" (irrelevant interfaces stripped):

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
  options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
  ether 00:00:00:00:00:00
  inet 192.168.100.5 netmask 0xffffff00 broadcast 192.168.100.255
  media: Ethernet autoselect (1000baseT <full-duplex>)
  status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
  options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
  ether 00:00:00:00:00:00
  inet 192.168.110.5 netmask 0xffffff00 broadcast 192.168.110.255
  media: Ethernet autoselect (1000baseT <full-duplex>)
  status: active

"netstat -rna" (irrelevant routes stripped (interfaces)):

Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.100.1      UGS         0      807    em0
127.0.0.1          link#9             UH          0        0    lo0
192.168.100.0/24   link#1             U           0   113430    em0
192.168.100.5      link#1             UHS         0    10437    lo0
192.168.110.0/24   link#2             U           0      319    em1
192.168.110.5      link#2             UHS         0        0    lo0
(...)
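
To illustrate the assumption that replies may not take the same path as the initial connect, here is a minimal Python sketch of longest-prefix route selection over the table above. It is only a model of the lookup, not of the FreeBSD kernel itself, and the address used for host A is hypothetical (the post only states that host A is in 192.168.110.0/24). The function name `lookup` is likewise made up for this sketch.

```python
import ipaddress

# Routes copied from the "netstat -rna" output above (host routes omitted).
ROUTES = [
    ("0.0.0.0/0", "192.168.100.1", "em0"),   # default route
    ("192.168.100.0/24", "link#1", "em0"),
    ("192.168.110.0/24", "link#2", "em1"),
]


def lookup(destination):
    """Return (network, gateway, netif) for the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(net), gw, netif)
        for net, gw, netif in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The most specific route (largest prefix length) wins.
    return max(candidates, key=lambda route: route[0].prefixlen)


# Hypothetical address for host A; an assumption, not taken from the post.
HOST_A = "192.168.110.10"

net, gw, netif = lookup(HOST_A)
print(f"{HOST_A} -> {net} via {gw} on {netif}")
```

Under these assumptions, the lookup for host A selects the directly connected 192.168.110.0/24 route on em1 rather than the default route through 192.168.100.1, which would be consistent with ssh traffic showing up on the shared-network interfaces even though sshd listens only on 192.168.100.5.
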

"sockstat -l" (kept other processes for completeness):

USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
dhcpd    dhcpd      1416  10 udp4   *:67                  *:*
dhcpd    dhcpd      1416  20 udp4   *:58917               *:*
dhcpd    dhcpd      1416  21 udp6   *:33125               *:*
mysql    mysqld     1629  10 tcp4   192.168.100.5:3306    *:*
root     apcupsd    1353  4  udp4   *:18755               *:*
root     apcupsd    1353  5  udp4   *:162                 *:*
root     apcupsd    1353  7  tcp4   192.168.100.5:3551    *:*
root     collectd   1635  10 udp4   *:65262               *:*
root     collectd   1635  11 udp4   *:49993               *:*
root     collectd   1635  12 udp4   *:51224               *:*
root     collectd   1635  13 udp4   *:58446               *:*
root     collectd   1635  4  udp4   192.168.100.5:25826   *:*
root     collectd   1635  7  udp4   *:16430               *:*
root     collectd   1635  8  udp4   *:12406               *:*
root     collectd   1635  9  udp4   *:16113               *:*
root     inetd      1676  5  udp4   *:69                  *:*
root     monit      1358  7  tcp4   127.0.0.1:2812        *:*
root     sshd       1656  3  tcp4   192.168.100.5:22      *:*
root     syslog-ng  1295  10 dgram  /var/run/logpriv
root     syslog-ng  1295  12 tcp4   192.168.100.5:514     *:*
root     syslog-ng  1295  13 udp4   192.168.100.5:514     *:*
root     syslog-ng  1295  14 tcp4   192.168.100.5:601     *:*
root     syslog-ng  1295  9  dgram  /var/run/log
_ntp     ntpd       1425  6  udp4   192.168.100.5:123     *:*
Michael Hampton
codepoet
  • Could you provide us with some output of "netstat -tulpen", "netstat -r", and "iptables -L -n -v" (if Linux)? – Daywalker Aug 22 '13 at 11:14
  • Your diagram confuses things, are the two hosts on the same ethernet segment and subnet or is there a router between them? (if you have two interfaces on each one on the same segment and one routed, it will depend on the destination ip you use as to which applies.) – Aaron Tate Aug 22 '13 at 11:21
  • @fenix: You're absolutely right; I changed the image in the hope of clearing things up. Unfortunately I'm not able to post that picture inline, because my reputation is only 1 ... ;-) – codepoet Aug 22 '13 at 12:55
  • @daywalker: It's FreeBSD and the routing table is straightforward; only routes to the network segments via their respective interfaces. – codepoet Aug 22 '13 at 12:56
  • @StefanZimmermann Then please provide some other information, like the output of "ifconfig", "sockstat -l", "pfctl -s all" and "netstat -rn" – Daywalker Aug 22 '13 at 13:20
  • @Daywalker I provided the information you requested (see above). "pfctl" didn't do anything, because it (the kernel module) is not loaded. Again: there are NO packet filters in place, neither on the ssh host nor on the router. – codepoet Aug 22 '13 at 14:50
  • @StefanZimmermann Could you also provide the routing information from host A? I'm thinking of something similar to what user Dru describes. After the ARP timeout the connection tries or finds another path, and no longer matches on one side of the connection. So we would need to figure out WHERE this happens. – Daywalker Aug 23 '13 at 06:24
  • @StefanZimmermann Additionally: is there any good reason for this setup? I just can't figure out why SSH could not simply listen on both IPs. – Daywalker Aug 23 '13 at 06:25

1 Answer


As soon as you connect to B, it adds an ARP entry for host A. It uses the local subnet after that, but as soon as the ARP entry times out (~300 s, i.e. five minutes), it broadcasts for your Ethernet address. The router doesn't forward the broadcast unless it's acting as a bridge, which, judging from the somewhat kerplooie net config, I assume is not the case.

You could try adding a static ARP entry for your host to host B's ARP table, either persistently or just manually on the command line.

Then, if you would, could you explain why you have a full-duplex GbE interface running in simplex mode? Also, why does it say "ether 00:00:00:00:00:00"? Did you black it out (it's a little confusing)?

Dru