
I tested DNS round-robin with SSH and noticed surprising behavior of the SSH client in my testing environment. I am using 3 nodes running RHEL 6.2 (openssh-5.3p1, bind-9.7.3-8.P3). Details such as the host keys have already been taken care of.

My "issue":

I would like a rudimentary kind of load balancing between several SSH servers, using multiple DNS A records. I was (almost) sure this was possible, but what I got instead is a rudimentary kind of HA... The OpenSSH client seems to ignore the round-robin: it always connects to the same node, unless that node is down, in which case it picks another record from the DNS answer and connects to it successfully. Is this the normal/common behavior, or is something wrong in my tests?

Below are my strace and tcpdump captures of what happens in several cases. Thanks in advance for any idea or explanation that can help :)

login => 10.255.254.1 (node0), 10.255.254.3 (node2)
SSH client => 10.255.254.2 (node1)

The DNS server runs on node0; round-robin has not been disabled.

login IN A 10.255.254.1
login IN A 10.255.254.3
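
For reference, a minimal sketch of how the cyclic ordering can be requested explicitly in named.conf. This is an assumption on my side (the rrset-order statement of BIND 9); in my setup the default ordering was simply left in place:

options {
    // sketch: explicitly ask for cyclic (round-robin) ordering of the A records
    // for login.node; merge into the existing options block.
    rrset-order { class IN type A name "login.node" order cyclic; };
};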

I confirm that:

  • the lookup with host(1) confirms the round-robin;
  • the ping(1) command looks good:

[root@node1 ~]# ping login

PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=1.73 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.1) 56(84) bytes of data.
64 bytes from node0.node (10.255.254.1): icmp_seq=1 ttl=64 time=0.467 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=0.433 ms
^C
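
For completeness, the host(1) check was just a short loop like the one below (a sketch, run on node1; successive queries are expected to show the two A records alternating):

# repeat the lookup a few times and watch the order of the A records rotate
for i in $(seq 1 4); do host -t A login; done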

TEST 1 (both SSH servers are up and reachable)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
(...)

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:03:04.875099 IP node1.node.53511 > node0.node.domain: 55904+ A? login.node. (29)
17:03:04.875417 IP node0.node.domain > node1.node.53511: 55904* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
17:03:04.875432 IP node1.node.53511 > node0.node.domain: 22271+ AAAA? login.node. (29)
17:03:04.875523 IP node0.node.domain > node1.node.53511: 22271* 0/1/0 (79)

=> connection on node2 (10.255.254.3)

TEST 2 (both SSH servers are still up and reachable)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
(...)

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:04:29.663664 IP node1.node.51950 > node0.node.domain: 4685+ A? login.node. (29)
17:04:29.663685 IP node1.node.51950 > node0.node.domain: 36559+ AAAA? login.node. (29)
17:04:29.664046 IP node0.node.domain > node1.node.51950: 4685* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:04:29.664110 IP node0.node.domain > node1.node.51950: 36559* 0/1/0 (79)

=> connection on node2

(Another run confirms the connection to node2 again. It seems the round-robin order only shows up in the preliminary connection attempts made by the ssh client.)
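
A lighter check than strace, assuming the stock OpenSSH verbose output, is to grep for the address the client reports connecting to:

# prints a line like: debug1: Connecting to login [10.255.254.3] port 22.
ssh -v login true 2>&1 | grep -i 'connecting to'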

TEST 3 (SSH server on node2 is stopped)

[root@node2 ~]# /etc/init.d/sshd stop
Stopping sshd:                                             [  OK  ]

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:09:05.854022 IP node1.node.41233 > node0.node.domain: 63435+ A? login.node. (29)
17:09:05.854055 IP node1.node.41233 > node0.node.domain: 3015+ AAAA? login.node. (29)
17:09:05.854436 IP node0.node.domain > node1.node.41233: 63435* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:09:05.854531 IP node0.node.domain > node1.node.41233: 3015* 0/1/0 (79)
17:09:05.856764 IP node1.node.59579 > node0.node.ssh: Flags [S], seq 3025023931, win 14600, options [mss 1460,sackOK,TS val 9854496 ecr 0,nop,wscale 7], length 0
17:09:05.856806 IP node0.node.ssh > node1.node.59579: Flags [S.], seq 1105519762, ack 3025023932, win 14480, options [mss 1460,sackOK,TS val 350907197 ecr 9854496,nop,wscale 7], length 0
17:09:05.857106 IP node1.node.59579 > node0.node.ssh: Flags [.], ack 1, win 115, options [nop,nop,TS val 9854496 ecr 350907197], length 0
17:09:05.865291 IP node0.node.ssh > node1.node.59579: Flags [P.], seq 1:22, ack 1, win 114, options [nop,nop,TS val 350907205 ecr 9854496], length 21
(...)

=> connection on node0 (failover?? surprise!)

TEST 4 (same conditions)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0


[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:11:44.154595 IP node1.node.56947 > node0.node.domain: 4602+ A? login.node. (29)
17:11:44.154862 IP node0.node.domain > node1.node.56947: 4602* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
(...)

=> same result (connection on node0)

TEST 5 (SSH server on node2 is restarted)

[root@node2 ~]# /etc/init.d/sshd restart
Stopping sshd:                                             [FAILED]
Starting sshd:                                             [  OK  ]

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:17:12.893633 IP node1.node.42432 > node0.node.domain: 7264+ A? login.node. (29)
17:17:12.893988 IP node0.node.domain > node1.node.42432: 7264* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
(...)

=> connection on node2 again (failback)

cbesson

2 Answers


DNS itself does not provide load balancing, so yes: unless the host is down, the client will keep using one record from the returned list of DNS records. If you want to dynamically handle downed hosts, you will have to load balance the incoming connections across your SSH boxes.

Round-robin DNS is very rudimentary as a load-balancing mechanism. Check out the drawbacks section: http://en.wikipedia.org/wiki/Round_robin_DNS
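
If you go the load-balancer route, a minimal sketch with LVS (ipvsadm) could look like the following. The virtual IP 10.255.254.10 is an assumption of mine, and health checking (e.g. with keepalived) is left out:

# round-robin TCP balancing of port 22 across the two SSH nodes (NAT mode)
ipvsadm -A -t 10.255.254.10:22 -s rr
ipvsadm -a -t 10.255.254.10:22 -r 10.255.254.1:22 -m
ipvsadm -a -t 10.255.254.10:22 -r 10.255.254.3:22 -m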

Brent Pabst
    ...Not always the first. The resolver can do pretty much whatever it wants if there are multiple results. – voretaq7 Jun 04 '12 at 17:05
  • @voretaq7 Yes, that's right, my mistake. The DNS request just returns a list and it's really up to the client to pick one; however, the server doesn't always return the same order with every request either. – Brent Pabst Jun 04 '12 at 17:07
  • Also, unless I am misreading, his tests show that sometimes node0's IP (.1) was first in the returned A record list and the client still connected to node2. – Andy Shinn Jun 04 '12 at 17:18
  • The drawbacks in the article do not convince me for this case. To me, it should be possible to do LB with this simple DNS round-robin trick as long as the protocol is not stateful (like HTTP with sessions). I am on an internal infrastructure, this was known to work in production (so my colleagues said...), and I personally believed it was possible, even if I prefer a true dynamic load-balancing solution like LVS with keepalived. What is really surprising here is the behavior of the SSH client itself! But maybe I am the only guy who didn't notice this before :). Thanks for your feedback. – cbesson Jun 04 '12 at 17:46
  • And yes, no matter the records' order in the response, SSH seems to pick one and _sticks_ to the same one every time. Honestly, I am too lazy to inspect the code to understand why... :D – cbesson Jun 04 '12 at 17:50
  • First, HTTP even with sessions is stateless; session objects were created to provide a pseudo-stateful environment. Second, the behavior you described is correct... a client will use a DNS record until that host is unavailable and will then try another, if the program is smart enough to do so. In most cases programs will either choose the first record they get back or use a sticky approach to minimize the amount of hopping around. You have described the correct behavior of RR-DNS. – Brent Pabst Jun 04 '12 at 17:59
  • OK, I was certainly wrong about the behavior of RR-DNS; unless these tests are themselves flawed, they prove it. By observing several applications, I can also say that the final IP that gets used really depends on the program's implementation. Thanks again for your feedback! – cbesson Jun 04 '12 at 18:30
  • Np, don't forget to mark the question as answered if it can be closed. – Brent Pabst Jun 04 '12 at 18:31
  • "Fundamentally, HTTP as a protocol is stateless. In general, though, a stateless protocol can be made to act as if it were stateful, assuming you've got help from the client. This happens by arranging for the server to send the state (or some representative) to the client, and for the client to send it back again next time. There are 3 ways this happens in HTTP. One is cookies, in which case the state is sent and returned in HTTP headers. The 2nd is URL rewriting, in which case the state is sent as part of the response and returned as part of the request URI. The 3rd is hidden form fields..." – cbesson Jun 04 '12 at 18:32
  • That means that, in the case of an LB solution, HTTP has to be treated as if it were a stateful protocol, unless the content is purely static... – cbesson Jun 04 '12 at 18:37
  • What ssh client are you using? It seems to me that regardless of the order of DNS entries your client receives, it is picking the IP address for which there is a known association in the .ssh/known_hosts file – NcA Jun 04 '12 at 18:47
  • The usual OpenSSH client... from one Linux node to another. Thanks for pointing out the known_hosts angle; I will take a look tomorrow (it's late here). – cbesson Jun 04 '12 at 18:56

Well, it turns out that the behavior described above only occurs inside the same subnet. When I use OpenSSH clients from another LAN (behind an intermediate gateway), it WORKS! I mean: I get a rudimentary load distribution, with a "failover" when one of the nodes is down.

So I conclude that RR-DNS is enough to handle a basic load distribution of SSH users.
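
A plausible explanation, though this is an assumption on my part and not something I verified in the code, is glibc's getaddrinfo() address sorting (RFC 3484): its longest-prefix-match rule can deterministically prefer one destination when the client sits in the same /24, while from another LAN both destinations compare equally and the rotated order coming from the DNS server survives. The order actually handed to applications can be observed with getent, which goes through getaddrinfo():

# run on the client: shows the addresses in the order the resolver library
# returns them to applications (i.e. after any RFC 3484 sorting)
getent ahosts login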

cbesson